Lecture Notes on Data Engineering and Communications Technologies Volume 77
Series Editor Fatos Xhafa, Technical University of Catalonia, Barcelona, Spain
The aim of the book series is to present cutting edge engineering approaches to data technologies and communications. It will publish latest advances on the engineering task of building and deploying distributed, scalable and reliable data infrastructures and communication systems. The series will have a prominent applied focus on data technologies and communications with aim to promote the bridging from fundamental research on data science and networking to data engineering and communications that lead to industry products, business knowledge and standardisation. Indexed by SCOPUS, INSPEC, EI Compendex. All books published in the series are submitted for consideration in Web of Science.
More information about this series at http://www.springer.com/series/15362
Sergii Babichev • Volodymyr Lytvynenko
Editors
Lecture Notes in Computational Intelligence and Decision Making 2021 International Scientific Conference “Intellectual Systems of Decision-making and Problems of Computational Intelligence”, Proceedings
Editors Sergii Babichev Department of Physics Kherson State University Kherson, Ukraine
Volodymyr Lytvynenko Department of Informatics and Computer Science Kherson National Technical University Kherson, Ukraine
ISSN 2367-4512 ISSN 2367-4520 (electronic) Lecture Notes on Data Engineering and Communications Technologies ISBN 978-3-030-82013-8 ISBN 978-3-030-82014-5 (eBook) https://doi.org/10.1007/978-3-030-82014-5 © Springer Nature Switzerland AG 2022 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
Collecting, analyzing and processing information are central directions of modern computer science. Many areas of modern life generate a wealth of information that should be stored in a structured manner, analyzed and processed appropriately in order to gain knowledge about the process or object under investigation. Creating new information and computer technologies for data analysis and processing in various fields of data mining and machine learning creates the conditions for increasing the effectiveness of information processing, both by decreasing the time and by increasing the accuracy of data processing.
The international scientific conference “Intellectual Decision-Making Systems and Problems of Computational Intelligence” is a series of conferences held in Eastern Europe. They are very important for this geographic region, since the topics of the conference cover modern directions in the fields of artificial and computational intelligence, data mining, machine learning and decision making. The aim of the conference is to reflect the most recent developments in the fields of artificial and computational intelligence used for solving problems in a variety of areas of scientific research related to data mining, machine learning and decision making.
The current ISDMCI’2021 conference, held in Zalizny Port, Kherson region, Ukraine, from May 24 to 28, 2021, was a continuation of the highly successful ISDMCI conference series started in 2006. For many years, ISDMCI has been attracting hundreds or even thousands of researchers and professionals working in the field of artificial intelligence and decision making. This volume consists of 54 carefully selected papers that are assigned to three thematic sections.
Section 1. Analysis and Modeling of Complex Systems and Processes:
– Methods and tools of system modeling under uncertainty
– Problems of identification of complex system models and processes
– Modeling of operated complex systems
– Modeling of various nature dynamic objects
– Time series forecasting and modeling
– Information technology in education
Section 2. Theoretical and Applied Aspects of Decision-Making Systems:
– Decision-making methods
– Multicriterial models of decision making under uncertainty
– Expert systems of decision making
– Methods of artificial intelligence in decision-making systems
– Software and tools for synthesis of decision-making systems
– Applied systems of decision-making support
Section 3. Computational Intelligence and Inductive Modeling:
– Inductive methods of complex systems modeling
– Computational linguistics
– Data mining
– Multiagent systems
– Neural networks and fuzzy systems
– Evolutionary algorithms and artificial immune systems
– Bayesian networks
– Hybrid systems and models
– Fractals and problems of synergetics
– Image recognition and cluster analysis
We hope that the broad scope of topics related to the fields of artificial intelligence and decision making covered in this proceedings volume will help the reader to understand that the methods of data mining and machine learning have become an important element of modern computer science. June 2021
Oleh Mashkov Yuri Krak Sergii Babichev Yuriy Bardachov Volodymyr Lytvynenko
Organization
ISDMCI’2021 is organized by the Department of Informatics and Computer Science, Kherson National Technical University, Ukraine, in cooperation with:
– Black Sea Scientific Research Society, Ukraine
– Jan Evangelista Purkyně University in Ústí nad Labem, Ústí nad Labem, Czech Republic
– Lublin University of Technology, Poland
– Taras Shevchenko National University, Ukraine
– V. M. Glushkov Institute of Cybernetics NASU, Ukraine
– International Centre for Information Technologies and Systems of the National Academy of Sciences of Ukraine, Ukraine
Program Committee Chairman Oleh Mashkov
State Ecological Academy of Postgraduate Education and Natural Resources Management of Ukraine, Kyiv, Ukraine
Vice-chairmen Yuri Krak Sergii Babichev
Taras Shevchenko National University, Kyiv, Ukraine Jan Evangelista Purkyně University in Ústí nad Labem, Ústí nad Labem, Czech Republic; Kherson State University, Kherson, Ukraine
Members Natalia Axak Tetiana Aksenova Mikhail Alexandrov Svitlana Antoshchuk Olena Arsirii Sergii Babichev
Alexander Barmak Vitor Basto-Fernandes Juri Belikov Andrii Berko Oleg Berezkiy Oleg Bisikalo Peter Bidyuk Oksana Bihun Yevgeniy Bodyanskiy Yevheniy Burov Volodymyr Buriachok Zoran Cekerevac Sergiu Cataranciuc Mykola Dyvak Michael Emmerich Oleg Garasym Fedir Geche Sergiy Gnatyuk Vladimir Golovko Oleksii Gorokhovatskyi Aleksandr Gozhyj Natalia Grabar Klaus ten Hagen Volodymyr Hnatushenko
Kharkiv National University of Radio Electronics, Ukraine Grenoble University, France Autonomous University of Barcelona, Spain Odessa National Polytechnic University, Ukraine Odessa National Polytechnic University, Odessa, Ukraine Jan Evangelista Purkyně University in Ústí nad Labem, Ústí nad Labem, Czech Republic; Kherson State University, Kherson, Ukraine Khmelnitsky National University, Zhytomyr, Ukraine University Institute of Lisbon, Portugal Tallinn University of Technology, Estonia Lviv Polytechnic National University, Ukraine Ternopil National Economic University, Ukraine Vinnytsia National Technical University, Ukraine National Technical University of Ukraine “Ighor Sikorsky Kyiv Polytechnic Institute,” Ukraine Mathematics University of Colorado, Colorado Springs, USA Kharkiv National University of Radio Electronics, Ukraine Lviv Polytechnic National University, Ukraine Borys Grinchenko Kyiv University, Ukraine “Union – Nikola Tesla” University, Serbia Moldova State University, Kishinev, Moldova Republic Ternopil National Economic University, Ukraine Leiden Institute of Advanced Computer Science, Leiden University, the Netherlands Volvo IT, Poland Uzhhorod National University, Ukraine National Aviation University, Kyiv, Ukraine Brest State Technical University, Belarus Simon Kuznets Kharkiv National University of Economics, Ukraine Petro Mohyla Black Sea National University, Ukraine CNRS UMR 8163 STL, France University of Applied Science Zittau/Goerlitz, Germany Dnipro University of Technology, Ukraine
Viktorya Hnatushenko Volodymyr Hrytsyk Ivan Izonin Irina Ivasenko Irina Kalinina Maksat Kalimoldayev Viktor Kaplun Bekir Karlik Alexandr Khimich Volodymyr Khandetskyi Lyudmyla Kirichenko Pawel Komada Konrad Gromaszek Roman Kvyetnyy Pavel Kordik Mykola Korablyov Andrzej Kotyra Yuri Krak Vyacheslav Kharchenko Jan Krejci Evelin Krmac Victor Krylov Roman Kuc Dmitry Lande Evgeniy Lavrov Frank Lemke Vitaly Levashenko Volodymyr Lytvynenko Vasyl Lytvyn
National Metallurgical Academy of Ukraine, Dnipro, Ukraine Lviv Polytechnic National University, Ukraine Lviv Polytechnic National University, Ukraine Karpenko Physico-Mechanical Institute of the NAS of Ukraine, Lviv, Ukraine Petro Mohyla Black Sea National University, Ukraine Institute of Information and Computational Technologies, Almaty, Kazakhstan Kyiv National University of Technologies and Design, Ukraine Neurosurgical Simulation Research and Training Centre, Canada Glushkov Institute of Cybernetic of NAS of Ukraine, Ukraine Oles Honchar Dnipro National University, Dnipro, Ukraine Kharkiv National University of Radio Electronics, Ukraine Lublin University of Technology, Lublin, Poland Lublin University of Technology, Lublin, Poland Vinnytsia National Technical University, Vinnytsia, Ukraine Czech Technical University in Prague, Czech Republic Kharkiv National University of Radio Electronics, Ukraine Lublin University of Technology, Poland Taras Shevchenko National University, Kyiv, Ukraine National Aerospace University “KhAI,” Kharkiv, Ukraine Jan Evangelista Purkyně University in Ústí nad Labem, Ústí nad Labem, Czech Republic University of Ljubljana, Slovenia Odessa National Polytechnic University, Ukraine Yale University, Yale, USA Institute for Information Recording of NAS of Ukraine, Kyiv, Ukraine Sumy State University, Ukraine Knowledge Miner Software, Berlin, Germany Zilinska Univerzita v Ziline, Slovak Republic Kherson National Technical University, Ukraine Lviv Polytechnic National University, Ukraine
Leonid Lyubchyk Igor Malets Viktor Morozov Viktor Mashkov Mykola Malyar Sergii Mashtalir Volodymyr Mashtalir Jíři Škvor Jíři Fišer Sergii Olszewski Opeyemi Olakitan Volodymyr Osypenko Sergii Pavlov Nataliya Pankratova Anatolii Pashko Dmytro Peleshko Iryna Perova Eduard Petlenkov Michael Pokojovy
Taras Rak Yuriy Rashkevych Hanna Rudakova Yuriy Romanyshyn Yuri Samokhvalov Silakari Sanjay Andrii Safonyk
National Technical University “Kharkiv Polytechnic Institute,” Ukraine Lviv State University of Life Safety, Ukraine Taras Shevchenko National University, Kyiv, Ukraine Jan Evangelista Purkyně University in Ústí nad Labem, Ústí nad Labem, Czech Republic Uzhhorod National University, Ukraine Kharkiv National University of Radio Electronics, Ukraine Kharkiv National University of Radio Electronics, Ukraine Jan Evangelista Purkyně University in Ústí nad Labem, Ústí nad Labem, Czech Republic Jan Evangelista Purkyně University in Ústí nad Labem, Ústí nad Labem, Czech Republic Taras Shevchenko National University, Ukraine Cornell University, UK Kyiv National University of Technologies and Design, Ukraine Vinnytsia National Technical University, Vinnytsia, Ukraine National Technical University of Ukraine “Ighor Sikorsky Kyiv Polytechnic Institute,” Ukraine Taras Shevchenko National University of Kyiv, Ukraine GeoGuard, Lviv, Ukraine Kharkiv National University of Radio Electronics, Ukraine Tallinn University of Technology, Estonia Karlsruher Institut für Technologie (KIT), Universität Konstanz, Mannheim Area, Germany IT Step University, Lviv, Ukraine Lviv National Polytechnic University, Lviv, Ukraine Kherson National Technical University, Ukraine Lviv Polytechnic National University, Ukraine Taras Shevchenko National University, Kyiv, Ukraine Rajiv Gandhi Technical University, Madhya Pradesh, India National University of Water and Environmental Engineering, Rivne, Ukraine
Natalia Savina Antonina Savka Galina Setlak Natalya Shakhovska Manik Sharma Ihor Shelevytsky Volodimir Sherstyuk Galyna Shcherbakova Juergen Sieck Miki Sirola Andrzej Smolarz Sergey Subbotin Vasyl Teslyuk Roman Tkachenko Vasyl Trysnyuk Ivan Tsmots Oleksii Tyshchenko
Oleksandr Trofymchuk Yuri Turbol Kristina Vassiljeva Alexey Voloshin Viktor Voloshyn Olena Vynokurova Victoria Vysotska Waldemar Wojcik Mykhaylo Yatsymirskyy Sergey Yakovlev Iryna Evseyeva Danuta Zakrzewska Elena Zaitseva Jan Zizka
National University of Water and Environmental Engineering, Rivne, Ukraine Openet, Dublin, Ireland Rzeszow University of Technology, Poland Lviv Polytechnic National University, Ukraine DAV University, India Kryvyi Rih Institute of Economics, Ukraine Kherson National Technical University, Ukraine Odessa National Polytechnic University, Ukraine Humboldt-Universität zu Berlin, Germany Institute for Energy Technology, Norway Lublin University of Technology, Poland Zaporizhzhia National Technical University, Ukraine Lviv Polytechnic National University, Ukraine Lviv Polytechnic National University, Ukraine Institute of Telecommunications and Global Information Space, Kyiv, Ukraine Lviv Polytechnic National University, Ukraine Institute for Research and Applications of Fuzzy Modeling, CEIT Innovations, University of Ostrava, Czech Republic Institute of Telecommunications and Global Information Space, Kyiv, Ukraine National University of Water and Environmental Engineering, Rivne, Ukraine Tallinn University of Technology, Estonia Taras Shevchenko National University, Kyiv, Ukraine IT Step University, Lviv, Ukraine GeoGuard, Kharkiv, Ukraine Lviv Polytechnic National University, Ukraine Lublin University of Technology, Poland Institute of Information Technology, Lodz University of Technology, Poland National Aerospace University “Kharkiv Aviation Institute,” Kharkiv, Ukraine University of Newcastle, London, England Institute of Information Technology, Lodz University of Technology, Poland Zilinska Univerzita v Ziline, Slovakia Mendel University in Brno, Czech Republic
Organization Committee Chairman Yuriy Bardachov
Kherson National Technical University, Ukraine
Vice-chairmen Volodymyr Lytvynenko Yuriy Rozov
Kherson National Technical University, Ukraine Kherson National Technical University, Ukraine
Members Igor Baklan Anatoliy Batyuk Oleg Boskin Liliya Chyrun Nataliya Kornilovska Yurii Lebedenko Olena Liashenko Irina Lurje Oksana Ohnieva Viktor Peredery Svetlana Radetskaya Oleg Riznyk Polina Zhernova Svetlana Vyshemyrskaya Mariia Voronenko Maryna Zharikova Pavlo Mulesa
National Technical University of Ukraine “Ighor Sikorsky Kyiv Polytechnic Institute,” Ukraine Lviv Polytechnic National University, Ukraine Kherson National Technical University, Ukraine Polytechnic National University, Ukraine Kherson National Technical University, Ukraine Kherson National Technical University, Ukraine Kherson National Technical University, Ukraine Kherson National Technical University, Ukraine Kherson National Technical University, Ukraine Kherson National Technical University, Ukraine Kherson National Technical University, Ukraine Lviv Polytechnic National University, Ukraine Kharkiv National University of Radio Electronics, Ukraine Kherson National Technical University, Ukraine Kherson National Technical University, Ukraine Kherson National Technical University, Ukraine Uzhhorod National University, Ukraine
Contents
Analysis and Modeling of Complex Systems and Processes
Financial Risk Estimation in Conditions of Stochastic Uncertainties . . . 3
Oleksandr Trofymchuk, Peter Bidyuk, Irina Kalinina, and Aleksandr Gozhyj
Numerical Modeling of Disk Dissolution in Melt During Gas Blowing . . . 25
Kyrylo Krasnikov
Streaming Algorithm to the Decomposition of a Polyatomic Molecules Mass Spectra on the Polychlorinated Biphenyls Molecule Example . . . 39
Serge Olszewski, Violetta Demchenko, Eva Zaets, Volodymyr Lytvynenko, Irina Lurie, Oleg Boskin, and Sergiy Gnatyuk
Method of Functional-Value Calculations of Complex Systems with Mixed Subsystems Connections . . . 54
Maksym Korobchynskyi, Mykhailo Slonov, Pavlo Krysiak, Myhailo Rudenko, and Oleksandr Maryliv
Current State of Methods, Models, and Information Technologies of Genes Expression Profiling Extraction: A Review . . . 69
Lyudmyla Yasinska-Damri, Ihor Liakh, Sergii Babichev, and Bohdan Durnyak
A Method of Analytical Calculation of Dynamic Characteristics of Digital Adaptive Filters with Parallel-Sequential Weight Summation . . . 82
Kostiantyn Semibalamut, Volodymyr Moldovan, Svitlana Lysenko, Maksym Topolnytskyi, and Sergiy Zhuk
Simulation Modeling as a Means of Solving Professionally-Oriented Problems in Maritime Industry . . . 94
Tatyana Zaytseva, Lyudmyla Kravtsova, Oksana Tereshchenkova, and Alona Yurzhenko
Data Mining Methods, Models and Solutions for Big Data Cases in Telecommunication Industry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107 Nataliia Kuznietsova, Peter Bidyuk, and Maryna Kuznietsova Agile Architectural Model for Development of Time-Series Forecasting as a Service Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . 128 Illia Uzun, Ivan Lobachev, Luke Gall, and Vyacheslav Kharchenko Essential R Peak Detector Based on the Polynomial Fitting . . . . . . . . . . 148 Olga Velychko, Oleh Datsok, and Iryna Perova An Approach to Identifying and Filling Data Gaps in Machine Learning Procedures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164 Peter Bidyuk, Irina Kalinina, and Aleksandr Gozhyj Information Technology for Assessing the Situation in Energy-Active Facilities by the Operator of an Automated Control System During Data Sampling . . . . . . . . . . . . . . . . . . . . . . . . . 177 Lubomyr Sikora, Natalya Lysa, Roman Martsyshyn, and Yuliya Miyushkovych Application of Ensemble Methods of Strengthening in Search of Legal Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188 Nataliya Boyko, Khrystyna Kmetyk-Podubinska, and Iryna Andrusiak Synthesis of Barker-Like Codes with Adaptation to Interference . . . . . . 201 Oleg Riznyk, Ivan Tsmots, Roman Martsyshyn, Yuliya Miyushkovych, and Yurii Kynash Models of Factors of the Design Process of Reference and Encyclopedic Book Editions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217 Vsevolod Senkivskyy, Iryna Pikh, Alona Kudriashova, Nataliia Senkivska, and Lyubov Tupychak Analysis of Digital Processing of the Acoustic Emission Diagnostics Informative Parameters Under Deformation Impact Conditions . . . . . . 230 Volodymyr Marasanov, Hanna Rudakova, Dmitry Stepanchikov, Oleksandr Sharko, Artem Sharko, and Tetiana Kiryushatova Solution of the Problem of Optimizing Route with Using the Risk Criterion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 252 Pavlo Mamenko, Serhii Zinchenko, Vitaliy Kobets, Pavlo Nosov, and Ihor Popovych Automatic Optimal Control of a Vessel with Redundant Structure of Executive Devices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 266 Serhii Zinchenko, Oleh Tovstokoryi, Andrii Ben, Pavlo Nosov, Ihor Popovych, and Yaroslav Nahrybelnyi
Practice Analysis of Effectiveness Components for the System Functioning Process: Energy Aspect . . . . . . . . . . . . . . . . . . . . . . . . . . . . 282 Victor Yarmolenko, Nataliia Burennikova, Sergii Pavlov, Vyacheslav Kavetskiy, Igor Zavgorodnii, Kostiantyn Havrysh, and Olga Pinaieva Method of Mathematical and Geoinformation Models Integration Based On Unification of the Ecological Data Formalization . . . . . . . . . . 297 Oleg Mashkov, Taras Ivashchenko, Waldemar Wójcik, Yuriy Bardachov, and Viktor Kozel Prediction of Native Protein Conformation by a Hybrid Algorithm of Clonal Selection and Differential Evolution . . . . . . . . . . . . . . . . . . . . 314 Iryna Fefelova, Andrey Fefelov, Volodymyr Lytvynenko, Oksana Ohnieva, and Saule Smailova Reduction of Training Samples in Solar Insolation Prediction Under Weather and Climatic Changes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 331 Yakiv Povod, Volodymyr Sherstjuk, and Maryna Zharikova Research of Acoustic Signals Digital Processing Methods Application Efficiency for the Electromechanical System Functional Diagnostics . . . 349 Hanna Rudakova, Oksana Polyvoda, Inna Kondratieva, Vladyslav Polyvoda, Antonina Rudakova, and Yuriy Rozov Computer Simulation of Physical Processes Using Euler-Cromer Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 367 Tatiana Goncharenko, Yuri Ivashina, and Nataliya Golovko Automated System and Domain-Specific Language for Medical Data Collection and Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 377 Oleksii Boiarskyi and Svitlana Popereshnyak Applying Visibility Graphs to Classify Time Series . . . . . . . . . . . . . . . . 397 Lyudmyla Kirichenko, Tamara Radivilova, and Vitalii Ryzhanov Theoretical and Applied Aspects of Decision-Making Systems Assessment of the Influencing Factors Significance in Non-destructive Testing Systems of Metals Mechanical Characteristics Based on the Bayesian Network . . . . . . . . . . . . . . . . . . . 413 Volodymyr Mirnenko, Oleksandr Mishkov, Anatolii Balanda, Vasiliy Nadraga, and Oleksandr Hryhorenko Markovian Learning Methods in Decision-Making Systems . . . . . . . . . . 423 Petro Kravets, Yevhen Burov, Vasyl Lytvyn, Victoria Vysotska, Yuriy Ryshkovets, Oksana Brodyak, and Svitlana Vyshemyrska
Fine-Tuning of the Measure for Full Reference Image Quality Assessment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 438 Oleksii Gorokhovatskyi and Olena Peredrii A Model for Assessing the Rating of Higher Education School Academic Staff Members Based on the Fuzzy Inference System . . . . . . 449 Sergii Babichev, Aleksander Spivakovsky, Serhii Omelchuk, and Vitaliy Kobets Early Revealing of Professional Burnout Predictors in Emergency Care Workers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 464 Igor Zavgorodnii, Olha Lalymenko, Iryna Perova, Polina Zhernova, Anastasiia Kiriak, and Oleksandr Novytskyy Forming Predictive Features of Tweets for Decision-Making Support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 479 Bohdan M. Pavlyshenko Method for Adaptive Semantic Testing of Educational Materials Level of Knowledge . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 491 Olexander Mazurets, Olexander Barmak, Iurii Krak, Eduard Manziuk, and Ruslan Bahrii Baseline Wander Correction of the Electrocardiogram Signals for Effective Preprocessing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 507 Anatolii Pashko, Iurii Krak, Oleg Stelia, and Waldemar Wojcik Intellectual Information Technologies of the Resources Management in Conditions of Unstable External Environment . . . . . . . . . . . . . . . . . . 519 Marharyta Sharko, Olga Gonchar, Mykola Tkach, Anatolii Polishchuk, Nataliia Vasylenko, Mikhailo Mosin, and Natalia Petrushenko Information Technologies and Neural Network Means for Building the Complex Goal Program “Improving the Management of Intellectual Capital” . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 534 Anzhelika Azarova Quantitative Assessment of Forest Disturbance with C-Band SAR Data for Decision Making Support in Forest Management . . . . . . . . . . 548 Anna Kozlova, Sergey Stankevich, Mykhailo Svideniuk, and Artem Andreiev An Intelligent System for Providing Recommendations on the Web Development Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . 563 Iryna Yurchuk and Mykyta Kutsenko Real-Time Sensing, Reasoning and Adaptation for Computer Vision Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 573 Volodymyr Hrytsyk and Mariia Nazarkevych
Computational Intelligence and Inductive Modeling Spectrophotometric Method for Coagulant Determining in a Stream Based on an Artificial Neural Network . . . . . . . . . . . . . . . . . . . . . . . . . 589 Andrii Safonyk, Maksym Mishchanchuk, and Ivanna Hrytsiuk Comparative Analysis of Normalizing Techniques Based on the Use of Classification Quality Criteria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 602 Oleksandr Mishkov, Kostiantyn Zorin, Denys Kovtoniuk, Vladyslav Dereko, and Igor Morgun Robust Recurrent Credibilistic Modification of the Gustafson Kessel Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 613 Yevgeniy Bodyanskiy, Alina Shafronenko, Iryna Klymova, and Vladyslav Polyvoda Tunable Activation Functions for Deep Neural Networks . . . . . . . . . . . 624 Bohdan Bilonoh, Yevgeniy Bodyanskiy, Bohdan Kolchygin, and Sergii Mashtalir Markov-Chain-Based Agents for k-Armed Bandit Problem . . . . . . . . . . 634 Vladyslav Sarnatskyi and Igor Baklan Predicting Customer Churn Using Machine Learning in IT Startups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 645 Viktor Morozov, Olga Mezentseva, Anna Kolomiiets, and Maksym Proskurin Union of Fuzzy Homogeneous Classes of Objects . . . . . . . . . . . . . . . . . . 665 Dmytro Terletskyi and Sergey Yershov Neuro-Fuzzy Diagnostics Systems Based on SGTM Neural-Like Structure and T-Controller . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 685 Roman Tkachenko, Ivan Izonin, and Pavlo Tkachenko An Integral Software Solution of the SGTM Neural-Like Structures Implementation for Solving Different Data Mining Tasks . . . . . . . . . . . 696 Roman Tkachenko An Expert System Prototype for the Early Diagnosis of Pneumonia . . . 714 Mariia Voronenko, Olena Kovalchuk, Luidmyla Lytvynenko, Svitlana Vyshemyrska, and Iurii Krak Using Bayesian Networks to Estimate the Effectiveness of Innovative Projects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 729 Oleksandr Naumov, Mariia Voronenko, Olga Naumova, Nataliia Savina, Svitlana Vyshemyrska, Vitaliy Korniychuk, and Volodymyr Lytvynenko
Method of Transfer Deap Learning Convolutional Neural Networks for Automated Recognition Facial Expression Systems . . . . . . . . . . . . . 744 Arsirii Olena, Denys Petrosiuk, Babilunha Oksana, and Nikolenko Anatolii Development of a Smart Education System for Analysis and Prediction of Students’ Academic Performance . . . . . . . . . . . . . . . . 762 Svetlana Yaremko, Elena Kuzmina, Nataliia Savina, Dmitriy Yaremko, Vladyslav Kuzmin, and Oksana Adler Assesment Model for Domain Specific Programming Language Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 776 Oleksandr Ocheretianyi and Ighor Baklan General Scheme of Modeling of Longitudinal Oscillations in Horizontal Rods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 789 Roman Tatsij, Oksana Karabyn, Oksana Chmyr, Igor Malets, and Olga Smotr Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 803
Analysis and Modeling of Complex Systems and Processes
Financial Risk Estimation in Conditions of Stochastic Uncertainties
Oleksandr Trofymchuk¹, Peter Bidyuk², Irina Kalinina³, and Aleksandr Gozhyj³
¹ Institute of Telecommunications and Global Information Sphere at NAS of Ukraine, Kyiv, Ukraine, [email protected]
² National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”, Kyiv, Ukraine
³ Petro Mohyla Black Sea National University, Nikolaev, Ukraine
Abstract. The problem of modeling and forecasting possible financial loss in the form of market risk using stochastic measurements is considered. The sequence of operations directed towards risk estimation covers data preparation and model building with selected filters: exponential smoothing, the optimal Kalman filter and a probabilistic Bayesian filter. A short review of the possibilities for data filtering is proposed, and then some of them are selected for a specific practical application: processing financial data in the form of prices for a selected stock instrument. After preprocessing, the data are used for constructing forecasting models for the financial process itself and for the dynamics of its conditional variance. In the first case regression models with a polynomial trend are used, and to describe the dynamics of conditional variance GARCH and EGARCH models are constructed. Further on, the results of variance prediction are used for computing possible market loss with the VaR approach. Adequacy analysis of the models constructed and back testing of the risk estimates indicate an improvement of the quality of the final results.
Keywords: Modeling and forecasting financial loss · Exponential smoothing · Filtering · Kalman filter · Conditional volatility · Risk estimation
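To make the last step of the pipeline described in the abstract concrete, the fragment below sketches how a one-period parametric VaR can be obtained from a predicted conditional variance under a Gaussian assumption. The position size, confidence level and variance forecast are illustrative values, not results reported in the paper.

```python
from math import sqrt
from statistics import NormalDist

def value_at_risk(position, sigma2_forecast, alpha=0.01, mu=0.0):
    """Parametric one-period VaR from a conditional variance forecast."""
    # Under Gaussian returns: VaR_alpha = -(mu + z_alpha * sigma) * position,
    # where z_alpha is the alpha-quantile of the standard normal distribution.
    z_alpha = NormalDist().inv_cdf(alpha)            # approx -2.326 for alpha = 1%
    return -(mu + z_alpha * sqrt(sigma2_forecast)) * position

# Hypothetical one-step-ahead GARCH variance forecast for daily returns
print(value_at_risk(position=1_000_000, sigma2_forecast=2.5e-4))  # approx 36.8 thousand
```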
1 Introduction
The problem of financial risk estimation and prediction is very important for all financial organizations as well as for any other enterprise dealing with finances in their activities. Development of the methods for risk model constructing, the problems of parameter and state estimation for relevant dynamic system are in the focus of attention for many researchers all over the world. Usually the model constructing is performed in stochastic environment that is characterized
by specific uncertainties and requires application of special data processing procedures to reach high quality results regarding necessary model adequacy and risk estimation. Such methods are also applicable to solving similar problems in studying engineering systems, cosmological and physical studies, medical diagnostic systems, economy, biotechnology, ecology and many other directions of research [1,6]. In spite of availability substantial scientific and practical achievements in the area many researchers today continue the search of new methods for modeling dynamic processes, state and parameter estimation for the systems under study and improvement of existing ones. As an example of the methods could be mentioned digital and optimal filtering that found their practical application in data processing procedures for engineering systems since 1960s. And starting approximately from the middle of 1970s they found many other applications, among which there were processing of economic and financial data, ecology, weather forecasting, physical experiments, automated and automatic control and other information technologies of various applications [7,10,11,17]. As a rule, testing of new methods for state and parameter estimation of dynamic systems is performed in the frames of modern decision support systems that create a convenient and reliable instrument for analysis of data and expert estimates and generating appropriate alternatives on the basis of the analysis performed. Such analysis is also dealing with the problems of non-measurable states estimation using for solving the problem the measurements of other variables in conditions of influence of stochastic external disturbances and measurement errors. Availability of correlation between measurable and non-measurable variables allows for creating appropriate estimation procedures directed to estimation of non-measurable states. It should be mentioned that external disturbances and measurement errors are always available in studying real life systems. The measurement errors are induced by the imperfect features of the measurement devices, external electromagnetic noise influencing data transmission lines, and finite representation of digits by the digital-to-analog converters and computers [4,14]. Even when the data is collected manually the errors will be generated and incorporated into data by the members of the process themselves [5,25]. The stochastic disturbances also create a source of negative influence on the quality of processes state estimation. This problem also requires efforts from the researchers who work on reducing the influence of various stochastic disturbances on estimates of system (processes) output variables. The state and measurement noises are taken into consideration in the state space models that are widely used in the area of control systems synthesis. Besides technical systems such models also find application in many other areas including state estimation and forecasting in economy, finances, ecology etc. State space modeling supposes definition of a state vector that includes the variables necessary for describing behavior of the system (or processes) under study in a specific problem statement. In the econometric applications that could be stock prices, interest rates, level of GDP production, and volatility of financial processes under consideration. The measurement vector represents noisy
measurements linked to the state vector. It may have lower dimension due to the fact that not all the state vector components can be measured. As an example of component that cannot be measured is a volume of capital being transferred periodically into off-shore. Or it can be volume of a capital under risk that may be considered as a state vector component though not directly measured [25]. When the probabilistic data analysis methods are used for computing inference regarding current state of dynamic system it is necessary to construct at least two models: the model describing state dynamic, and the model that combines noisy measurements of states with measurement errors (i.e. equation of measurements). Such state space models should be also available for study and practical application in probabilistic form [23,24]. The problem of linear and nonlinear probabilistic filtering is in computing probabilistic inference regarding system state using available measurements. In the frames of Bayesian data analysis this is performed by approximating posterior distribution of a state vector using all available measurement information and estimates of non-measurable components. As far as the probability distribution function contains all available information regarding the system (processes) states under investigation, its estimate constitutes complete solution to the problem of state estimation, forecasting it future evolution, and support of managerial decisions [23]. Below we consider some methods of statistical and probabilistic linear and nonlinear filtering of data and some special features of their application to solving the problem of financial risk estimation using statistical data. An example will be provided illustrating the possibility for market risk estimation with making use of modern data filtering procedures including Bayesian type particle filter that finds more and more applications in solving the problems of dynamic system state estimation and forecasting. Problem Statement. The study is directed towards solving the following tasks: to consider some modern approaches to solving the problems of linear and nonlinear filtering statistical/experimental data that provide the possibility for computing optimal estimates of states; to perform analysis of particle filtering technique; to propose a new approach to data filtering, directed towards simultaneous generating optimal state estimates and distribution forecasts to be used for market risk estimation, and to provide illustrative examples of the filtering method considered.
2 Materials and Methods
2.1 Some Approaches to Solving the Problems of Linear and Nonlinear Filtering
Among the widely used approaches to solving the problems mentioned are the methods briefly considered below.
1. Optimal Kalman filter (KF), which provides the possibility for computing the posterior distribution for linear Gaussian systems by recursive innovation of finite dimensional statistical/experimental data. The KF in its initial formulation computes optimal estimates of states in its class of linear Gaussian systems with some known constraints [2,9]. The problem of simultaneous estimation of the system state and its further evolution in time is solved successfully using the methods of optimal filtering and specifically the Kalman filtering technique. Today there exist several modifications of the filtering approach that provide the possibility for optimal data smoothing, computing short-term forecasts of states using their optimal estimates, as well as estimating non-measurable state vector components and possibly some parameters of a mathematical model. The basic filtering equation for a free dynamic system (in the sense that control actions are not being taken into consideration), based upon a discrete state space model of the system, can be written as follows:

$$\hat{x}(k) = A\hat{x}(k-1) + K(k)\,[z(k) - HA\hat{x}(k-1)],$$

where $\hat{x}(k)$ is the optimal estimate of the state vector $x(k)$ at the moment $k$; $A$ is the state transition matrix (matrix of system dynamics); $z(k)$ is the vector of system output measurements; $H$ is the matrix of measurement coefficients; $K(k)$ is the optimal matrix of weighting coefficients, computed as a result of minimizing the functional:

$$J = \min_{K} E\{[\hat{x}(k) - x(k)]^{T}[\hat{x}(k) - x(k)]\}.$$

The functional means minimization of the mathematical expectation of squared state estimation errors. The value of $K$ is determined by solving the respective Riccati equation. The state estimation algorithm also provides (automatically) the one-step-ahead prediction of the state by the equation:

$$\hat{x}(k+1|k) = A\hat{x}(k|k).$$

The last equation can be used for computing multi-step forecasts as follows:

$$\hat{x}(k+s|k) = A^{s}\hat{x}(k|k),$$

where $s$ is the number of forecasting steps (a minimal numerical sketch of this recursion is given after this list). Thus, the filter performs the tasks of smoothing and forecasting while taking into account such statistical uncertainties as the covariance and expectation of the following two stochastic processes: external state disturbances and measurement errors. That is why the filter usage expands the data processing system with extra positive functional features directed towards fighting the statistical uncertainties mentioned. Besides, an adaptive version of the filter provides the possibility for real-time estimation of the statistical characteristics of the two random processes. These characteristics cannot always be estimated a priori, which leads to the necessity of constructing adaptive estimation schemes.
2. Extended Kalman filter (EKF). The EKF is applied to the problem of state estimation for nonlinear non-Gaussian processes. It is the same linear Kalman filter, but applied to a linearized model of the system under study with Gaussian noise having the same first and second order moments. The extended KF approximates the nonlinear function (the model of the system generating the data being processed) using a second order Taylor expansion. However, a drawback of the approach is the substitution of the actual probability distribution of the data by the normal one, and the resulting approximate model of system dynamics may not be suitable for further use [2,9].
3. Modified extended Kalman filter (MEKF). A more complicated type of model non-linearity is represented by the dependency of continuous time state variables X(t) on possible discrete variables, D(k), k = 0, 1, 2, ..., that may exhibit non-stationary probability distributions distinct from the continuous variable distribution. Such situations require specific problem formulations for all possible hypotheses related to the possible values of the discrete variables. The number of hypotheses may grow exponentially with the growing length of the discrete data sample, which may finally result in very high computational expenses and unacceptable run time of the filter implementation. To cope with such cases another modification of the filter has been proposed that supposes the use of a random variable, H(k), each value of which corresponds to one of the possible hypotheses [19]. The distribution of H(k) corresponds to the likelihood of the hypothesis selected. In the process of implementation of the MEKF all combinations of the H(k) and D(k + 1) values are considered, which results in analysis of a mixture that contains K × |D| components. Each new hypothesis is conditioned (normalized) on the newly coming measurements, Y(k + 1), and thanks to Bayesian conditioning the weighting coefficients of the mixture are corrected, as well as the parameters of the multi-dimensional Gaussians. All these procedures result in adequate models and consequently a higher quality of the final result – the quality of state estimation, forecasting and decision alternatives based upon it.
4. Filter based upon a nonlinear transform of the data probability distribution, or unscented transform. Implementation of the unscented Kalman filter (U-filter or UKF) is based on the principle that a set of discrete measurements can be used for estimation of the data mean and covariance. The information necessary for estimation of the mean and covariance is available in the discrete sample of measurements known as sigma points. This distribution can be transformed into any other one that is required for solving a specific problem statement by applying the nonlinear transform to each measurement. The mean and covariance of the new distribution represent the estimates necessary for a filtering algorithm. In contrast to the EKF, in which the nonlinear function (model) is approximated by a linear one, the principal advantage of this approach is the complete use of the available nonlinear function representing the data model. It means that there is no need to apply the formerly mentioned linearization procedure based upon differentiation of the nonlinear function, which is favorable for enhancing the quality of estimates at the filter output. Also the filter implementation procedure is simplified due
to the fact that constructing and implementing the appropriate Jacobi matrix is avoided. Generally, such a state estimation algorithm generates outputs equivalent in quality to the results of the optimal Kalman filter for linear systems. But an advantage of the approach is that the filter is applied to nonlinear systems without application of the linearization procedure that is necessary for implementing the EKF. It was shown analytically that the quality of filtering in this case exceeds the quality of the EKF and can be compared to the quality of functioning of the second order Gaussian filter [16,20]. The method is not restricted to Gaussian distributions of the state and measurement noises, which is also a practically useful advantage of this filtering algorithm.
5. Point mass filter (PMF). In the case of PMF application a net of points is imposed on the state space that is used for recursive estimation of the posterior distribution of states. This filtering procedure is suitable for processing any nonlinear and non-Gaussian processes and can represent with acceptable proximity practically any posterior probability distribution. The basic disadvantages of the approach are the high dimensionality of the distribution net in the case of a high state space order, and the fact that the computational complexity of the filtering algorithm is quadratic with respect to the net dimension. This filtering procedure finds application in the so-called “non-standard” cases of rarely met multidimensional distributions that require high quality results of data processing. Here quality is the main goal, reached at the expense of high computational efforts.
6. Particle filter (PF). This filtering procedure to some extent resembles the point mass filter, but it uses an adaptive stochastic net that automatically selects the relevant values (points) in the specific state space. As distinguished from the PMF, the particle filtering algorithm is linear in computational complexity with respect to the dimension of the computational net. At the same time the filter exhibits the following advantages: it can be applied to practically any type of data distribution with a simpler numerical implementation, and it also generates at its output the probability of some state and its amplitude. These types of Bayesian filters will be considered here in some detail together with the linear optimal Kalman filter. It will be interesting to combine the advantages of these filtering procedures in a single data processing algorithm.
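As a minimal numerical illustration of item 1 above (the linear KF recursion, its gain and the multi-step forecast), the following Python/NumPy sketch can be used. The scalar local-level model and all numbers are hypothetical placeholders, not the model identified in this study.

```python
import numpy as np

def kalman_step(x_est, P, z, A, H, Q, R):
    """One predict-correct cycle of the linear Kalman filter."""
    # Prediction: x(k|k-1) = A x(k-1|k-1), P(k|k-1) = A P A^T + Q
    x_pred = A @ x_est
    P_pred = A @ P @ A.T + Q
    # Optimal gain K(k) from the innovation covariance
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    # Correction: x(k|k) = x(k|k-1) + K [z(k) - H x(k|k-1)]
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(len(x_est)) - K @ H) @ P_pred
    return x_new, P_new

def forecast(x_est, A, s):
    """Multi-step forecast x(k+s|k) = A^s x(k|k)."""
    return np.linalg.matrix_power(A, s) @ x_est

# Hypothetical scalar local-level model for a price series
A = np.array([[1.0]]); H = np.array([[1.0]])
Q = np.array([[0.01]]); R = np.array([[0.5]])
x, P = np.array([100.0]), np.array([[1.0]])
for z in [100.2, 99.8, 100.5, 101.1]:          # toy measurements
    x, P = kalman_step(x, P, np.array([z]), A, H, Q, R)
print(x, forecast(x, A, 3))
```

In an adaptive version of the filter, the matrices Q and R would additionally be re-estimated online, as noted above.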
2.2 The General Form of Bayesian State Estimation Method
The dynamics of a nonlinear stochastic system can be described with discrete state space equations as follows:

$$x(k) = f[x(k-1), w(k-1)], \qquad (1)$$
$$z(k) = h[x(k), v(k)], \qquad (2)$$

where (1) is the equation for states and (2) is the measurement equation; $x(k)$ is the vector of state variables with non-Gaussian distribution $P_{x(k)}$; $z(k)$ is the vector of real-valued measurements (generally the measurements can be complex numbers but transformed into real values); $w(k)$ is the vector of random external disturbances with known probability distribution $P_{w(k)}$; $v(k)$ is the vector of measurement noise (or measurement errors) with known probability distribution $P_{v(k)}$; $f$, $h$ are nonlinear deterministic functions; $k = 0, 1, 2, \dots$ is discrete time. The random disturbances in Eqs. (1) and (2) are usually considered in additive form, which simplifies model parameter estimation though still provides the possibility for constructing a model of a high degree of adequacy. When necessary, the model (1), (2) can be expanded with a vector of deterministic control actions, $u(k)$. The first measurement $z(1)$ provides the possibility for estimating the state $x(1)$, and further on new measurements will result in estimation of future states. Introduce the following notation for the state vector sequence: $x(1:k) = \{x(1), x(2), \dots, x(k)\}$, which will be used further on. In the same manner other necessary variables can be represented in the state space. In terms of conditional probability distributions the model (1), (2) can be written as follows:

$$x(k) \sim P[x(k)|x(k-1)], \qquad z(k) \sim P[z(k)|x(k)].$$

From the point of view of the Bayesian approach to data processing, the problem of state estimation is in generating (estimating) the posterior probability distribution $P[x(k)|z(1:k)]$ on the basis of the measurement sequence $z(1:k) = \{z(1), z(2), \dots, z(k)\}$. Equation (1) represents the forecasting conditional transition distribution, $P[x(k)|x(k-1), z(1:k-1)]$, that is based on the states for previous moments and all available measurements up to $z(1:k-1)$. Equation (2) determines the likelihood function for the current measurement with known current state, $P[z(k)|x(1:k)]$. The prior probability for this state can be defined as $P[x(k)|z(1:k-1)]$, and it can be computed using the Bayes theorem by the expression:

$$P[x(k)|z(1:k-1)] = \int P[x(k)|x(k-1), z(1:k-1)]\, P[x(k-1)|z(1:k-1)]\, dx(k-1). \qquad (3)$$

The observation expression (2) determines the likelihood function for the current measurement with known current state, $P[z(k)|x(1:k)]$. On the other side, the state probability density function for the previous time interval can be determined as $P[x(k-1)|z(1:k-1)]$. On the correction step the state estimates are computed using the distribution function of the following type:

$$P[x(k)|z(1:k)] = c\,P[z(k)|x(k)]\, P[x(k)|z(1:k-1)], \qquad (4)$$

where $c$ is a normalizing constant. The problem of filtering is in recursive estimation of the first two moments for the state vector $x(k)$ with known measurements $z(1:k)$. For some general type of distribution $P(x)$ the problem is in estimation of the mathematical expectation
for any (actual) function of $x(k)$, say $\langle g(x)\rangle_{P(x)}$, using Eqs. (3) and (4), and computing an integral of the type:

$$\langle g(x)\rangle_{P(x)} = \int g(x)\, P(x)\, dx. \qquad (5)$$

However, the integral cannot be taken in closed form for the general case of multidimensional distributions, and its value should be approximated using known numerical procedures [21,22]. Eqs. (1) and (2) are often considered with additive random Gaussian components in the following form:

$$x(k) = f[x(k-1)] + w(k-1), \qquad (6)$$
$$z(k) = h[x(k)] + v(k), \qquad (7)$$

where $w(k)$ and $v(k)$ are random vector Gaussian processes that are represented in the simulation model by vector variables with zero mean and covariance matrices $Q(k)$ and $R(k)$, respectively. The initial state $x(0)$ is also modeled with random values, $\hat{x}_0$, that are independent of both noise processes mentioned and have covariance matrix $P^{xx}_0$. Assume that the nonlinear deterministic functions $f$, $h$ and the covariance matrices $Q$ and $R$ are stationary, i.e. their parameters do not depend on time. Now the forecasting density can be presented as follows:

$$P[x(k)|x(k-1), z(1:k-1)] = N(x(k); f(x(k-1)), Q), \qquad (8)$$

where $N(t; \tau, \Sigma)$ is the multidimensional Gaussian distribution that is generally determined by the expression [12,13]:

$$N(t; \tau, \Sigma) = \frac{1}{\sqrt{(2\pi)^{k}\,\|\Sigma\|}}\, \exp\left\{-\frac{1}{2}[t-\tau]^{T}\Sigma^{-1}[t-\tau]\right\}. \qquad (9)$$

Now Eq. (3), which determines the prior probability for states, can be written in the form [13]:

$$P[x(k)|z(1:k-1)] = \int N[x(k); f(x(k-1)), Q]\, P[x(k-1)|z(1:k-1)]\, dx(k-1). \qquad (10)$$

The expected value of $t$ for the Gaussian distribution $N(t; f(\tau), \Sigma)$ can be represented by the expression [3,12,13,15]:

$$\langle t \rangle = \int t\, N(t; f(\tau), \Sigma)\, dt = f(\tau). \qquad (11)$$
According to Eq. (10), the mathematical expectation of the state vector can be written as follows:

$$\begin{aligned}
\langle x(k)|z(1:k-1)\rangle &= E\{x(k)|z(1:k-1)\} = \int x(k)\, P[x(k)|z(1:k-1)]\, dx(k)\\
&= \int x(k)\left\{\int N[x(k); f(x(k-1)), Q]\, P[x(k-1)|z(1:k-1)]\, dx(k-1)\right\} dx(k)\\
&= \int\left\{\int x(k)\, N[x(k); f(x(k-1)), Q]\, dx(k)\right\} P[x(k-1)|z(1:k-1)]\, dx(k-1)\\
&= \int f(x(k-1))\, P[x(k-1)|z(1:k-1)]\, dx(k-1). \qquad (12)
\end{aligned}$$

Equation (11) was used in Eq. (12) as an estimate for the inner integral. Now the distribution of the state vector for the moment $k-1$, taking into consideration all available measurements, can be written as follows:

$$P[x(k-1)|z(1:k-1)] = N(x(k-1); \hat{x}(k-1|k-1), P^{xx}(k-1|k-1)), \qquad (13)$$

where $\hat{x}(k-1|k-1)$ and $P^{xx}(k-1|k-1)$ are estimates of the mean value of the state vector and of the covariance matrix for $x(k-1)$ given $z(1:k-1)$. The estimates of the mean $\hat{x}(k|k-1)$ and the covariance matrix $P^{xx}(k|k-1)$ for $x(k)$ given $z(1:k-1)$ can be computed with Eq. (12) in the form:

$$\hat{x}(k|k-1) = \int f(x(k-1))\, N(x(k-1); \hat{x}(k-1|k-1), P^{xx}(k-1|k-1))\, dx(k-1), \qquad (14)$$

and the expression for the respective covariance matrix can be written as follows:

$$P^{xx}(k|k-1) = Q + \int f(x(k-1)) f^{T}(x(k-1))\, N(x(k-1); \hat{x}(k-1|k-1), P^{xx}(k-1|k-1))\, dx(k-1) - \hat{x}(k|k-1)\hat{x}^{T}(k|k-1). \qquad (15)$$

The expected value of the vector $z(k)$ given $x(k)$ and $z(1:k-1)$ can be provided in the form:

$$\langle z(k)|x(k), z(1:k-1)\rangle = E\{z(k)|x(k), z(1:k-1)\} = \int z(k)\, P[x(k)|z(1:k-1)]\, dx(k). \qquad (16)$$

If we use a Gaussian approximation for the distribution $P[x(k)|z(1:k-1)]$, it can be determined by the expression:

$$P[x(k)|z(1:k-1)] = N(x(k); \hat{x}(k|k-1), P^{xx}(k|k-1)). \qquad (17)$$
And then the estimate $\hat{z}(k|k-1)$ can be expressed via the following integral:

$$\hat{z}(k|k-1) = \int z(k)\, N(x(k); \hat{x}(k|k-1), P^{xx}(k|k-1))\, dx(k) = \int h(x(k))\, N(x(k); \hat{x}(k|k-1), P^{xx}(k|k-1))\, dx(k). \qquad (18)$$

Now, if we express the measurement errors as $e_{z}(k|k-1) = h(x(k)) - \hat{z}(k|k-1)$, then the covariance matrix for the errors can be written as follows:

$$P^{zz}(k|k-1) = \langle (e_{z}(k|k-1))(e_{z}(k|k-1))^{T}\rangle = R + \int h(x(k))\, h^{T}(x(k))\, N(x(k); \hat{x}(k|k-1), P^{xx}(k|k-1))\, dx(k) - \hat{z}(k|k-1)\hat{z}^{T}(k|k-1). \qquad (19)$$

In the same way the cross-covariance matrix $P^{xz}(k|k-1)$ can be found with the expression:

$$P^{xz}(k|k-1) = \langle [x(k) - \hat{x}(k|k-1)][e_{z}(k|k-1)]^{T}\rangle = \int x(k)\, h^{T}(x(k))\, N(x(k); \hat{x}(k|k-1), P^{xx}(k|k-1))\, dx(k) - \hat{x}(k|k-1)\hat{z}^{T}(k|k-1). \qquad (20)$$

Generally, the Kalman filter can be applied to any dynamic system represented in state space form with additive Gaussian noises in both equations, regardless of whether non-linearity is present, though in such a case a convergence problem may arise. Such an approach provides the possibility for constructing a Gaussian approximation for the posterior distribution $P(x(k|k))$ with mean and covariance determined via the expressions given below [13]:

$$\hat{x}(k|k) = \hat{x}(k|k-1) + K(k)\,[z(k) - \hat{z}(k|k-1)], \qquad (21)$$
$$P^{xx}(k|k) = P^{xx}(k|k-1) - K(k)\, P^{zz}\, K^{T}(k), \qquad (22)$$

where the optimal filter coefficient is computed via the expression:

$$K(k) = P^{xz}(k|k-1)\,[P^{zz}(k|k-1)]^{-1}. \qquad (23)$$

Let us stress that the only approximation used in the above derivations concerns modeling the noises with additive Gaussian sequences; computing of the state vector estimates $\hat{x}(k|k)$ and covariance $P^{xx}(k|k)$ is performed without approximation. However, practical implementation of the filter considered above requires procedures for computing the integrals in Eqs. (14), (15) and (18)–(20) that exhibit the following form:

$$I = \int g(x)\, N(x; \hat{x}, P^{xx})\, dx. \qquad (24)$$

Here $N(x; \hat{x}, P^{xx})$ is the multidimensional Gaussian distribution with the vector of means $\hat{x}$ and covariance matrix $P^{xx}$. There are three approximations for computing the integral (24) considered in [13]. One of them can be selected for a specific practical implementation.
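For illustration only, one simple way to approximate an integral of the form (24) is a deterministic sigma-point rule of the unscented-transform type, sketched below. The scaling parameter and the test functions g are assumptions made for the example; this is not necessarily one of the three approximations discussed in [13].

```python
import numpy as np

def gauss_integral_sigma(g, x_hat, P_xx, alpha=1.0, kappa=0.0):
    """Sigma-point approximation of I = integral of g(x) N(x; x_hat, P_xx) dx."""
    n = x_hat.size
    lam = alpha ** 2 * (n + kappa) - n
    L = np.linalg.cholesky((n + lam) * P_xx)          # matrix square root
    points = [x_hat] + [x_hat + c for c in L.T] + [x_hat - c for c in L.T]
    w = np.full(2 * n + 1, 0.5 / (n + lam))           # sigma-point weights
    w[0] = lam / (n + lam)
    return sum(wi * np.asarray(g(p), dtype=float) for wi, p in zip(w, points))

# Check: with g(x) = x the rule returns x_hat exactly (cf. Eq. (11)),
# and with a nonlinear g it approximates predicted moments as in Eq. (14).
x_hat = np.array([0.5, -1.0])
P_xx = np.array([[0.2, 0.05], [0.05, 0.1]])
print(gauss_integral_sigma(lambda x: x, x_hat, P_xx))        # approx x_hat
print(gauss_integral_sigma(lambda x: np.sin(x), x_hat, P_xx))  # nonlinear example
```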
2.3 Particle Filter
One version of probabilistic Bayesian filters is known as the particle filter (PF). The problem to be solved by the filter is constructing (approximating) the posterior probability density for unknown states given the necessary measurements, i.e. estimation of $P[x(k)|z(1:k)]$. There exist alternative particle filtering algorithms based upon pseudo-random sequences generated by Monte Carlo techniques for estimation of the desirable multidimensional distributions [3,12,15].

Sequential Importance Sampling (SIS) Algorithm. The SIS algorithm is a method of recursive Bayesian filter implementation that uses Monte Carlo pseudo-random sequence generation. This is the basic algorithm of particle filtering. The idea of the filtering is in representing the desirable posterior probability density in the form of a sequence of random values with associated weighting coefficients that is used for computing filtered estimates. With substantial growth of the number of elements of the sequence, the Monte Carlo characterization becomes equivalent to a functional description of the posterior density, and the SIS filter approaches the optimal Bayesian estimator.

Let $\{x^{i}(1:k), w^{i}(k)\}_{i=1}^{N_s}$ be a random measure for the posterior density $P[x(1:k)|z(1:k)]$, where $x^{i}(1:k)$, $i = 0, 1, \dots, N_s$, is a set of $k$-step evolution trajectories of the reference points (particles) with the associated normalized weighting coefficients $w^{i}(k)$, $i = 0, 1, \dots, N_s$, $\sum_{i=1}^{N_s} w^{i}(k) = 1$. Here $N_s$ is the number of particles used for state estimation. Now, the posterior density at the moment $k$ can be represented as follows:

$$P[x(1:k)|z(1:k)] \approx \sum_{i=1}^{N_s} w^{i}(k)\, \delta(x(1:k) - x^{i}(1:k)), \qquad (25)$$
where δ(x) is the Dirac δ-function, i.e. P[x^i(1:k)|z(1:k)] ≈ w^i(k). The weighting coefficients are generated according to the principle of importance sampling. The procedure can be characterized as follows. Suppose that it is necessary to generate samples from the probability distribution P(x) ∝ π(x) (the symbol "∝" means proportionality), where {x(·)} is the desired random process that cannot be sampled from its true distribution, but for which the approximation π(x) can be evaluated. Let x^i, i = 1, ..., N_s, be realizations that can easily be generated from a distribution Q(·), called the proposal density or importance density. Then the weighted approximation of the desired distribution P(·) is as follows:

$$P(x) \approx \sum_{i=1}^{N_s} w^i(k)\,\delta(x - x^i), \qquad w^i(k) \propto \frac{\pi(x^i)}{Q(x^i)}, \quad (26)$$

where w^i(k) is the normalized weighting coefficient of the i-th particle. After computing this ratio, normalization of the coefficients is performed so as to satisfy the condition $\sum_{i=1}^{N_s} w^i(k) = 1$. Now, if the realizations of the processes x^i(1:k) were generated from the proposal density Q[x(1:k)|z(1:k)], then the weighting coefficients in Eq. (25) are computed according to (26) as follows:

$$w^i(k) \propto \frac{P[x^i(1:k)|z(1:k)]}{Q[x^i(1:k)|z(1:k)]}. \quad (27)$$
In the case of sequential computations, at each iteration of the generation procedure a weighted sample is generated that approximates the posterior density P[x(1:k−1)|z(1:k−1)], and then a new sample can be generated to approximate the density P[x(1:k)|z(1:k)]. If the proposal density is selected so that it can be represented in the form of the product Q[x(1:k)|z(1:k)] = Q[x(k)|x(1:k−1), z(1:k)]·Q[x(1:k−1)|z(1:k−1)], then we can obtain the sample elements x^i(1:k) ∼ P[x(1:k)|z(1:k)] by adding to the existing set x^i(1:k−1) ∼ P[x(1:k−1)|z(1:k−1)] the new state x^i(k) ∼ P[x(k)|x(1:k−1), z(1:k)]. In practical applications very often only the filtered estimate of the distribution P(x(k)|z(1:k)) is required at each step of computation, instead of the conditional distribution of the whole trajectory P(x(1:k)|z(1:k)); that is why only this case is considered further on. Using the Bayes theorem, it can be written in this case:

$$P(x(k)|z(1:k)) = \frac{P(z(k)|x(k))\,P(x(k)|z(1:k-1))}{P(z(k)|z(1:k-1))}.$$
If the condition Q(x(k)|x(1:k−1), z(1:k)) = Q(x(k)|x(k−1), z(k)) is satisfied, i.e. the proposal density depends only on x(k−1) and z(k), then it can be shown [3,12] that for recursive estimation of the weighting coefficients the following expression can be used:

$$w^i(k) \propto w^i(k-1)\,\frac{P(z(k)|x^i(k))\,P(x^i(k)|x^i(k-1))}{Q(x^i(k)|x^i(k-1), z(k))}. \quad (28)$$
The filtered posterior distribution can be approximated as follows:

$$P(x(k)|z(1:k)) \approx \sum_{i=1}^{N_s} w^i(k)\,\delta\big(x(k) - x^i(k)\big). \quad (29)$$
It should be stressed that the weighting coefficients w^i(k) are to be normalized so that $\sum_{i=1}^{N_s} w^i(k) = 1$. Selection of the proposal density is one of the most important points of the design procedure for a particle filter. Possible ways of its selection, as well as their advantages and drawbacks, are considered in [3,12]. It is important, for example, to provide low variance of the weighting coefficients w^i(k). Rather often the prior distribution of the data is used as the proposal density:

$$Q(x(k)|x^i(k-1), z(k)) = P(x(k)|x^i(k-1)), \quad (30)$$

which is quite a convenient choice. In this case (28) is simplified to the form:

$$w^i(k) \propto w^i(k-1)\,P(z(k)|x^i(k)). \quad (31)$$
However, such a choice of the proposal density is not the best for all problems.

Basic Algorithm. Now we formulate the algorithm of sequential importance sampling. The elements x^i(1:1) of the weighted sample $\{x^i(1:1), 1/N_s\}_{i=1}^{N_s}$ at the first step are generated from the initial distribution P(x(1)). Since this distribution is the actual one, no correction of the values is required and all weighting coefficients have equal values, i.e. w^i(1) = 1/N_s. If a weighted sample is available at step (k−1), then the procedure for generating the weighted sample at step k can be represented by the pseudocode:

Algorithm 1: SIS Particle Filter
$[\{x^i(k), w^i(k)\}_{i=1}^{N_s}]$ = SIS$[\{x^i(k-1), w^i(k-1)\}_{i=1}^{N_s}, z(k)]$
FOR i = 1, N_s
– generate x^i(k) ∼ Q(x(k)|x^i(k−1), z(k));
– assign the particle x^i(k) the weight w^i(k) according to (28).
END FOR

For this and all other filtering algorithms the posterior distribution can be approximated via (29), and an estimate of the conditional mathematical expectation of the state x(k) is determined as follows:

$$\hat{x}(k) = \sum_{i=1}^{N_s} w^i(k)\,x^i(k). \quad (32)$$
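A minimal Python sketch of one step of Algorithm 1 under the common choice (30) of the prior as the proposal density, so that the weight update reduces to (31); the state-transition sampler and the likelihood function are assumed to be supplied by the user.

```python
import numpy as np

def sis_step(particles, weights, z, sample_transition, likelihood):
    """One iteration of Algorithm 1 with the prior proposal (30), so the weight
    update reduces to Eq. (31).  particles: array of shape (N_s, dim)."""
    n_s = len(particles)
    new_particles = np.empty_like(particles)
    new_weights = np.empty(n_s)
    for i in range(n_s):
        new_particles[i] = sample_transition(particles[i])             # x^i(k) ~ P(x(k)|x^i(k-1))
        new_weights[i] = weights[i] * likelihood(z, new_particles[i])  # Eq. (31)
    new_weights /= new_weights.sum()                                   # enforce sum of weights = 1
    x_hat = new_weights @ new_particles                                # Eq. (32): filtered estimate
    return new_particles, new_weights, x_hat
```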
Repeated Sampling of Particles. The SIS filter implementation often leads to the problem of weight degeneracy, when after some number of iterations all the coefficients but one take on negligibly small weights. Since the variance of the weighting coefficients grows in time, this phenomenon cannot be avoided [3,8,12]. Such degeneracy means that a substantial part of the computations is spent on updating particles whose influence on the approximated distribution P(x(k)|z(1:k)) is almost zero. One of the approaches to reducing the degeneracy effect is re-sampling (repeated sampling) of the particles. The main idea of re-sampling is to delete the particles having small weights and to focus on the particles with large weights.
At this step a new set of random values $\{x^{i*}(k)\}_{i=1}^{N_s}$ is generated using the approximate discrete representation of the distribution P(x(k)|z(1:k)) determined by (29), i.e. $P\{x^{i*}(k) = x^j(k)\} = w^j(k)$. The numbers generated this way form a sequence of independent identically distributed (i.i.d.) random numbers from distribution (29) with weighting coefficients w^i(k) = 1/N_s. The pseudo-code of the re-sampling procedure is given below. The procedure is called systematic re-sampling; it is rather simple from the computational point of view, and it also saves, for each element of the new sample, its index in the previous sample for future use.

Algorithm 2: Re-sampling Algorithm
$[\{x^{j*}(k), w^j(k), i^j\}_{j=1}^{N_s}]$ = RE-SAMPLE$[\{x^i(k), w^i(k)\}_{i=1}^{N_s}]$
Initialize the distribution function (DF): c(1) = 0
FOR i = 2, N_s
– construct the DF: c(i) = c(i−1) + w^i(k).
END FOR
Start the DF from the beginning: i = 1
Generate the initial point: u(1) ∼ U[0, N_s^{-1}]
FOR j = 1, N_s
– move along the DF: u(j) = u(1) + N_s^{-1}(j−1);
– WHILE u(j) > c(i): i = i + 1; END WHILE
– assign the new value: x^{j*}(k) = x^i(k);
– assign the weight: w^j(k) = N_s^{-1};
– assign the basic index: i^j = i.
END FOR

The re-sampling procedure has the following drawbacks: (1) it reduces the possibilities for parallel computing; (2) the particles with large weights can be accepted many times, which may result in a loss of diversity of the sample. This problem is also known as sample impoverishment, and it is especially acute in the case of small deviations of the generated process: all the particles may then converge to a single particle in several iterations.

Sequential Importance Sampling with Resampling (SISR) Filter. This filter is based upon sequential importance sampling with subsequent resampling. Actually, this is a Monte Carlo procedure that can be applied to solving the problems of recursive Bayesian filtering. The restrictions imposed on its application are very weak: the functions f(·,·) and h(·,·) in (1) and (2) should be known; it should also be possible to generate pseudorandom sequences from the noise distribution P(v(k−1)) and the prior distribution P(x(k)|x(k−1)), as well as to evaluate the density P(z(k)|x(k)) at given points at least up to a common constant.
The SISR algorithm can be derived from the SIS algorithm by making the following choices:
– the proposal density Q(x(k)|x^i(k−1), z(k)) is replaced by the prior distribution P(x(k)|x^i(k−1));
– the re-sampling step is performed at each moment of time.

Such a choice of the proposal density requires sampling realizations from P(x(k)|x^i(k−1)). The realization x^i(k) ∼ P(x(k)|x^i(k−1)) can be obtained if we first generate the noise v^i(k−1) ∼ P(v(k−1)) and then compute x^i(k) = f(x^i(k−1), v^i(k−1)). For this special choice of the proposal density the weight-updating expression takes the form (31). However, taking into account that resampling is performed at each moment of time, we have w^i(k−1) = 1/N_s ∀i, and then

$$w^i(k) \propto P(z(k)|x^i(k)). \quad (33)$$
The weights given by (33) are normalized before the re-sampling phase; the pseudo-code of the algorithm is provided below.

Algorithm 3: SIR Particle Filter
$[\{x^i(k), w^i(k)\}_{i=1}^{N_s}]$ = SIR$[\{x^i(k-1), w^i(k-1)\}_{i=1}^{N_s}, z(k)]$
FOR i = 1, N_s
– generate x^i(k) ∼ P(x(k)|x^i(k−1));
– compute w^i(k) = P(z(k)|x^i(k)).
END FOR
Compute the total weight: $t = \sum_{i=1}^{N_s} w^i(k)$
FOR i = 1, N_s
– normalize the i-th weight: w^i(k) = t^{-1} w^i(k).
END FOR
Perform re-sampling using Algorithm 2 (Re-sampling Algorithm):
$[\{x^i(k), w^i(k), -\}_{i=1}^{N_s}]$ = RE-SAMPLE$[\{x^i(k), w^i(k)\}_{i=1}^{N_s}]$
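The following is a minimal Python sketch of Algorithms 2 and 3; the transition sampler and the likelihood are again assumed user-supplied, and NumPy's searchsorted is used to walk along the cumulative distribution function exactly as the WHILE loop of Algorithm 2 does.

```python
import numpy as np

def systematic_resample(weights, rng=None):
    """Algorithm 2: systematic re-sampling; returns indices of the selected particles."""
    rng = rng if rng is not None else np.random.default_rng()
    n_s = len(weights)
    u = rng.uniform(0.0, 1.0 / n_s) + np.arange(n_s) / n_s   # u(j) = u(1) + (j-1)/N_s
    c = np.cumsum(weights)                                   # distribution function c(i)
    c[-1] = 1.0                                              # guard against round-off
    return np.searchsorted(c, u)                             # smallest i with u(j) <= c(i)

def sir_step(particles, z, sample_transition, likelihood, rng=None):
    """Algorithm 3 (SIR): propagate with the prior, weight by the likelihood (33),
    normalize and re-sample, returning equal-weight particles."""
    rng = rng if rng is not None else np.random.default_rng()
    proposed = np.array([sample_transition(x) for x in particles])   # x^i(k) ~ P(x(k)|x^i(k-1))
    w = np.array([likelihood(z, x) for x in proposed])               # w^i(k) = P(z(k)|x^i(k))
    w /= w.sum()                                                     # normalize by total weight t
    idx = systematic_resample(w, rng)
    return proposed[idx], np.full(len(particles), 1.0 / len(particles))
```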
3 Risk Estimation in Stochastic Environment
Financial processes such as stock prices, inflation, GDP production and market relations exhibit substantial dynamics and evolve under the influence of often very strong external stochastic disturbances. The disturbances corrupt the states of the processes under study, which results in higher forecasting errors and lower quality of the respective management decisions. Another source of negative stochastic influence on the development of financial
processes is created by measurement errors, which are always present due to imperfect measurement devices, errors in transmission lines, errors of the data collecting staff, etc. That is why it is always reasonable to apply appropriate filtering techniques to the data before constructing models, computing forecasts, estimating risk and generating alternative decisions (alternatives). Mathematically this problem can be formulated as follows: on the time interval of the study, the statistical data characterize the development of nonlinear non-stationary processes (in finance) with an arbitrary probabilistic distribution, {y(k)} ∼ Dist(μ(k), σ_y²(k)), k = 1, ..., N, where μ(k) ≠ const is a time-varying mean and σ_y²(k) ≠ const is the variance of the process. The statistical data parameters are subject to the following restrictions on the interval of the study: μ(k) < ∞; 0 < σ_y²(k) < ∞. It is necessary to construct mathematical models of the mentioned process with the general structure h(k) = F[z(k), x(k), θ, w(k), ε(k)], where F[·] is a generally nonlinear function; h(k) is the conditional variance; x(k) is the state variable; z(k) is the measurement of the state; θ is the vector of model parameters; and w(k), ε(k) are stochastic processes induced by external disturbances and measurement errors, respectively. It is necessary to construct the process variance model, including its structure and parameter estimation. Since different types of filters affect the data differently, it is better to apply to the statistical data combined filtering procedures capable of producing the desired smoothing effect. In the system for financial risk estimation, combined filtering was applied, based upon a digital filter, an optimal filter and a probabilistic filter of Bayesian type. The layout of the system for financial data processing is shown in Fig. 1. The data and knowledge necessary for further processing are collected and stored in the data and knowledge base (DKB). The digital filter is used in the form of exponential smoothing; it prepares the data for constructing the state-space model necessary for application of the optimal Kalman filter. The state-space model is applied in its simple form of a scalar random walk (for scalar processes):

$$\Delta x(k) = A\,\Delta x(k-1) + w(k), \quad (34)$$
$$\Delta z(k) = \Delta x(k) + v(k), \quad (35)$$
which is suitable for state estimation of many financial processes. The only purpose of the model in this case is to compute the optimal state estimate, x̂(k), using the linear Kalman filter algorithm. Here Δz(k) = z(k) − z(k−1) is the first difference of the measurements {z(k)}, and A = 1 is the state transition matrix, which is scalar when a scalar process is considered. The variances of the random processes {w(k)} and {v(k)} are computed with the adaptive version of the KF described in [25]. The trend (conditional mathematical expectation) of the financial process being studied is estimated with polynomial models, which are usually quite suitable for this task, including trends of high order. This estimate of the process produced by the KF is then used to construct an acceptable model of the variance dynamics. The probabilistic filter generates the predictive distribution of the variance that is necessary for market risk estimation. The distribution is further used for estimating lower and upper bounds of the possible financial loss at the next time step using the Value-at-Risk (VaR) approach.
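As an illustration of the filtering chain just described, below is a minimal Python sketch combining exponential smoothing with a scalar Kalman filter for the random-walk model (34)–(35); the noise variances q and r are assumed known here, whereas in the paper they are estimated with the adaptive KF of [25], and the smoothing constant is an arbitrary placeholder.

```python
import numpy as np

def exponential_smoothing(y, alpha=0.3):
    """Digital pre-filter: simple exponential smoothing of the raw series."""
    s = np.empty(len(y), dtype=float)
    s[0] = y[0]
    for k in range(1, len(y)):
        s[k] = alpha * y[k] + (1.0 - alpha) * s[k - 1]
    return s

def scalar_kalman_random_walk(dz, q, r, x0=0.0, p0=1.0):
    """Kalman filter for the scalar random-walk model (34)-(35) with A = 1.
    dz   - first differences of the measurements, dz(k) = z(k) - z(k-1)
    q, r - variances of w(k) and v(k) (assumed known in this sketch)."""
    x, p = x0, p0
    estimates = []
    for z_k in dz:
        p_pred = p + q                      # prediction: P(k|k-1) = P(k-1|k-1) + q
        K = p_pred / (p_pred + r)           # scalar Kalman gain
        x = x + K * (z_k - x)               # update with measurement dz(k) = dx(k) + v(k)
        p = (1.0 - K) * p_pred
        estimates.append(x)
    return np.array(estimates)
```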
Fig. 1. Application of filters in data processing system
Acceptable results for modeling the variance dynamics are usually achieved (in our computational experiments) with the generalized autoregressive conditionally heteroscedastic model GARCH(p, q):

$$h(k) = \beta_0 + \sum_{i=1}^{p} \beta_i\,\varepsilon^2(k-i) + \sum_{i=1}^{q} \alpha_i\,h(k-i) + \varepsilon_1(k),$$

where α_i, β_i ≥ 0 (to avoid negative values of the conditional variance) and h(k) is the conditional variance necessary for risk estimation. Another acceptable choice is the exponential GARCH (EGARCH) model:

$$\log h(k) = \alpha_0 + \sum_{i=1}^{p} \alpha_i \frac{|\varepsilon(k-i)|}{h(k-i)} + \sum_{i=1}^{p} \gamma_i \frac{\varepsilon(k-i)}{h(k-i)} + \sum_{i=1}^{q} \beta_i \log h(k-i), \quad (36)$$

which is an asymmetric function with respect to the stochastic process {ε(k)} influencing the volatility. The last equation is more adequate in the sense that ε(k) enters the right-hand side both as an absolute value and as the actual signed process. Tables 1, 2 and 3 show an example of forecasting the price evolution of precious metals; Table 4 shows the results of estimating models of the conditional volatility dynamics; and Table 5 illustrates the results of back-testing for market risk estimation. The cases when a filtering procedure was applied (and not applied) to preliminary data processing (smoothing) are considered. The purpose of using the
filters was to perform data smoothing (suppressing noisy high-frequency components) and in this way prepare the data for further modeling.

Table 1. Models and forecasts quality without filter application (model quality: R², Σe²(k), DW; forecast quality: MSE, MAE, MAPE, Theil)

Model type | R² | Σe²(k) | DW | MSE | MAE | MAPE | Theil
AR(1) | 0.99 | 26655.77 | 2.21 | 49.93 | 43.57 | 8.49 | 0.047
AR(1,4) | 0.99 | 25487.25 | 2.18 | 49.12 | 40.18 | 8.28 | 0.046
AR(1) + 1st order trend | 0.99 | 25391.39 | 2.13 | 34.31 | 24.26 | 4.31 | 0.030
AR(1) + 4th order trend | 0.99 | 25088.74 | 2.11 | 24.89 | 18.32 | 3.05 | 0.022
It follows from Table 1 that the MAPE for AR(1) + 4th order trend and the Theil coefficient are generally acceptable for generating short-term forecasts with this model.

Table 2. Models adequacy and forecasts quality with exponential smoothing application (model quality: R², Σe²(k), DW; forecast quality: MSE, MAE, MAPE, Theil)

Model type | R² | Σe²(k) | DW | MSE | MAE | MAPE | Theil
AR(1) | 0.99 | 23355.54 | 2.01 | 44.80 | 39.65 | 7.44 | 0.036
AR(1,4) | 0.99 | 24132.15 | 2.08 | 46.45 | 38.34 | 6.78 | 0.034
AR(1) + 1st order trend | 0.99 | 23861.65 | 2.07 | 31.07 | 22.08 | 3.12 | 0.027
AR(1) + 4th order trend | 0.99 | 21887.54 | 2.05 | 21.24 | 13.13 | 3.13 | 0.017
Here also the MAPE for AR(1) + 1st order trend and the Theil coefficient show that this model is generally acceptable for short-term forecasting of the process being modeled; the exponential smoothing played a positive role in the preliminary data processing. The MAPE and the Theil coefficient for AR(1) + 4th order trend (Table 3) show that this model is generally good for short-term forecasting; here the Kalman filter also played a positive role in preparing the data for modeling, which is supported by the respective statistical quality parameters. The data for precious metal prices exhibit a heteroscedastic process with time-varying conditional variance. The variance is one of the key parameters used in the rules for trading operations, so it is necessary to construct appropriate short-term forecasting models. To solve this problem, GARCH and EGARCH models were used; the trend of the process is rather sophisticated (a high-order process) and is described with a polynomial.
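For illustration only, below is a minimal GARCH(1,1)-style recursion for the conditional variance h(k); the coefficient values are assumed to be already estimated, and the higher-order GARCH(p,q) and EGARCH models actually used in the paper extend this recursion with additional lags.

```python
import numpy as np

def garch_variance(residuals, beta0, beta1, alpha1):
    """Conditional variance recursion h(k) = beta0 + beta1*eps^2(k-1) + alpha1*h(k-1),
    i.e. GARCH(1,1) as a special case of the GARCH(p,q) model described above."""
    eps = np.asarray(residuals, dtype=float)
    h = np.empty_like(eps)
    h[0] = eps.var()                                   # start from the unconditional variance
    for k in range(1, len(eps)):
        h[k] = beta0 + beta1 * eps[k - 1] ** 2 + alpha1 * h[k - 1]
    return h

def forecast_next_variance(h_last, eps_last, beta0, beta1, alpha1):
    """One-step-ahead variance forecast, used later for VaR estimation."""
    return beta0 + beta1 * eps_last ** 2 + alpha1 * h_last
```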
Table 3. Models and forecasts quality with application of Kalman filter (model quality: R², Σe²(k), DW; forecast quality: MSE, MAE, MAPE, Theil)

Model type | R² | Σe²(k) | DW | MSE | MAE | MAPE | Theil
AR(1) | 0.99 | 24335.12 | 2.13 | 50.08 | 40.12 | 7.98 | 0.040
AR(1,4) | 0.99 | 24453.1 | 2.12 | 47.15 | 39.56 | 7.68 | 0.042
AR(1) + 1st order trend | 0.99 | 25061.08 | 2.10 | 34.07 | 23.26 | 3.55 | 0.033
AR(1) + 4th order trend | 0.99 | 23881.14 | 2.07 | 24.10 | 16.58 | 3.05 | 0.022

Table 4. Results of models estimation for conditional volatility (model quality: R², Σe²(k), DW; forecast quality: MSE, MAE, MAPE, Theil)

Model type | R² | Σe²(k) | DW | MSE | MAE | MAPE | Theil
GARCH (1,7) | 0.99 | 153646 | 0.115 | 973.5 | – | 510.6 | 0.113
GARCH (1,10) | 0.99 | 102122 | 0.169 | 461.7 | – | 207.3 | 0.081
GARCH (1,15) | 0.99 | 80378 | 0.329 | 409.3 | – | 117.6 | 0.058
EGARCH (1,7) | 0.99 | 45023 | 0.435 | 64.8 | – | 7.15 | 0.023
The best model constructed for the conditional variance was EGARCH(1,7). The achieved value of MAPE = 7.15% is a good result for short-term forecasting of the conditional variance. Using the variance forecasts, the market risk can be estimated for a specific problem statement. The market risk (possible market loss) can be estimated using the known Value-at-Risk (VaR) methodology as follows [18]:

$$VaR_m(k) = \alpha\,\sigma(k)\,OP(k)\,\sqrt{N},$$

where α is the quantile of the confidence interval (usually for the normal distribution); σ(k) is the volatility of the financial process forecasted by the model of conditional variance dynamics; OP(k) is the respective open position of the financial operation being analyzed; k = 0, 1, 2, ... is discrete time; and N is the forecasting period. Thus VaR_m(k) provides the possibility of calculating an upper bound for the possible loss at the moment k. Another experiment was performed on forecasting stock prices using a combined (linear + nonlinear) model; the combined model generated its results based upon the application of Kalman filtering, linear regression and a nonlinear logit model. Table 5 shows the results of back-testing for market risk estimation. The best back-testing results were achieved with the conditional variance forecasts produced by the EGARCH(1,7) model, with a percentage of correctly forecasted losses of about 92.09%. An acceptable result was achieved with the GARCH(1,15) model; in this case the achieved percentage of correct loss forecasts was about 88.96%. The statistical quality characteristics of the forecasts show their high quality and the possibility of their use in subsequent steps of decision making.
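A small sketch of the VaR computation by the formula above; the forecasted volatility, the open position, the horizon and the confidence level are assumed inputs, and the normal quantile is taken from SciPy.

```python
from scipy.stats import norm

def value_at_risk(sigma_k, open_position, horizon, confidence=0.99):
    """Upper bound of the possible market loss: VaR_m(k) = alpha * sigma(k) * OP(k) * sqrt(N)."""
    alpha = norm.ppf(confidence)               # quantile of the (normal) confidence level
    return alpha * sigma_k * open_position * horizon ** 0.5

# example: 99% one-step-ahead VaR for an open position of 1e6 with forecasted volatility 0.8%
# value_at_risk(0.008, 1_000_000, 1)  ->  about 18,600 (illustrative numbers only)
```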
Table 5. Back-testing results for market risk estimation (for each confidence level: number of incorrect predictions / % of correct forecasts of possible market losses)

Model type | 95%: incorrect | 95%: correct | 97%: incorrect | 97%: correct | 99%: incorrect | 99%: correct
GARCH (1,7) | 35 | 91.86% | 36 | 90.65% | 29 | 92.089%
GARCH (1,10) | 48 | 85.13% | 45 | 87.07% | 40 | 88.96%
GARCH (1,15) | 56 | 77.25% | 51 | 78.98% | 47 | 82.45%
EGARCH (1,7) | 62 | 79.01% | 64 | 78.76% | 58 | 78.92%
4 Conclusions
A short review of Kalman and probabilistic filtering methods provided in the article shows that today there exist filtering techniques suitable for processing linear and nonlinear non-stationary processes. A method of processing statistical financial data using a set of filters is proposed that includes exponential smoothing, the Kalman filter and a particle filter in general form. The purpose of using various filtering techniques is to appropriately prepare the statistical data for model constructing and the further use of the models for forecasting the selected financial process itself and its variance. Since most modern financial processes are nonlinear and non-stationary (heteroscedastic), after filtering the data are used for constructing the model describing the process itself and the dynamics of the process variance and, consequently, its volatility. The volatility has multiple applications, including constructing trading rules and computing the possible market loss (financial market risk). The computational experiments performed show that the approach considered in the study is generally correct, because it provides the possibility of enhancing model adequacy and improving the risk estimates. Besides, the probabilistic model can be used for generating the forecasting density, which makes it possible to estimate minimum and maximum values of the financial risk. The results achieved can be used for constructing a specialized decision support system for financial risk estimation and forecasting. The system can be extended to estimation of credit risk, which is closely related to the market one. Further studies in this direction should be directed towards refinement of the preliminary data processing procedures and a systemic methodology of model constructing. Possible applications should be considered from the point of view of further improvement of the final results regarding risk estimation, forecasting and appropriate managerial decision making.
References 1. Anderson, B.D., Moore, J.: Optimal Filtering. Prentice Hall, Inc., Englewood Cliffs (1979) 2. Anderson, J.: An ensemble adjustment kalman filter for data assimilation for data assimilation. Monthly Weap. Rev. 129, 2284–2903 (2001). https://doi.org/ 10.1680/jmacr.17.00445
3. Arulampalam, M., Maskell, S., Gordon, N., Clapp, T.: A tutorial on particle filters for online nonlinear/non-gaussian bayesian tracking. IEEE Trans. Signal Process. 50, 174–188 (2002) 4. Babichev, S., Durnyak, B., Zhydetskyy, V., Pikh, I., Senkivskyy, V.: Application of optics density-based clustering algorithm using inductive methods of complex system analysis. In: IEEE 2019 14th International Scientific and Technical Conference on Computer Sciences and InformationTechnologies, CSIT 2019 - Proceedings, pp. 169–172 (2019). https://doi.org/10.1109/STC-CSIT.2019.8929869 ˇ 5. Babichev, S., Skvor, J.: Technique of gene expression profiles extraction based on the complex use of clustering and classification methods. Diagnostics 10(8), art. no. 584 (2020). https://doi.org/10.3390/diagnostics10080584 6. Bidyuk, P., Romanenko, V., Tymoshchuk, O.: Time Series Analysis. Kyiv: NTUU “Igor Sikorsky KPI” (2011) 7. Chui, C., Chen, G.: Kalman Filtering with Real-Time Applications. Springer, Berlin (2017). https://doi.org/10.1007/978-3-662-02508-6 8. Fox, D., Burgard, W., Dellaert, F., Thrun, S.: Monte carlo localization: efficient position estimation for mobile robots. In: Proceedings of the Sixteenth National Conference on Artificial Intelligence and the Eleventh Innovative Applications of Artificial Intelligence Conference Innovative Applications of Artificial Intelligence, pp. 343–349 (1999) 9. Gibbs, B.: Advanced Kalman Filtering, Least-Squares and Modeling. John Wiley and Sons, Inc., Hoboken (2011) 10. Gozhyj, A., Kalinina, I., Gozhyj, V., Danilov, V.: Approach for modeling search web-services based on color petri nets. In: Babichev, S., Peleshko, D., Vynokurova, O. (eds.) DSMP 2020. CCIS, vol. 1158, pp. 525–538. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-61656-4 35 11. Gozhyj, A., Kalinina, I., Vysotska, V., Gozhyj, V.: Web resources management method based on intelligent technologies. In: Advances in Intelligent Systems and Computing, vol. 871, pp. 206–221. Springer, Heidelberg (2019). https://doi.org/ 10.1007/978-3-030-01069-00 12. Gustafsson, F.: Particle filter theory and practice with positioning applications. IEEE Aerosp. Electron. Syst. Mag. 25, 53–82 (2010) 13. Haug, A.: A Tutorial on Bayesian Estimation and Tracking Techniques Applicable to Nonlinear and Non-Gaussian Processes. McLean, Virginia (2005) 14. Haykin, S.: Adaptive Filtering Theory. Prentice Hall, Upper Saddle River (2007) 15. Ito, K., Xiong, K.: Gaussian filters for nonlinear filtering problems. IEEE Trans. Autom. Controle 45, 910–927 (2000) 16. Julier, S., Uhlmann, J.: Unscented filtering and nonlinear estimation. Proc. IEEE 22, 401–422 (2004) 17. Kay, S.: Fundamentals of Statistical Signal Processing: Estimation Theorys. Prentice Hall, Upper Saddle River (1993) 18. Kuznietsova, N., Bidyuk, P.: Theory and Practice of Financial Risks Analysis: Systemic Approach. NTUU “Igor Sikorsky KPI”, Kyiv (2020) 19. Lerner, U., Parr, R., Koller, D., Biswas, G.: Bayesian fault detection and diagnosis in dynamic systems. In: Proceedings of the Seventeenth National Conference on Artificial Intelligence (AAAI-00), Austin, Texas (USA), pp. 531–537 (2000) 20. Luo, X., Moroz, L.: Ensemble kalman filters with the unscented transform. In: Proceedings of the Oxford- Man Institute of Quantitative Finance, Oxford, UK, pp. 1–33 (2018)
21. Menegaz, H., Ishihara, J., Borges, G., Vargas, A.: A systematization of the unscented kalman filter theory. IEEE Trans. Autom. Control 60, 2583–2598 (2015). https://doi.org/10.1109/TAC.2015.2404511 22. Petersen, I., Savkin, A.: Robust Kalman Filtering for Signals and Systems with Large Uncertainties. Birkhauser, Boston (1999) 23. Pole, A., West, M., Harrison, J.: Applied Bayesian Forecasting and Time Series Analysis. Chapman and Hall/CRC, Boca Raton (2000) 24. Press, S.: Subjective and Objective Bayesian Statistics. John Wiley and Sons, Inc., Hoboken (2003) 25. Zgurovsky, M., Podladchikov, V.: Analytical Methods of Kalman Filtering. Naukova Dumka, Kyiv (1997)
Numerical Modeling of Disk Dissolution in Melt During Gas Blowing Kyrylo Krasnikov(B) Dniprovskyi State Technical University, Kamianske, Ukraine kir [email protected]
Abstract. The work is devoted to a mathematical and computer modeling of complex hydrodynamic and mixing phenomena during gas blowing in steelmaking ladle equipped with an axial dissoluble disk. Obtained figures show that a time-dependent shape of the dissolving barrier significantly changes fluid flows influencing speed near top surface of the melt and near ladle wall. The former speeds can move slag with appearing of unnecessary “eye” in it and the latter damages ladle’s lining. The ladle represented by cylinder and the melt – by multicomponent continuum according to the Eulerian-Eulerian approach. Also it is considered the multithreading implementation of the computer algorithm. It has high scalability of calculation performance when more CPU cores are occupied as shown on the corresponding chart. Presented figures show the concentration of gas and addition at the beginning, middle and end of the disk dissolution. Comparison of numerical model results shows good correspondence with data of physical modeling presented in scientific literature. The developed model is proposed to conduct series of experiments with the aim of ladle treatment optimization. Keywords: Immersed body dissolution · Navier-stokes equations Multithreading computation · Secondary steelmaking
1 Introduction
In secondary metallurgy, lump additions are added into molten steel to improve its quality by chemical modification. The steel quality is also influenced by non-metallic inclusions. The homogeneity of the addition in the melt is important for the uniformity of the mechanical properties of steel products. Argon blowing is used to increase the homogeneity and to intensify the mixing of a chemical addition. One of the options for argon injection used today is inserting a tube into the melt from the top of the ladle along its axis. In their works [4,6,8,12], scientists propose to use a barrier (mounted on the tube) above the gas plume to scatter gas bubbles in the melt and to modify its chemical composition at the same time.
2 Problem Statement
There are some problems, such as the high temperature and opaqueness of the melt, as well as the expensiveness of laboratory or plant experiments, which make it difficult to investigate details of the process. So the aim of the work is to present a mathematical model with its computer implementation and to conduct a numerical experiment under conditions based on real ones. The duration of disk dissolution with shape change needs to be predicted, because it significantly influences the fluid flows. It is also of interest to measure the computing performance of the proposed computer model.
3 Literature Review
In their patent [4] authors present a simple solid axial barrier in form of a disk. In [12] they propose a dissoluble axial filtering disk, made from addition materials, with holes to control a shape of the filter during the process. An advantage of the method, presented in [12], is that a chemical addition in the form of thin cylinder (axial disk) can be simply mounted on the tube for a dissolving at an optimal depth, where it is surrounded by moving streams of the melt. This interesting method will be investigated in the current paper. Level set method is used by scientists [1] to model melting of complex-shaped body, which moves at space. Unsymmetrical change of body’s shape also occurs in the present process of a disk dissolution. Authors of course [2] as well as monograph [3] describe fundamental details about level set method with illustrations and examples of its application to bifluid mixture, structural optimization, image segmentation, flame propagation. Mentioned in work the Level Set Hamilton-Jacobi equation can be used in the present research, because velocity of disk boundary moving is oriented along normal to surface. Interesting and complex phenomena of supercooled material unstable growth can be modeled by level set method as presented in book [11]. A numerical discretization of level set equation also is considered with peculiarities regarding to flame burning and motion of its front. Also there are researches of freezing and melting time of cylindrical additive during gas blowing [14]. Authors consider moving boundary between solid and melted phase taking into account heat transfer coefficient. It is presented a table of thermo-physical parameters for cast iron, slag, as well as for nickel and ferromanganese, which used by scientist to validate the model. In the work [10] it is presented a comparison of momentum equations and turbulence models for multiphase melt flow: VOF, Eulerian-Eulerian, EulerianLagrangian, quasi-single-phase model. Despite of latter model simplicity by avoiding calculation of each bubble motion it has a good correspondence with experiments. Authors present results of plant experiments regarding to bubble diameter distribution in gas plume starting near blowing plug and ending at top surface of fluid. Bubbles are bigger at the surface as expected. The mathematical model significantly decreases cost of tenth of experiments comparing to industrial or laboratory ones. The problem addressed in a present article is mathematical modeling of melt (fluid-gas system) hydrodynamics with an axial disk dissolution taking into account a change of disk shape, which was not considered in the previous works.
4 Mathematical Model
The melt is considered as a continuum with four interpenetrating phases: fluid, gas, addition and temperature, with the fluid as the main phase. The model considers disk dissolution as a moving-boundary problem, like melting a chunk of ice, with the difference that the reason for the boundary movement is the concentration gradient instead of the temperature gradient.
4.1 Assumptions
An idealization of the real process is made using following assumptions about geometry (Fig. 1) and physical processes inside molten steel:
Fig. 1. Axial cross section of ladle. Computational domain is space occupied by liquid and filter (disk)
1. The considered ladle and the axial disk have a cylindrical shape; the latter has small holes to increase its dissolution rate.
2. The top surface of the melt is flat and its position is fixed.
3. Axial symmetry allows simplifying the mathematical model to two spatial dimensions (axial and radial).
4. Molten steel is a viscous Newtonian incompressible liquid.
5. The melt has a constant temperature if it is modeled by water (cold modeling) or a variable one for hot modeling (the influence of the temperature gradient on the dynamics is negligible compared to the influence of the bubbles).
6. Gas bubbles affect only the vertical component of the velocity field, have a constant flotation speed and are considered as a continuum which interpenetrates the melt.
4.2 Equations
There are one vector and three scalar variable fields which describe the state of the system at a given time in the space occupied by the melt and the axial disk: the velocity v, the gas fraction α, the concentration of the chemical addition c, and the temperature T. Initially the melt's space Ω is divided into two regions: Φ ⊂ Ω is occupied by the disk with the addition, and the remaining one is liquid without addition:

$$\Psi(t=0, x) = \begin{cases} 1, & x \in \Phi, \\ -1, & x \notin \Phi, \end{cases} \quad (1)$$

where Ψ is the level set function, which is used to determine the disk boundary. The evolution of each field is predicted using the conservation laws of physics. The melt dynamics is defined by the Navier–Stokes equations with a buoyancy term (the last one) according to the Boussinesq approximation:

$$\frac{\partial \vec{v}}{\partial t} + (\vec{v}\cdot\nabla)\vec{v} = \nu_e \Delta \vec{v} - \nabla P - \alpha \vec{g}, \quad (2)$$
$$\nabla\cdot\vec{v} = 0, \quad (3)$$
where ν_e is the effective kinematic viscosity, P is the kinematic pressure and g is the gravitational acceleration. The evolution of the gas fraction α, the addition concentration c and the temperature T is defined by convection–diffusion equations:

$$\frac{\partial \alpha}{\partial t} + \nabla\cdot[\alpha(\vec{v} + \vec{v}_f)] = D_\alpha \nabla^2 \alpha + S_\alpha, \quad (4)$$
$$S_\alpha = \frac{q\,T_m}{300\,V}, \quad (5)$$
$$C\rho\left(\frac{\partial T}{\partial t} + \nabla\cdot(T\vec{v})\right) = D_T \nabla^2 T, \quad (6)$$
$$\frac{\partial c}{\partial t} + \nabla\cdot(c\vec{v}) = k\,\nabla^2 c, \quad (7)$$

$$k = \begin{cases} 0, & x \in \Phi, \\ \beta, & x \in \partial\Phi, \\ D_c, & x \notin \Phi, \end{cases} \quad (8)$$
$$\frac{\partial \Psi}{\partial t} = F\,|\nabla \Psi|, \quad (9)$$

where v_f is the flotation speed of the gas, which can be taken constant under the assumption that its maximum value is reached instantly; q is the gas consumption from the submerged lance; V is the volume of the space near the lance end where the bubbles form; T_m is the temperature of the melt; C and ρ are the heat capacity and the density of the melt or the disk, respectively; D_α, D_c and D_T are the effective diffusion coefficients; β defines the dissolution speed, similarly to the heat transfer coefficient in Newton's law of cooling. Equations (1)–(6) are solved in the liquid region only, where c < 0. In Φ, where c > 0, there is heat diffusion too (Eq. (6) without the convective term). As in the classical level set method, the concentration field has negative values in the fluid region and positive ones in Φ; the boundary ∂Φ is implicitly defined by the isosurface c = 0. While the cylindrical coordinate system fits the geometry of the ladle, the axial symmetry allows avoiding the angular component in the final equations:

$$\frac{\partial u}{\partial t} + u\frac{\partial u}{\partial r} + w\frac{\partial u}{\partial z} = \nu\left(\frac{1}{r}\frac{\partial}{\partial r}\left(r\frac{\partial u}{\partial r}\right) - \frac{u}{r^2} + \frac{\partial^2 u}{\partial z^2}\right) - \frac{\partial P}{\partial r}, \quad (10)$$
$$\frac{\partial w}{\partial t} + u\frac{\partial w}{\partial r} + w\frac{\partial w}{\partial z} = \nu\left(\frac{1}{r}\frac{\partial}{\partial r}\left(r\frac{\partial w}{\partial r}\right) + \frac{\partial^2 w}{\partial z^2}\right) - \frac{\partial P}{\partial z} - \alpha g, \quad (11)$$
$$\frac{1}{r}\frac{\partial (r u)}{\partial r} + \frac{\partial w}{\partial z} = 0, \quad (12)$$
$$\frac{\partial c}{\partial t} + \frac{\partial (u c)}{\partial r} + \frac{\partial (w c)}{\partial z} = k\left(\frac{1}{r}\frac{\partial}{\partial r}\left(r\frac{\partial c}{\partial r}\right) + \frac{\partial^2 c}{\partial z^2}\right), \quad (13)$$
where u is the radial component of the velocity, w is its axial (vertical) component, r is the radial coordinate, z is the height coordinate, and c is the addition concentration (the other fields – gas and temperature – have evolution equations similar to that of c).
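As an illustration of how Eq. (13) can be advanced in time, the sketch below shows one explicit finite-difference step for its diffusive part on a uniform (r, z) grid; the convective terms, the piecewise coefficient (8) and the boundary conditions of Sect. 4.3 are omitted, and the cell-centred radii are an assumption of this sketch rather than the paper's actual discretization.

```python
import numpy as np

def diffuse_axisymmetric(c, k, dr, dz, dt):
    """One explicit time step of dc/dt = k * [ (1/r) d/dr ( r dc/dr ) + d2c/dz2 ]
    on a uniform axisymmetric grid; c[i, j] ~ c(r_i, z_j) with r_i = (i + 0.5) * dr."""
    nr, nz = c.shape
    r = (np.arange(nr) + 0.5) * dr                     # cell-centred radii (avoids r = 0)
    c_new = c.copy()
    for i in range(1, nr - 1):
        for j in range(1, nz - 1):
            # radial part: (1/r) d/dr ( r dc/dr ) evaluated with face radii r +/- dr/2
            flux_out = (r[i] + 0.5 * dr) * (c[i + 1, j] - c[i, j]) / dr
            flux_in = (r[i] - 0.5 * dr) * (c[i, j] - c[i - 1, j]) / dr
            radial = (flux_out - flux_in) / (r[i] * dr)
            axial = (c[i, j + 1] - 2.0 * c[i, j] + c[i, j - 1]) / dz ** 2
            c_new[i, j] = c[i, j] + dt * k * (radial + axial)
    return c_new
```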
4.3 Boundary Conditions
The system of equations is complemented with boundary conditions according to their physical meaning. For the concentration, total isolation is imposed on the ladle inner surface S, on the melt's top surface and at the z-axis:

$$\left.\vec{n}\cdot\nabla c\right|_S = 0. \quad (14)$$

For the velocity field, impermeability and slip conditions are set on solid surfaces, on the top surface and at the z-axis:

$$\left. v_{\perp} \right|_S = 0, \qquad \left.\vec{n}\cdot\nabla v_{\parallel}\right|_S = 0. \quad (15)$$

For the gas fraction, isolation is imposed on solid surfaces and at the z-axis, while on the top surface G the gas concentration is zero:

$$\left.\vec{n}\cdot\nabla \alpha\right|_S = 0, \quad (16)$$
$$\left.\alpha\right|_G = 0. \quad (17)$$
Temperature losses through the ladle wall and the top surface of the melt (even with a slag layer) are unavoidable; at the z-axis, isolation is imposed:

$$\left.\vec{n}\cdot\nabla T\right|_S = h\,(T_{amb} - T) + \varepsilon\sigma\,(T_{amb}^4 - T^4), \quad (18)$$
$$\left.\vec{n}\cdot\nabla T\right|_{z\text{-axis}} = 0, \quad (19)$$

where c_m is the concentration in the melt; k and h are the mass and heat transfer coefficients, respectively; T_amb is the ambient temperature.
4.4 Details of the Model Application and Implementation
A variable spatial step can save computational resources. For example, the gaps in the filter have a small diameter, so the spatial step needs to be small enough to represent them; in other regions, however, it can be larger. Constants such as the inverses of the spatial or time step are computed only once at the start of the calculation. Modern processors allow a computer program to use multiple threads to parallelize and accelerate the computation of the mathematical model. The central difference method is highly parallelizable and gives good overall efficiency on a multicore CPU. After each stage of computation there is a synchronization stop with a counter of waiting threads. At the stop each thread modifies the counter and waits until the others reach it. A std::mutex protects the counter from simultaneous writing by concurrent threads. The last thread resets the counter and does a very small preparation of data for the next stage, so the critical (in the sense of data races) section of code is negligible in terms of CPU core utilization. In this simple case deadlocks are successfully avoided. The application uses the standard C++ containers, in particular std::vector for arrays of floats, which leads to pointer-free code and helps avoid memory leaks. The block scheme of the algorithm is presented in Fig. 2.

Fig. 2. The conceptual four phases of data processing in the multithreaded software. A cyan color corresponds to the calculation threads

There are four phases in the figure above:
1. Setup phase uses the main thread to read experiment parameters from the user interface (Fig. 3), initialize vectors and spawn computation threads as well as a thread for the storage output;
2. Computing is the most resource-consuming phase, which calculates new vectors using the previous ones. Each thread computes integral numbers such as the maximum, minimum and average values for its local region of a vector. Another important value is the degree of addition mixing: since the overall mass of the addition is constant, its average concentration inside the melt is always larger than zero, so to determine the level of the addition's homogeneity the coefficient of concentration variation can be used, as other researchers do [9]. The phase ends when all threads complete the final iteration;
3. Snapshot phase is needed to calculate the integral numbers mentioned above for a whole vector using the results obtained from each thread. The phase is processed a user-selected number of times per second, and the computed data are copied to a concurrent queue for the export phase. Such a two-step calculation gives a performance gain from multithreading and avoids overloading the first thread;
4. Export phase is done by two threads: main and output. The former visualizes the user interface and the current state of the system using 3D graphics. The latter saves the vectors with other information to storage, for example a "cloud" or a simple hard disk. This additional thread allows the computation phase to continue independently of output lags.
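The synchronization stop described above is essentially a stage barrier (a counter of waiting threads guarded by a mutex). The paper's implementation uses C++ primitives; purely as an illustration of the same control flow, a minimal Python sketch with the standard-library barrier is shown below, with the per-thread computation replaced by a placeholder.

```python
import threading
import numpy as np

N_THREADS, N_STEPS = 8, 100
field = np.zeros((192, 288))            # shared field array, split by rows between threads

def prepare_next_stage():
    pass                                # e.g. swap "old"/"new" arrays, accumulate integral numbers

# the optional action runs in exactly one thread once all of them arrive, mirroring
# "the last thread resets the counter and prepares data for the next stage"
stage_barrier = threading.Barrier(N_THREADS, action=prepare_next_stage)

def worker(rows):
    for _ in range(N_STEPS):
        field[rows] += 1.0              # placeholder for the per-thread stage computation
        stage_barrier.wait()            # synchronization stop between stages

threads = [threading.Thread(target=worker, args=(slice(i * 24, (i + 1) * 24),))
           for i in range(N_THREADS)]
for t in threads: t.start()
for t in threads: t.join()
```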
4.5 Adequacy Checking
Physical modeling was conducted by a group of scientists [5,7,13] using a transparent cylindrical vessel with water; air was used to simulate the gas blowing. Those scientists present photos made by a fixed camera from a side point of view. A numerical experiment is conducted according to the physical one in a laboratory [13]. The height of the ladle is 44 cm, the fluid height is 38 cm, and the average diameter of the ladle is 36 cm.

Fig. 3. The user interface to setup mathematical model

Experiments are done with an axial disk placed 19 cm above the bottom. The disk has the following parameters in the experiments:
1. The initial diameter varies from 20 cm to 30 cm with a step of 5 cm.
2. The initial thickness varies from 1.5 cm to 2.5 cm with a step of 0.5 cm.
3. The air consumption varies from 1.6 to 2.4 l/min with a step of 0.4 l/min.
4. The initial diameter of the holes is 2 mm.
The discretization of the computational domain is fine enough to precisely represent the geometry of the disk, especially the holes:
1. The radial axis is divided into 192 intervals.
2. The vertical axis is divided into 288 intervals.
3. The time step is 0.001 s.
The overall modeled time is about 400 s, which is sufficient to mix the addition in the fluid.
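A small sketch relating this grid to the explicit-scheme stability limit discussed in the next section; the ladle dimensions are taken from the experiment description above, while the diffusion coefficient is only an assumed placeholder value.

```python
# grid spacing from the discretization above (ladle radius ~0.18 m, fluid height 0.38 m)
R, H = 0.18, 0.38
n_r, n_z, dt = 192, 288, 0.001
dr, dz = R / n_r, H / n_z

D = 1.0e-4            # assumed effective diffusion coefficient, m^2/s (placeholder value)
# an explicit diffusion step is stable roughly when dt <= min(dr, dz)^2 / (4 * D)
dt_max = min(dr, dz) ** 2 / (4.0 * D)
print(f"dr = {dr:.2e} m, dz = {dz:.2e} m, dt = {dt} s, stability bound ~ {dt_max:.2e} s")
```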
5 Experiment, Results and Discussion
Due to the complexity of the equation system, a numerical solution is used. The software implements the model using the CSharp programming language. The explicit numerical scheme becomes unstable when the time step is not sufficiently small; to increase it, an implicit scheme can be used, particularly for the computation of the diffusive and viscous terms in the equations. Also, to verify the numerical solution it is ensured that the mass of the addition is always conserved, as well as that the amount of gas is kept constant when there are no sources and an appropriate boundary condition is set on the top surface of the melt. In Figs. 4, 5 and 6, axial cross-sections illustrate two scalar fields and the speed of the melt. The left part of the figures presents the addition concentration, and the right part demonstrates the gas fraction. The maximum addition concentration is red (volumes initially occupied by the disk); the two holes are also clearly seen. A very small gas concentration is dark blue, large amounts are also red, and nearly zero gas fractions are white.
Fig. 4. The half of an axial cross section of the two scalar fields at 32,75 s of model time – on the left maximum addition concentration is red (disk with holes), and on the right – minimum of gas concentration is dark blue. (Color figure online)
Black arrows show directions and intensities of flow. Each arrow represents vector averaged between 16 neighboring speed vectors. While it is no necessity to draw all speed vectors at once, such averaging gives chance to see a scalar field in colors behind the arrows. The disk acts like barrier for fluxes. Therefore in the beginning of modeling (Fig. 4) there are two big vortexes below and above disk. The bottom of the disk is dissolved faster than its top, because fluxes below disk are faster. These fluxes mostly run around the disk. Small fluxes go through holes in disk intensifying its dissolution from inside. A gas floats above disk by three ways – some amount
goes through the two holes (the first one receives a bigger part of gas than the second one) and the remaining gas follows fluxes around the disk. Number of holes or their diameter can be increased if time of dissolution needs to be lesser. Figure 5 shows state of fields at approximately middle of numerical experiment. Two big swirls still exist and some fluxes go around the disk. However, holes become large, especially the first one from the axis. That hole enlarges faster because of fluxes, accelerated by a lot of gas, mainly running through it. Above it there is a small vortex, which helps to dissolve the top part of disk. Although a region of the disk, where it is connected to the tube, is the slowest in dissolution. It is clear that in the corners the fluid speed is very low, but after some time a fast flux from underneath will get to that corner too. As seen at the picture similar motionless regions exist at the melt. These regions accumulate additive, which badly influences homogeneity of the melt. The dissolvable disk changes direction of fluxes with time, helping to activate low speed regions, when gas blowing cannot be increased or lowered. Also at the moment, showed on the Fig. 5, a gas almost avoids going around disk and the second hole receives very little amount of gas. The first hole is large enough to let a whole gas go through it making a large curved plume. After that moment the disk considerably begins to disappear. Left part of the Fig. 5 becomes more lighter, because addition concentration increases.
Fig. 5. Additive and gas field at 168,75 s of model time
Later, after the disk is completely dissolved, some time is needed to mix the melt with an addition averaging its concentration. Fluid fluxes freely run without going around any obstacle and hydrodynamic picture significantly changes – a large swirl is appeared (Fig. 6) with strong flows near boundaries and low speed region at center of radius. If gas blowing is turned off a temperature of outer layers of the molten steel can reach freezing point and dangerous situation can happened. To avoid such conditions a heating up by electrodes is often used. After numerical modeling, it became clear that two vortexes have almost two times lower speeds than single large vortex has. Some kinetic energy is taken by flowing around disk. This fact positively influences on the ladle lining, which is damaged slower. Also size of slag “eye” at the melt’s surface decreases and the melt better keeps its temperature under layer of the slag.
Fig. 6. Addition and gas field at 400 s of modeling time
Multithreading parallelization gives considerable reduction of computation duration (Fig. 7) on the CPU with eight cores. This is especially helpful when number of experiments is large and duration of each is substantially large. Divergence from ideal dependency on the count of cores and nonlinearity of chart is connected with inevitable performance losses and non 100% parallelization of the program. Even better results can be achieved when number of finite
volumes is further increased and more work is done by each thread. However each CPU core has limited amount of cache memory, that’s why arrays of float numbers need to be not too large and to fit cache memory capacity. That’s why the program use single precision float numbers to represent all fields of mathematical model. In considered problem double precision numbers are unnecessary, because numerical error, accumulated during calculation is neglectable small. Also, the program can be modified to use on GPU to reach significant performance acceleration, because modern GPUs able to run tens of threads in parallel and overperform CPUs in tests. Also GPUs has perfect performance in dealing with single precision floats, while double precision slows down it by few times. Additionally a bottleneck can be frequent request of data in GPU memory from CPU. It is desirable to independently run the GPU some interval of time and only then get any results from it.
Fig. 7. Calculation duration dependency on count of used cores of CPU
6 Conclusions
The difference between the model in [1], which also uses the level-set method, and the presented model is that the latter takes hydrodynamics into account. This is important because the speed around the disk influences the rate of melting. If the presented model is compared with the one in [14], it can be noted that the former takes into account a complex body shape and its variability, while the latter is limited to a cylindrical shape and cannot be used to predict the unsymmetrical dissolution of the disk considered in the present work. The numerical experiment confirms the expectations of scientists about the good position of the disk at half the height of the vessel. The disk slows down excessively fast fluxes, and its dissolution can be regulated by the number and diameter of the holes.
Comparison of the numerical results with the laboratory ones gives a correspondence of 11–27% for the mentioned conditions. The conclusion based on the numerical experiment is that the scientists' proposition of an immersed axial disk increases the efficiency of the considered metallurgical process, and it is recommended for use in practice. The author is grateful to the scientists of the metallurgy department of Dniprovskyi State Technical University for their physical modeling of carbamide dissolution in water and of the hydrodynamics during gas blowing in the laboratory. Their detailed description of the modeling results allows validating the mathematical model and recommending its further usage. A full video with the evolution of the gas and concentration fields is published at the website http://www.scitensor.com/lab/metallurgy/ladle/disk. Further research can be devoted to other configurations of the disk, including rotation.
References 1. Bondzio, J., Seroussi, H., Morlighem, M.E.A.: Modelling calving front dynamics using a level-set method: application to jakobshavn isbræ. The Cryosphere 10, 497–510 (2016). https://doi.org/10.5194/tc-10-497-2016 2. Dapogny, C., Maitre, E.: An introduction to the level set method. course, universit´e joseph fourier (2019). https://ljk.imag.fr/membres/Charles.Dapogny/cours/ CoursLS.pdf 3. Giga, Y.: Surface evolution equations: a level set approach, p. 231. Hokkaido University (2002). https://doi.org/10.14943/647 4. Gress, A., Soroka, Y.: A device for secondary treatment of metals and alloys (2015). https://base.uipv.org/searchINV/getdocument.php?claimnumber=u201414121& doctype=ou 5. Gress, A., Soroka, Y., Smirnova, O.: Physical modeling of bath hydrodynamics processes in steelmaking ladles equipped with axial filtering-diffusion barrier. In: Proceedings of the 11th International Scientific Conference “Litye-Metallurgiya” in Zaporozhie, Ukraine, pp. 39–40 (2015) 6. Gress, A., Storozhenko, S.: Numerical investigations of melt hydrodynamics in steelmaking ladles equipped with filtering barrier. Math. Model. 28, 84–88 (2013) 7. Gress, A., Storozhenko, S.: Features of hydrodynamics picture in steelmaking ladles equipped with axial filtering-diffusion barrier. In: Proceedings of the 11th International Scientific Conference “Litye-Metallurgiya” in Zaporozhie, Ukraine, pp. 41–42 (2015) 8. Gress, A., Storozhenko, S., Vasik, A.: Numerical modeling of materials melting processes in steelmaking ladle. In: Proceedings of the 9th International Scientific Conference “Litye-Metallurgiya” in Zaporozhie, Ukraine, pp. 45–46 (2013) 9. Lamotte, T.: Is the homogeneity of your dry mix acceptable? (2018). https:// www.chemicalprocessing.com/articles/2018/is-the-homogeneity-of-your-dry-mixacceptable 10. Liu, Yu., Ersson, M., Liu, H., J¨ onsson, P.G., Gan, Y.: A review of physical and numerical approaches for the study of gas stirring in ladle metallurgy. Metall. Mater. Trans. B 50(1), 555–577 (2018). https://doi.org/10.1007/s11663018-1446-x 11. Osher, S., Fedkiw, R.: Level Set Methods and Dynamic Implicit Surfaces. AMS, vol. 153, p. 273. Springer, New York (2003). https://doi.org/10.1007/b98879
12. Sigarev, M., Storozhenko, S., Soroka, Y.: Device for the secondary treatment of metals and alloys (2016). https://base.uipv.org/searchINV/getdocument.php? claimnumber=u201601961&doctype=ou 13. Sigarev, N., Soroka, Y., Plakushchyi, D.: Physical modeling liquid metal hydrodynamics in steelmaking ladle during top blowing of bath with usage of filteringdispersing barrier. Coll. Schol. Pap. Dniprovskyi State Tech. Univ. 2, 17–23 (2015) 14. Singh, U., Prasad, A., Kumar, A.: Freezing and melting of a bath material onto a cylindrical solid additive in an agitated bath. J. Min. Metall. 48, 11–23 (2012). https://doi.org/10.2298/JMMB110505010S
Streaming Algorithm to the Decomposition of a Polyatomic Molecules Mass Spectra on the Polychlorinated Biphenyls Molecule Example Serge Olszewski1 , Violetta Demchenko2 , Eva Zaets2 , Volodymyr Lytvynenko3(B) , Irina Lurie3 , Oleg Boskin3 , and Sergiy Gnatyuk4 2
1 Taras Shevchenko National University, Kyiv, Ukraine Kundiiev Institute of Occupational Health of the National Academy of Medical Sciences of Ukraine, Kyiv, Ukraine [email protected] 3 Kherson National Technical University, Kherson, Ukraine 4 National Aviation University, Kyiv, Ukraine [email protected]
Abstract. Mass spectrometry is one of the fundamental analytical techniques of our time. As a rule, the primary processing of mass spectrometric data in modern quadrupole mass spectrometers from leading manufacturers is successfully carried out by both hardware and software. However, the task of extracting mass spectra of complex sources, such as multi-atomic molecules or mixed substances, has no general solution. As comprehensive solutions to the problem of analyzing complex mass spectra, various machine heuristic methods for decomposing the mass spectroscopic signal into higher-level descriptors of the target objects have been developed and continue to be investigated. However, the areas under consideration tend to involve rather complex iterative methods, such as Bayesian networks or optimization problems. These methods are quite effective, but cannot be used in real-time technologies. The paper proposes an example of a relatively simple streaming algorithm that allows one to isolate the same mass spectra components for multiatomic molecules containing similar fragments. Keywords: Mass spectra Datasets · Decomposition
1
· Fr´echet distance · Signal processing ·
Introduction
One of the leading analytical methods today is mass spectrometry. The fundamental problem of the ambiguity of the mass spectroscopic signal has long been solved by using such physical conditions in the ionization chamber under which the occurrence of multivalent ions is unlikely. Additionally, mass spectrometry c The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 S. Babichev and V. Lytvynenko (Eds.): ISDMCI 2021, LNDECT 77, pp. 39–53, 2022. https://doi.org/10.1007/978-3-030-82014-5_3
is used in combination with chromatographic separation techniques. Such a tandem makes the task of interpreting the mass spectra of complex sources more urgent compared to the task of their reliable registration. Generally, statistical methods of comparison with reference samples are used in the interpretation of mass spectra. In this case, the whole mass spectroscopic signal is used, in which each ionic peak is the descriptor of the molecule. Disadvantages of this approach are related to different conditions of reference sample registration and large dimensionality of the vector of target object descriptors. Thus, one of the directions of solving the problem of interpreting mass spectra can be reduced to the isolation of higher-level descriptors, the dimensionality of which vector can be significantly reduced. It should be emphasized that there is no general solution to the problem of decomposition of mass spectra of complex sources, such as multi-atomic molecules or mixtures of substances. The search is going on for both hardware and software implementation of various approaches. But these approaches are no reliable, effective, and universal results have been obtained so far. As offthe-shelf solutions to the problem of analyzing complex mass spectra, various machine heuristic methods for decomposing such mass spectra have been developed and are being investigated. However, these areas usually involve rather complex iterative methods such as Bayesian networks or optimization problems [10,13]. These methods are quite efficient, but cannot be used in real-time technologies because they require long and unpredictable computation times in the general case. One way to solve this problem could be the creation of a flow algorithm focused on the separation of similar components in the reference and experimental mass spectra, which rely on the structural features of the mass spectroscopic signal [9]. The relevance of creating such an algorithm is related to the fact that its results for different multivolume molecules can be associated with similarly structured molecular fragments, followed by the accumulation of a heuristic database of mass spectrometric patterns of structural components of molecules. Thus the hardware realization of such an algorithm will allow fulfilling accumulation of such database in a mode of real-time that can be considered as one of the components of the process of machine learning of modern massspectrometric devices.
2
Review of Literature
The authors of [13] propose the information-theoretical method of decomposition of experimental mass spectra of gas mixtures taking into account the presence of noise during measurements as a fundamental method of interpretation of signals from a complex source. The authors called the proposed method generalized maximum entropy (GME). The GME approach considers an undefined inverse problem, taking into account the presence of noise in the measured data. For this purpose, the experimental cumulative signal is decomposed into components with variable weight coefficients. By varying the weight coefficients, we search for the maximum joint entropy of the signal decomposition components, taking
into account the noise probability. This approach provides a reliable estimate of the unknown decomposition patterns and concentrations of the molecules in the mixture. The authors conducted a comparative analysis of GME estimates with those obtained using the Bayesian approach. The GME method is efficient and has a high computational speed. However, inverse problems by their nature require iterative solution algorithms and are poorly suited for real-time signal processing [13]. In [10] a sparse component analysis (SCA) based on a blind decomposition of mixed mass spectra is presented. In this case, the number of samples of the mixed signal is less than the number of decomposition components. Standard solutions of the related blind source separation (BSS) problem published in the open literature require that the number of samples be greater than or equal to the unknown number of decomposition components. The authors of this paper report the results of a test of two SCA approaches. The first one is based on the minimization of the norm of the decomposition coefficient vector; the optimization problem was solved by linear programming. The second approach is based on solving a multilayer hierarchical minimization problem, under which a quadratic non-negative matrix factorization with constraints imposed on the spectral coefficients was considered. Unlike many existing blind decomposition methods, SCA does not require a priori information about the number of decomposition components. This information is estimated from the mixed signal by robust data clustering algorithms along with the decomposition coefficient matrix. However, despite the clear advantage of the SCA approach, the methods it uses for solving the optimization problem are also iterative, which narrows the field of application of its hardware implementation. Simplification of algorithms for the decomposition of complex mass spectra can be achieved based on fundamental features of the mass spectrometric signal form. Such algorithms lose the ability to be applied to complex signals of a different nature; however, as highly specialized methods, they significantly reduce the requirements for computing resources. Thus, at the typical electron energies of 50–100 eV used to achieve high ionization efficiency and stability, the analyte molecules can decompose into different ionic fragments, leading to the so-called cracking pattern (CP). Fragmentation is a molecule-specific property that reflects the chemical structure and atomic composition of a molecule. Therefore, it can be used to decompose multicomponent mass spectra. Direct decomposition of mass spectra of mixed gases by sequential subtraction of component contributions from the mixture signal is a widely used approach, but it only works well if there is no hardware noise. A more sophisticated method, least-squares estimation, is not subject to this restriction and can include measurement errors in the analysis. In both cases, it must be assumed that the CP components are accurately known. The shape of the CP, however, is not an inherent property of molecules but depends on the specific mass spectrometer and operational parameters, making CP determination non-trivial. Due to the propagation of measured signal errors,
the above procedure provides only poor and sometimes non-physical estimates (e.g., negative decomposition coefficients) [9]. To overcome these difficulties and consistently process the measured spectra, the authors of [9] introduced a method based on Bayesian probability theory. In that paper, a decomposition method is proposed in which the mixed signal is decomposed without using any calibration measurements. This method has a clear practical perspective, as it does not require a precise CP. One of the directions of increasing the performance of decomposition algorithms is related to decreasing the dimensionality of the vector of descriptors of the target object. It is important to note that dimensionality reduction is in principle a major challenge for multivariate analysis. In [14], the authors presented two methodologies, principal component analysis (PCA) and partial least squares (PLS), to reduce dimensionality when the independent variables used in the regression are highly correlated. PCA, as a dimensionality reduction methodology, is applied without considering the correlation between the dependent variable and the independent variables, while PLS is applied based on that correlation. The authors therefore refer to PCA as an unsupervised dimensionality reduction methodology, while PLS is referred to as a supervised one. The paper presents implementation algorithms for both methodologies and the results of a comparative analysis of their performance in multivariate regressions using simulated data. The proposed method of descriptor vector dimension reduction is attractive due to its independence from the features of the signal form. However, there are often cases where fundamental features of the signal form allow a low-dimensional descriptor vector to be formed empirically. The mass spectroscopic signal belongs to this category of signals. In this case, the use of advanced dimension reduction algorithms is inexpedient. Sufficiently versatile approaches based on the decomposition of complex signals into empirical modes can be quite promising for the decomposition of mass spectra. In the late 1990s, Huang introduced an algorithm called Empirical Mode Decomposition (EMD), which is widely used to recursively decompose a signal into different modes of unknown but distinct spectral bands. The EMD method is known for limitations such as sensitivity to noise and to the sampling step. These limitations can only be partially removed by using mathematical methods of noise suppression or decomposition methods that are weakly sensitive to noise. Such methods may include empirical wavelets or recursive variational decompositions. These methods decompose the signal into a finite number of components to which time-frequency analysis can be applied more effectively. The authors of [3,4] consider the iterative filtering (IF) approach as an alternative to EMD. They propose an adaptive local iterative filtering (ALIF) method that uses an iterative filtering strategy together with adaptive and data-driven filter length selection to achieve the decomposition. Smooth filters with compact support derived from solutions of Fokker-Planck equations (FP filters) are used, which can be applied in both the IF and ALIF methods. These filters satisfy sufficient conditions for
convergence of the IF algorithm. Numerical examples demonstrating the performance and stability of the IF and ALIF methods with FP filters are given. In addition, to have a complete and truly local toolbox for the analysis of nonlinear and non-stationary signals, the authors proposed new definitions of instantaneous frequency and phase, which depend solely on the local properties of the signal. Two alternative formulations of the original algorithm were proposed in [4], which allow transforming the iterative filtering method into a streaming algorithm, making it closer to an online algorithm [3,4]. For a long time, there has been a perception in the literature that Fourier methods are poorly suited for the analysis of nonlinear and non-stationary data. The authors of [12] propose an original adaptive Fourier decomposition method (FDM) based on Fourier theory. The paper demonstrates its effectiveness for the analysis of nonlinear and non-stationary time series. The proposed FDM method decomposes any data into a small number of Fourier intrinsic band functions (FIBFs). FDM is a generalized Fourier expansion of a time series with variable amplitude and variable frequency. The idea of a bank of zero-phase filters for the analysis of multivariate nonlinear and non-stationary time series using FDM is proposed. An algorithm for obtaining cutoff frequencies for MFDM is presented. The proposed MFDM algorithm generates a finite number of banded multidimensional MFIBFs. The MFDM method retains some inherent physical properties of multivariate data, such as scale alignment, trend, and instantaneous frequency. The proposed methods provide a time-frequency-energy (TFE) distribution that reveals the internal structure of the data. Numerical calculations and simulations have been performed, as well as a comparison with empirical mode decomposition algorithms. It is shown that the proposed approach is competitive with the EMD method [12]. The authors of [6] propose a non-recursive variational decomposition model in which modes are extracted simultaneously. The algorithm determines the ensemble of modes and their corresponding center frequencies. A superposition of these modes reproduces the input signal, with each mode being smoothed into a baseband after demodulation. The authors demonstrate a close relation to noise reduction with Wiener filters. The proposed method is a generalization of the classical Wiener filter to several adaptive bands. The variational model used in the synthesis of the generalized Wiener filter is efficiently optimized using the alternating direction method of multipliers. Preliminary decomposition results obtained on a range of artificial and real data showed better performance relative to some existing decomposition algorithms and improved robustness to sampling and noise [6]. The proposed splitting of the data stream into several parallel streams certainly reduces signal processing time. However, any variational model fundamentally uses iterative algorithms with an unpredictable number of iterations. This fact, despite the improved performance, can be a significant barrier to use in real-time applications. Independent Component Analysis (ICA) is a mixed-signal processing method used to separate different superposition components based on their statistical properties. The proposed method requires several recorders of the same mixed
signal with different phase shifts. Each of the recorders should record the mixed signal with slightly different weights. This uses blind source separation, regardless of whether a priori information about the number of sources is available. Several efficient algorithms have been developed by various researchers to solve the ICA problem (Bell and Sejnowski 1995; Cardoso and Souloumiac 1996; Hyvärinen 1999; Ziehe et al. 2000; Frigyesi et al. 2006; Liebermeister 2002; Scholz et al. 2004) [6]. A significant drawback of the proposed method is the need for multi-channel registration of a mixed signal with spatially separated concentrated sources and sensors. For ICA decomposition of mass spectra, such an approach is not possible in principle, since the source of the mixed signal in the mass spectrometer is distributed, and placing ion current sensors in the mass analyzer at large distances from each other is structurally difficult and not technically advantageous. As the authors of papers [2,11] state, there are only a few techniques for the analysis of single-channel records, and they have certain limitations. Single-channel ICA (SCICA) and wavelet-ICA (WICA) methods are presented as examples. The paper proposes a new original method of decomposition of single-channel signals, which combines decomposition into empirical modes with ICA. Comparison of the separation performance of the proposed algorithm with the SCICA and WICA methods using simulations showed that the original method outperforms the other two, especially at reduced signal-to-noise ratio (SNR) values [2,11]. As a result of this analysis, we can conclude that research on methods of decomposition of complex signals from many sources and, in particular, of mass spectrometric signals, has mainly focused on recursive algorithms. This approach has significant advantages due to the well-developed mathematical apparatus. However, it is poorly applicable in real-time technologies due to insufficient performance and high resource intensiveness. In this connection, in our opinion, the development of streaming algorithms oriented to the specific features of the form of particular signals is a promising direction. Mass spectrometric signals, in this sense, are a very good object for the application of such algorithms.
3 Problem Statement
The mass spectrum of a mixture of polyatomic molecules is complex in itself. It is generally not very selective and does not allow direct identification of the substances in the mixture. This is caused both by the mutual overlap of the mass spectra of individual molecules and by the uncertainty of analytical signatures, which are different for different components of the mixture. Hardware-based chromatography-mass spectrometry largely removes this problem. However, the use of individual ionic peaks in the mass spectrum as descriptors of target molecules makes the dimensionality of the descriptor vector too large for current decomposition algorithms. For example, modern quadrupole mass analyzers generate a signal covering, in the worst case, several hundred atomic masses.
One of the methods of descriptor vector dimension reduction can be the method of “convolution” of mass spectra, which transforms them into the so-called “group mass spectra” developed in [5]. These group mass spectra reflect the general characteristics of the whole system (or group of compounds), smoothing out the individual differences in the mass spectra of the individual elements of the system. The presence of a specific functional or structural group in a molecule determines its affiliation to a corresponding type and simultaneously determines the main directions of decay of this molecule under the action of electron impact. Differences in the structure of similar molecules, such as changes in the number, length, and site of substituent addition, or in the size, position, and type of cycle junctions, can lead to changes in the masses of the corresponding ions and to a redistribution of the intensity of their peaks without changing the basic directions of decomposition. So, each type of compound is characterized by the presence of certain groups of ions arranged in one or more homologous series corresponding to the most probable directions of molecule decay. The main groups of ions that can be distinguished in the mass spectra of complex mixtures are molecular ions. These ions, as well as their further decay products, are grouped in certain regions of the homologous series. The envelope of peak intensities of each such group of ions in the homologous series usually has a maximum that corresponds to the most characteristic ions of the given group. The peak intensity distribution of these ions is determined by the set of individual compounds in the mixture. Each type of compound has a characteristic set of fragments and, consequently, a certain set of characteristic groups of ions. Differences in the structure of fragments of molecules of individual compounds of each type are reflected in a certain redistribution of the integral intensity of peaks of individual groups of ions and in changes in the intensity of peaks within groups. The aggregate of the groups of ions that make up the mass spectrum of a mixture, which is characteristic of certain structural fragments of molecules, forms the “group mass spectrum”. The transition from a conventional to a group mass spectrum corresponds to the transition from representing the mass spectrum as a set of ions with certain masses to representing it as a set of certain structural fragments. On the one hand, this representation makes the mass spectrum of a complex mixture similar to that of an individual compound; the only difference is that the elements of the group mass spectrum correspond to the averaged fragments of molecules of individual compounds of the given type. Just as in the case of mass spectra of individual compounds, a set of structural fragments is an identifier of the molecule in question. In addition, each of the ion groups is characterized by a peak intensity distribution whose profile can be regarded as a kind of model of a continuous spectrum, similar to optical absorption spectra. Accordingly, all the analysis steps typical for such spectra can also be applied to the intensity distribution profiles of peaks in the characteristic ion groups of mass spectra of complex mixtures. For example, a complex curve enveloping characteristic groups of ions of several homologous series can be divided into separate components, each of which belongs to a different homologous series. According to [5], each such individual component has a profile close to the Poisson
distribution with appropriate parameters. Moreover, the parameters of the corresponding distribution are unambiguously associated with a particular homologous series. Thus, the peculiarities of the shape of the group mass spectra of polyatomic molecules are a fundamental property of their sequential impact dissociation process. These features can be used to create streaming decomposition algorithms rather than recursive ones. Moreover, it is expedient to use a separate group of peaks, taking into account the shape of its envelope, rather than a single ion peak as a descriptor of the molecule. With this approach, the dimensionality of the descriptor vector can be reduced by an order of magnitude. At the same time, the measure of proximity of descriptors can be adequately estimated by integral operators, which have the additional property of suppressing additive noise.
4 Materials and Methods

4.1 Structure of Polychlorinated Biphenyls (PCBs) Mass Spectra
In the general case, the mass spectra of polyatomic organic molecules are a multiply connected array of molecular ions. The mass spectra of polychlorobiphenyl isomer molecules with the same gross formula C12H7Cl3 (PCB16, PCB17, PCB18, and PCB22, respectively) are shown in Fig. 1. All these molecules have the same elemental composition and contain both identical and different fragments. In turn, the mass spectra of these molecules cover the same ion mass range and include both similar and different ion sequences. Individual fragments of the presented array of ions consist of closely spaced ion peaks. The distance between the atomic mass values of ions in such groups, as a rule, does not exceed 2 atomic masses, while the distance between these dense groups can amount to hundreds of atomic masses. These dense groups are associated with stable fragments of a polyatomic molecule; their size and localization are determined by the elemental composition of the molecule. When classical methods are used to analyze the mass spectra of polyatomic molecules, the set of ion peaks belonging to each group is approximated by a continuous curve, for example, by the Poisson distribution law. The parameters of such approximations are used as descriptors of the corresponding molecules. The described approach is associated with a decrease in the reliability of the identification of molecules, since the parameters of such distributions are sensitive to the conditions under which the mass spectra are registered and to the instrumental function of a particular device. At the same time, the consideration of separately spaced fragments of mass spectra as elementary stable components has proven itself well in identifying molecular structures with similar elemental composition. In the problem of automating the separation of homogeneous ion sequences from the mass spectra of isomers of organic molecules, it is assumed that it is expedient to carry out a complete decomposition of the mass spectra into these
Fig. 1. Examples of mass spectra of polyatomic organic molecules with a similar structure and the same elemental composition
disjoint groups as an initial step in the streaming algorithm. These groups are not replaced by approximations with analytical dependences but are used as they are, and the existing methods for assessing the measure of proximity of curve shapes are used to compare the forms of these subgroups. This approach reduces the cost of computing resources and removes the influence of the hardware function of the mass spectra recorder.

4.2 Methods of Comparing Curves
The simplest classical method of comparison of two signals is considered to be the calculation of their mutual correlation. This method is attractive because it is robust to the stochastic component of the experimental signals. Its disadvantage, however, is its weak sensitivity to permutations of samples of a discrete signal. If the signals are defined by a small number of samples, which is typical of mass spectra, such permutations cause a significant change in their shape. In this case, the value of the mutual correlation coefficient of the two curves may show only a small change in the measure of similarity even with significant distortion of the shape of one of them. If the descriptor of a fragment of a molecule is a single connected ion group, its shape plays a key role in the identification of this fragment, and the use of the cross-correlation
coefficient as a measure of curve similarity carries the risk of low reliability of the method. In our opinion, it is more appropriate to use the Fréchet distance [8] as a measure of curve similarity when comparing the waveforms. The algorithms proposed by the authors of [1,7] are the most suitable for use in the streaming decomposition algorithm for mass spectra. The choice is motivated by the fact that these algorithms are focused on comparing the shapes of curves defined by discrete sets of samples, are not recursive, and are optimized in the number of steps. In the task of decomposition of mass spectra, we used the Fréchet distance calculation algorithm proposed in [7], which is easier to implement in software and has shown good results when tested on test data.
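As a point of reference, the discrete Fréchet distance of [7] can be computed by a simple dynamic-programming routine. The sketch below is only an illustration, not the authors' implementation; the function name, the representation of each curve as an intensity vector on a common mass grid, and the use of the absolute intensity difference as the pointwise metric are assumptions.

```matlab
function d = discreteFrechet(P, Q)
% Discrete Frechet distance between two curves given as sample vectors P and Q.
% Iterative dynamic-programming form of the coupling measure from [7].
    n = numel(P); m = numel(Q);
    ca = zeros(n, m);                      % ca(i,j): best coupling of P(1:i) and Q(1:j)
    for i = 1:n
        for j = 1:m
            cost = abs(P(i) - Q(j));       % pointwise distance between samples
            if i == 1 && j == 1
                ca(i, j) = cost;
            elseif i == 1
                ca(i, j) = max(ca(i, j-1), cost);
            elseif j == 1
                ca(i, j) = max(ca(i-1, j), cost);
            else
                ca(i, j) = max(min([ca(i-1, j), ca(i-1, j-1), ca(i, j-1)]), cost);
            end
        end
    end
    d = ca(n, m);
end
```

For spectra normalized to a unit maximum intensity, a value of this distance below a small threshold, such as the 0.05 used below, indicates that the two curve shapes practically coincide.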
4.3 Proposed Approach
The functional block scheme of the working logic of the streaming algorithm for the decomposition of complex mass spectra is shown in Fig. 2.
Fig. 2. Streaming algorithm to the mass spectra decomposition
At the initial stage, the streaming algorithm converts a pair of N-dimensional arrays with the intensity values of the ion peaks of the compared molecules into a pair of N × M-dimensional arrays, where M is the number of individual groups of closely spaced ions. These are molecular ions belonging to the same homologous series; let us call these groups homologous series (HS). Each M-dimensional
tuple contains the intensity values of ion peaks belonging to one HS. The intensity values of peaks belonging to other HS in this tuple are forcibly replaced by zero values. This step of the decomposition algorithm uses the structural features of the mass spectrometric signal to separate the mass spectra into individual high-level descriptors of molecules; by such high-level descriptors we will understand individual HS. To construct a reduced vector of numerical descriptors, a convolution of each HS with a calibrated reference signal can be calculated, or each HS can be approximated by a standard function, such as Poisson's law, with its parameter taken as the value of the high-level descriptor. However, the task of constructing a vector of high-level descriptors of polyatomic molecules based on their mass spectra is beyond the scope of this paper and will be considered in further studies. In the next step, the streaming algorithm calculates the Fréchet distance between different tuples of the two-dimensional arrays corresponding to different molecules. Based on the calculated Fréchet distances, a matrix of pairwise matching of the HS of the mass spectra of the compared molecules is constructed. HS are considered to correspond to each other if the distance between them does not exceed 0.05. Based on the correspondence matrix, the algorithm synthesizes identical ion sequences across the spectrum by summing the corresponding HS tuples and calculating the Fréchet distance between the resulting sums. If the sums of the corresponding HS are close in the sense of the Fréchet distance, such sequences are stored in a separate array called the pattern array. Once the HS in at least one of the arrays constructed for the compared molecules are exhausted, the streaming mass spectra decomposition algorithm completes its work.
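A minimal sketch of the first step, splitting a mass spectrum into HS tuples, is given below. It assumes that an HS is delimited by a gap of empty mass positions larger than a chosen threshold (peaks inside a group are at most about 2 atomic masses apart, while the groups themselves are separated by much larger gaps); the function name and the gap parameter are illustrative, not the authors' exact code.

```matlab
function Frg = splitIntoHS(MS, gap)
% Convert an N-dimensional intensity vector MS (indexed by atomic mass) into an
% N x M array Frg, where each column keeps the peaks of one homologous series (HS)
% and zeros elsewhere. Peaks separated by more than `gap` empty positions start a new HS.
    N = numel(MS);
    idx = find(MS > 0);                  % mass positions with registered ion peaks
    Frg = zeros(N, 0);
    if isempty(idx)
        return;
    end
    col = zeros(N, 1);
    col(idx(1)) = MS(idx(1));
    for k = 2:numel(idx)
        if idx(k) - idx(k-1) > gap       % large gap: close the current HS tuple
            Frg = [Frg, col];
            col = zeros(N, 1);
        end
        col(idx(k)) = MS(idx(k));
    end
    Frg = [Frg, col];                    % store the last HS tuple
end
```

A call such as Frg_s = splitIntoHS(MS_s, 5) would then produce the N × M array of HS tuples for the reference molecule.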
5 Experiment
The MATLAB R2014a computational environment was used for experimental verification of the algorithm. The choice of the environment was motivated by the large number of built-in solvers for different areas of mathematics; in particular, the mechanism of matrix and statistical calculations in this environment is very effective. A fragment of the m-script implementing the proposed algorithm is shown in Fig. 3. The script reads the mass spectra of two sample molecules contained in an EXCEL file into the variables MS_s(N), the mass spectrum of the reference molecule, and MS_r(N), the mass spectrum of the molecule under study. The dimensions of the arrays for all mass spectra are aligned to the mass range defined by the instrument settings. For this purpose, the intensities of the peaks below the noise level and the missing peaks were forcibly set to zero. In the conducted experiments, the dimensionality of the mass spectra was 550. The function MS_Splitter(K) transforms the one-dimensional arrays of mass spectra of the reference and investigated molecules into the corresponding two-dimensional arrays Frg_s(N, M) and Frg_r(N, K) of their homologous series (HS). The M and K dimensions are determined during the primary analysis and correspond to the
Fig. 3. Fragment of the m-script implementing the proposed algorithm
number of HS in the mass spectra of the researched molecules. In the loop of the “Comparison” section of the script, using the Frechet(K) function, a matching matrix fDst(L × L) is built, where L = min{M, K}. The Frechet(K) function calculates the Fréchet distance between two curves defined by one-dimensional arrays. The next step of the streaming algorithm is calculated in the loop of the “Patterns synthesis” section of the script. Here we accumulate in the Pttr(N × J) array a sequence of HS belonging to the different initial mass spectra that are close in the sense of the Fréchet distance (Fig. 3). During the accumulation process, both synthesized sequences are compared by the Fréchet distance between them. If this distance does not exceed the threshold of 0.05, such a sequence is considered a pattern and is stored in the Ptr_d(N × J) array. The J dimension corresponds to the number of identical ion sequences in the mass spectra of the reference and the investigated molecules.
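The following sketch reproduces the logic of the “Comparison” and “Patterns synthesis” loops described above, using the splitIntoHS and discreteFrechet sketches given earlier. The variable names, the intensity normalization, and the way the pattern is stored are assumptions; the real m-script may differ in details.

```matlab
function [fDst, Pttr] = synthesizePattern(Frg_s, Frg_r, thr)
% Frg_s (N x M) and Frg_r (N x K) hold the HS tuples of the reference and the
% investigated molecule on the same mass grid, with intensities normalized to 1.
    M = size(Frg_s, 2); K = size(Frg_r, 2); L = min(M, K);
    fDst = zeros(L, L);                           % pairwise HS matching matrix
    for i = 1:L
        for j = 1:L
            fDst(i, j) = discreteFrechet(Frg_s(:, i), Frg_r(:, j));
        end
    end
    sum_s = zeros(size(Frg_s, 1), 1);             % accumulated matching HS (reference)
    sum_r = zeros(size(Frg_r, 1), 1);             % accumulated matching HS (investigated)
    for i = 1:L
        [dmin, j] = min(fDst(i, :));
        if dmin <= thr                            % HS i of the reference matches HS j
            sum_s = sum_s + Frg_s(:, i);
            sum_r = sum_r + Frg_r(:, j);
        end
    end
    if discreteFrechet(sum_s, sum_r) <= thr       % the summed sequences must also agree
        Pttr = sum_s;                             % keep the common ion sequence as a pattern
    else
        Pttr = [];
    end
end
% Example: [fDst, Pttr] = synthesizePattern(splitIntoHS(MS_s, 5), splitIntoHS(MS_r, 5), 0.05);
```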
6 Results and Discussion
As a result of the execution of the script that implements the streaming decomposition algorithm, the patterns of mass spectra of the common fragments of two isomers, the PCB16 and PCB22 molecules, were obtained. Validation of the algorithm was
performed on the PCB17 molecule. For this purpose, the pattern synthesized from the mass spectra of the original molecules was isolated from the mass spectrum of the test molecule. Figure 4 shows the results of algorithm testing: the mass spectra of the initial isomers (diagrams A, B), the pattern synthesized from these mass spectra (diagram C), and the same pattern isolated from the molecule under test (diagram D).
Fig. 4. Results of algorithm testing: mass spectra of the initial isomers (A, B), the synthesized pattern (C), and the pattern isolated from the test molecule (D)
It can be seen from the presented graphs that the mass spectra of the tested molecules are generally similar but differ in the HS band in the region of 225 atomic masses. In the PCB16 molecule its intensity at the maximum is an order of magnitude greater than in the PCB22 molecule and reaches 70% of the amplitude. At the same time, the shapes of these HS coincide within the accuracy of the comparison criterion. The algorithm rejected this HS precisely because it makes the general shapes of the mass spectra of the PCB16 and PCB22 molecules differ. Measurement of the algorithm's execution time with the built-in platform tools showed that even in the heavyweight MATLAB environment the total execution time of the algorithm is 8.9 s. At the same time, 7.9 s were spent on read and write operations with the file system, i.e., on the initial loading of the mass spectrum
data. The calculations themselves for mass spectra in the mass range from 0 to 550 amu lasted 0.94 s. This value is acceptable for the characteristic registration times of mass spectra by quadrupole mass analyzers and allows their processing in real time. It is important to note that in a hardware implementation of the proposed algorithm on modern microcontrollers, this factor can be reduced by two orders of magnitude.
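The separation of loading time from computation time reported above can be reproduced with the standard MATLAB timers; the file names below are placeholders, and the call chain refers to the sketches given earlier rather than to the authors' m-script.

```matlab
tic;                                                % time spent on file system operations
MS_s = xlsread('reference_spectrum.xlsx');          % placeholder file names
MS_r = xlsread('investigated_spectrum.xlsx');
t_io = toc;

tic;                                                % time spent on the decomposition itself
[fDst, Pttr] = synthesizePattern(splitIntoHS(MS_s, 5), splitIntoHS(MS_r, 5), 0.05);
t_calc = toc;
fprintf('I/O: %.2f s, calculations: %.2f s\n', t_io, t_calc);
```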
7 Conclusions and Future Work
As a result of this work, a streaming algorithm for the decomposition of mass spectra of polyatomic organic molecules, oriented on the structural features of the mass spectrometric signal, was created and tested. The developed algorithm makes it possible to distinguish in mass spectra the ion sequences identical for molecules with a similar structure, called molecular fragment patterns. It is shown that the characteristic computation time of the streaming algorithm for mass spectra on the order of 500 ion masses is 0.94 s in the MATLAB environment, which, when executed on microcontrollers, makes it promising for solving the problems of mass spectra analysis directly in the process of their registration. In the future, it is planned to study the validity of the proposed algorithm under conditions of additive noise in the processed signal. It is also intended to adapt the executable code for use in microcontrollers, for real-time preprocessing of data from mass analyzers.
References

1. Ahn, H.K., Knauer, C., Scherfenberg, M., Schlipf, L., Vigneron, A.: Computing the discrete Fréchet distance with imprecise input. Int. J. Comput. Geom. Appl. 22, 422–433 (2010)
2. Calhoun, V., Liu, J., Adali, T.: A review of group ICA for fMRI data and ICA for joint inference of imaging, genetic, and ERP data. Neuroimage 45(1), 163–172 (2009)
3. Cicone, A.: Nonstationary signal decomposition for dummies. Adv. Math. Methods High Perf. Comput. Adv. Mech. Math. 41(3), 69–82 (2019)
4. Cicone, A., Liu, J., Zhou, H.: Adaptive local iterative filtering for signal decomposition and instantaneous frequency analysis. Appl. Comput. Harm. Anal. 41(2), 384–411 (2016)
5. Coombes, K.R., Baggerly, K.A., Morris, J.S.: Pre-processing mass spectrometry data. Fund. Data Min. Genom. Proteomics 4, 79–102 (2007)
6. Dragomiretskiy, K., Zosso, D.: Variational mode decomposition. IEEE Trans. Signal Process. 62(3), 531–544 (2014)
7. Eiter, T., Mannila, H.: Computing Discrete Fréchet Distance. Technical Report CD-TR 94/64. CD-Laboratory for Expert Systems, TU Vienna, Austria (1994)
8. Har-Peled, S., Raichel, B.: The Fréchet distance revisited and extended. ACM Trans. Algo. 10(1), 3:1–3:22 (2014)
9. Kang, H.D., Preuss, R., Schwarz-Selinger, T., Dose, V.: Decomposition of multicomponent mass spectra using Bayesian probability theory. J. Mass Spectrom. 37(7), 748–754 (2002)
10. Kopriva, I., Jeric, I.: Multi-component analysis: blind extraction of pure components mass spectra using sparse component analysis. J. Mass Spectrom. 44(9), 1378–1388 (2009)
11. Mijović, B., De Vos, M., Gligorijević, I., Taelman, J., Van Huffel, S.: Source separation from single-channel recordings by combining empirical-mode decomposition and independent component analysis. IEEE Trans. Biomed. Eng. 57(9), 2188–2196 (2010)
12. Singh, P., Joshi, S.D., Patney, R.K., Saha, K.: The Fourier decomposition method for nonlinear and non-stationary time series analysis. Proc. Roy. Soc. A Math. Phys. Eng. Sci. 473(2199) (2017)
13. Toussaint, U., Dose, V., Golan, A.: Maximum entropy decomposition of quadrupole mass spectra. J. Vac. Sci. Technol. A Vac. Surf. Films 22(2), 401–406 (2004)
14. Vautard, R., Yiou, P., Ghil, M.: Singular-spectrum analysis: a toolkit for short, noisy chaotic signals. Physica D Nonlinear Phenomena 58(1–4), 95–126 (1992)
Method of Functional-Value Calculations of Complex Systems with Mixed Subsystems Connections

Maksym Korobchynskyi(B), Mykhailo Slonov, Pavlo Krysiak, Myhailo Rudenko, and Oleksandr Maryliv

Military-Diplomatic Academy named after Eugene Bereznyak, Kyiv, Ukraine
{maks kor,pavlo krysiak,ruminik33}@ukr.net
Abstract. The method of functional-value calculations of complex systems with mixed subsystems connections is researched under a polynomial approximation of the dependence of their value on the level of functional perfection. The research results allow determining the rational management of the functional perfection of a complex system under the condition of its minimum value and given perfection. A rule has been formulated according to which the value of rationalization of a complex system can be assessed. The order of structural functional-value calculations of a complex system with mixed subsystems connections is based on the method of Lagrange factors and the involvement of an iterative approach in solving equations above the third order. Studies show that the achievement of the set value of the level of perfection should begin by improving the subsystems with a lower value coefficient. The approaches to calculations according to the developed algorithms of functional-value modelling of a complex system with mixed subsystems connection are clarified. Graphical calculations are useful when the calculation process needs to be visualized. The application of the advanced method helps to solve direct and inverse problems of rationalizing the structure of complex systems with mixed subsystems connection. It is aimed at finding the cheapest option for structural and parametric restructuring of the system in order to increase its functional perfection. The use of this method is appropriate for the study of weakly formalized and unformalized complex systems, whereby a qualitative analysis of a complex system can be translated into a quantitative one.

Keywords: Complex systems · Perfection of complex system · Functional-value calculations · Approximating functions · Lagrange factors
1 Introduction and Literature Review
Improving the efficiency of complex systems is an important and dynamic component of their development, operation and disposal [7,8]. A common methodological direction of their research is functional-value analysis [1,4,6,9–17].
It aims at achieving a given level of functional perfection of the system at minimum value. There are several approaches adapted to this analysis. The traditional one is the semantic approach, which is based on statistical analysis of the existence of similar complex systems [9,17]. The prognostic possibilities of this approach are narrowed by the limitations in the use of mathematical apparatus. Expert evaluation methods are common [12,14]. They allow algorithmizing comparison and decision-making processes, but the relationship between functional compliance and the value of a complex system is not analytical. Successful approaches are based on methods of interval probabilistic estimates of efficiency [4,13]. Their improvement is focused on taking into account the level of uncertainty of influential factors by the Dempster-Shafer method [15]. Unfortunately, in the implementation of such methods there is always a problem with the source of information. Powerful computing networks allow the implementation of neural research algorithms [1–3,6]. However, the derivation of an analytical dependence “parameters-value” is not used there. A rational method of functional-value analysis of a complex system is the use of approximating relationships between the value of its subsystems and their functional perfection. With this approach, the total value of the system C and the rationalization rule will look as follows [8]:
\[
\begin{cases}
P = P[C_i] \ge P_s,\\
C = \sum_{i=1}^{n} C_i \to \min,
\end{cases}
\qquad (1)
\]
where P[C_i] is the functional that shows the dependence of the level of functional perfection of the system on the invested funds C_i in each of its n subsystems. The value P_s determines a set point of functional perfection of the system. In [10,11,16] the use of polynomial approximating dependences between the value of subsystems and their functional perfection is proposed. This allowed a parametric rationalization of the level of functional perfection of individual subsystems, with a given level of functional perfection of the complex system as the limiting factor. But the developed approach was tested for complex systems only with serial or parallel connection of separate subsystems [11,16]. In [10] complex systems with mixed combinations were considered. The equivalent schemes offered in that work allowed determining the existing distribution of the level of functional perfection between separate subsystems. The method of using polynomial dependences in this case was not considered; identification of such a method remains an unsolved part of the problem. The aim of the research is to determine the method of functional-value calculations of complex systems with mixed subsystems connection on the example of the object monitoring system. To achieve the aim of the study, the following tasks were set:
– to formulate initial data for calculations of functional-value analysis of complex systems with mixed subsystems connection on the example of research of the object monitoring system;
– to find a distribution between functional probabilities that provides the minimum value of the entire monitoring system;
– to find out the peculiarities of application of the developed algorithms of functional-value modelling of a complex system with mixed subsystems connection.
2 Materials and Methods
In [10] the system of object monitoring shown in Fig. 1 was considered. Functionally, it is designed to obtain the required amount of reliable information about the object Ob. The result of that research was a rational distribution of the probabilities of performing functional tasks among its subsystems. We use the obtained data to determine the method of calculation using polynomial dependencies.
Fig. 1. Object monitoring system (enterprises, organizations) for four groups of parameters
The functional perfection of the system P is characterized by the probability of determining the current state of the object Ob. The given value of probability Ps is provided by timely measurement of four groups of quantities. The measured groups are production indicators X1 (group “quality-volume of production”), indicators of marketing activity X2 (group “advertising and sales”), indicators of administrative and financial activity X3 (group “state reporting-financial work”), and indicators of moral and psychological relations X4 (group “leadership-performer relations”). The following information sources (sensors) Di were considered: open publications and speeches Sp (D1 are newspapers and magazines; D2 is advertising and price lists; D3 are conferences, meetings, etc.), official documentation OD (D4 are reports; D5 is official correspondence), and confidential information CI (D6 are rumors; D7
are official inquiries; D8 is informal activities). The security SD and marketing MD departments of the company integrated the signals from the sensors over communication channels. The processed and generalized information was provided to the competitor management CM. This scheme makes it difficult to assess the completeness of the display of parameters X2 and X3 by sensor D6, as well as the functional perfection of the information processing units SD and MD separately for parameters X1, X2, X3 and X4. There is a need to develop an equivalent scheme; it is shown in Fig. 2. Data for performing numerical calculations on the state of the links (subsystems) of the monitoring system are given in Table 1. Such data are formalized by the involvement of experts and the analysis of similar systems.
Fig. 2. Equivalent scheme of the object monitoring system for four groups of parameters
Table 1. Structural compliance of the equivalent scheme of the monitoring system

Number of links  Psen,j  Pproc,k  Pgen = PX5    Number of links  Asen,j  Aproc,k  Agen
1                0.85    0.75     0.99          1                10      30       50
2                0.80    0.80     –             2                10      25       –
3                0.70    0.80     –             3                20      25       –
4                0.75    0.75     –             4                20      25       –
5                0.70    0.90     –             5                20      40       –
6                0.65    0.70     –             6                25      40       –
7                0.80    –        –             7                25      –        –
8                0.75    –        –             8                30      –        –
9                0.60    –        –             9                30      –        –
The probabilities of the task Pj by subsystems are given by parametric dependencies or estimated experimentally. The approximation coefficients Aj are also
determined experimentally or by the algorithm of priority detection in pairwise comparison. The number of the link in both cases corresponds to its ordinal value in Fig. 2. The adjacency matrix [13] in pairwise comparison is filled horizontally and vertically from the results of the expert comparisons Aik and Aki:
\[
A_{ik} =
\begin{cases}
10, & A_{ik} = A_{ki},\\
15, & A_{ik} > A_{ki},\\
5,  & A_{ik} < A_{ki},
\end{cases}
\qquad\Rightarrow\qquad
A_i = \sum_{k=1}^{4} A_{ik}.
\qquad (2)
\]
The defined levels of perfection of the subsystems of the object monitoring system are given in Table 2. The data in Table 2 prove that the requirement for the given level of functional perfection of the monitoring system,
\[
P = \prod_{i=1}^{5} P_{Xi} \ge 0.85,
\qquad (3)
\]
is not met. This conclusion is correct if the functional necessity of each of the series-connected subsystems is confirmed.

Table 2. Probabilities of determining all measurement groups

PX1   PX2   PX3   PX4   PX5   P
0.77  0.72  0.91  0.84  0.99  0.42
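A small numerical sketch of rules (2) and (3) is given below. The pairwise comparison matrix is purely illustrative (it is not the experts' data), and summing the scores over the compared links is one possible reading of rule (2).

```matlab
% Pairwise comparison scores: cmp(i,k) = 1 if link i is preferred to link k,
% -1 if link k is preferred, 0 if they are judged equally important.
cmp = [ 0  1 -1  0;
       -1  0  0  1;
        1  0  0  1;
        0 -1 -1  0];                     % illustrative expert judgements
n = size(cmp, 1);
A = zeros(1, n);
for i = 1:n
    for k = 1:n
        if i == k, continue; end
        if cmp(i, k) == 0
            A(i) = A(i) + 10;            % A_ik = 10 for equal importance
        elseif cmp(i, k) > 0
            A(i) = A(i) + 15;            % A_ik = 15 when link i dominates
        else
            A(i) = A(i) + 5;             % A_ik = 5 when link i is dominated
        end
    end
end

% Check of condition (3) with the probabilities from Table 2.
PX = [0.77 0.72 0.91 0.84 0.99];
P  = prod(PX);                           % about 0.42, well below the required 0.85
fprintf('A = [%s], P = %.2f\n', num2str(A), P);
```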
The next step of the method is the determination of a rational variant of structural and parametric restructuring that increases the functional perfection of the system to a given value. A simplified equivalent scheme is constructed for this purpose (Fig. 3). In terms of the number and functional orientation of the links, it repeats the equivalent scheme in Fig. 2, but it expresses more clearly the directions of functioning of such links. Analysis of Fig. 3 shows that by functional orientation the links can be divided into nine groups (l = 1, 2, 3, ..., 9). The first group is sensors D1 and D2. The second group is sensors D3, D4 and D62. The third group is sensors D5 and D63. The fourth group is sensors D7 and D8. The fifth group is the processing unit MD1. The sixth group is the processing unit MD2. The seventh group is processing units SD3 and MD3. The eighth group is processing units SD4 and MD4. The ninth group is the generalization link CM. Within each group, all units perform the same functions. Therefore, we can assume that the links within one l-th group are described by the same functions C(Pl). The level of their functional perfection within the l-th group is also the same and equal to Pl.
Fig. 3. Simplified equivalent scheme of object monitoring system (enterprises, organizations)
Initial data for the functional-value study of a complex system consist of two components. The first is the structure and analysis of the structural and equivalent schemes of a complex system. The second component is finding out the levels of perfection of its subsystems. This allows proceeding to the compilation of calculation algorithms. Their task is to find the distribution between the functional probabilities Pl, l = (1, 9), which will ensure the minimum value of the entire monitoring system. According to (1–2) the value of the whole system C is additive with respect to its components. With the polynomial approximation of the value of the subsystems and the simplified equivalent scheme in Fig. 3, it will be equal to:
\[
\begin{cases}
C = \sum\limits_{l=1}^{9} A_l S_l \dfrac{P_l}{1-P_l} = Y + Z + Q,\\[4pt]
Z = 2A_1\dfrac{P_1}{1-P_1} + 3A_2\dfrac{P_2}{1-P_2} + 2A_3\dfrac{P_3}{1-P_3},\\[4pt]
Y = 2A_4\dfrac{P_4}{1-P_4} + A_5\dfrac{P_5}{1-P_5} + A_6\dfrac{P_6}{1-P_6},\\[4pt]
Q = 2A_7\dfrac{P_7}{1-P_7} + 2A_8\dfrac{P_8}{1-P_8} + A_9\dfrac{P_9}{1-P_9},
\end{cases}
\qquad (4)
\]
where Sl is the number of identical links in the l-th group. The Sl values resulting from the simplified equivalent scheme in Fig. 3 are summarized in Table 3.
3
2
2
1
1
2
2
1
The second equation will be (3). According to it and the ratio for series and parallel connection of links [10] we obtain the following equation: ⎧
9 ⎪ ⎨P = l=1 PGl = RT P5 P6 P9 = 0.85 2 3 2 (5) R = [1 − (1 − P1 ) ][1 − (1 − P2 ) ][1 − (1 − P3 ) ] , ⎪ ⎩ 2 2 2 T = [1 − (1 − P4 ) ][1 − (1 − P7 ) ][1 − (1 − P8 ) ] where PGl is the functional perfection of the l-th group. The values of individual probabilities Pl were determined by the rule of parallel connected links according to their number in the group Sl . To (4, 5) we used the assumption of similarity of value Cl and probabilistic Pl indicators within each group. This assumption simplifies the calculations, but does not change the general approach to the solution. For example, with different levels of functional perfection of sensors D1 (level of functional perfection P11 ) and D1 (level of functional perfection P12 ) the level of functional perfection of the first group PGl will be equal to: PGl = 1 − (1 − P11 )(1 − P12 ), but not
(6)
2
PGl = 1 − (1 − P1 ) ,
(7)
as written in (5). Instead of one unknown P1 it is necessary to consider two unknowns: P11 and P12 . This will increase by one unit the number of equations in the system of equations of functional-value rationalization. For relations (4, 5) the equations of value rationalization and approaches to their solution in mixed combinations of subsystems can be involved [10]. In this case, the Lagrange function f (Pl , λ) is constructed and local extreme are searched [5]. In general, the following derivatives will look like this: ∂C 1 ∂Pl = Al Sl (1−Pl )2 . (8) S ∂P Sl (1−Pl ) l −1 9 l=1 Pl ∂Pl = (1−P )Sl l
Thus, it is possible to write an expression for the derivative of the Lagrange function and equalize it for all variables Pl to zero: S −1 9 ∂C ∂P 1 ∂f (Pl , λ) Sl (1 − Pl ) l = +λ = Al Sl + λ Pl = 0. (9) ∂Pl ∂Pl ∂Pl (1 − Pl )2 1 − (1 − Pl )Sl l=1
According to (10), for each l = (1, 9) derivative of the total value C and total probability P is equal to a constant value of λ. Then instead of one
Method of Functional-Value Calculations
61
Eq. (10) can be written 9 similar equations. In each of them the left part is equal to λ. The first equation (l = 1) is consistently equal to all other equations (l = 2, 3, ..., 9). We get eight equations that contain nine unknown variables. The ninth equation of this system will be (5). We have nine equations and nine unknowns. This system of equations allows finding an effective distribution of the level of perfection of the monitoring system. The generalized notation of such a system of equations has the following form: ⎧ 1 1 Al (1−P A1 2 2 ⎪ 1) l) ⎨ (1−P = , l = 2, 3, 4, ..., 9 S S −1 (1−P1 ) 1 (1−Pl ) l −1 S . (10) 1−(1−P1 ) 1 1−(1−Pl )Sl ⎪ ⎩P = 9 P = 9 [1 − (1 − P )Sl ] = 0.85 Gl l l=1 l=1 Equation (11) taking into account the data from Table 1 and Table 3, so will be:
⎧ 10 20 2 (1−P1 )2 ⎪ 2) ⎪ = (1−P 1−P1 ⎪ (1−P2 )2 ⎪ 2 ⎪ 1−(1−P1 ) 1−(1−P2 )3 ⎪ ⎪ 10 25 ⎪ ⎪ (1−P1 )2 (1−P3 )2 ⎪ = ⎪ 1−P1 1−P3 ⎪ ⎪ 1−(1−P3 )2 ⎨ 1−(1−P1 )2 ... 10 50 ⎪ ⎪ (1−P1 )2 (1−P9 )2 ⎪ ⎪ = 1 ⎪ 1−P1 ⎪ 1−(1−P9 ) ⎪ 1−(1−P1 )2 ⎪
⎪ 9 2 3 2 ⎪ ⎪ P = l=1 PGl = [1 − (1 − P1 ) ][1 − (1 − P2 ) ][1 − (1 − P3 ) ] ⎪ ⎪ ⎩ 2 2 2 ×[1 − (1 − P4 ) ][1 − (1 − P7 ) ][1 − (1 − P8 ) ]P5 P6 P9 = 0.85
.
(11)
It is algebraic and the complexity of its solution is entirely related to the high order (ninth degree) of the final equation.
3 Experiment, Results and Discussion
Here are three approaches to the calculations. Each of them has its advantages and disadvantages. The first approach is based on the use of numerical methods of solution. The calculations are iterative and deliberately approximate. Their sequence is as follows.
1. Assuming the same contribution of each group of the system to the level of its perfection, the first approximate value of the level of perfection of each group PGl(1) can be calculated by the equation (if the starting parameter is the level of perfection of the first group PG1(1)):
\[
P_{Gl}^{(1)} = P_{G1}^{(1)} = \sqrt[9]{P_s} = \sqrt[9]{0.85} = 0.982.
\qquad (12)
\]
Then the level of perfection P1(1) of sensors D1 and D2 will be equal to:
\[
P_1^{(1)} = 1 - \sqrt{1 - P_{G1}^{(1)}} = 1 - \sqrt{1 - 0.982} = 0.866.
\qquad (13)
\]
2. The second step is to solve the equations of system (11) for all the other probabilities Pl, l = 2, 3, ..., 9. The solution of each individual equation is convenient to perform by a graphical or machine method. The results of the calculations are summarized in Table 4.

Table 4. Estimated values of the system level perfection

P1(1)  P2(1)  P3(1)  P4(1)  P5(1)  P6(1)  P7(1)  P8(1)  P9(1)  Ps(1)
0.87   0.74   0.83   0.81   0.93   0.95   0.83   0.79   0.91   0.68
3. Using (5) and substituting the calculated probabilities Pl, the condition of reaching the given level of perfection of the system is checked. If Ps(1) does not correspond to the given value Ps, it is necessary to find the second approximation Ps(2). To do this, we must calculate from the beginning the second probability approximation P1(2), using the following rule: if Ps > Ps(1), then P1(2) > P1(1); if Ps < Ps(1), then P1(2) < P1(1). The increment of the variable P1 (the value Δ = P1(2) − P1(1)) can be chosen arbitrarily, but without taking the value of PG1 out of the range of its allowable values (Ps, 1).
4. The obtained data show that the necessary condition (3) is not fulfilled, so we choose a new approximation (for example, P1(2) = 0.95) and repeat the steps in paragraphs 2–3. The calculations are summarized in Table 5, which is similar to Table 4.
Table 5. Approximate values of the system level perfection

P1(2)  P2(2)  P3(2)  P4(2)  P5(2)  P6(2)  P7(2)  P8(2)  P9(2)  Ps(2)
0.95   0.88   0.93   0.94   0.98   0.98   0.93   0.92   0.97   0.89
The approximate value Ps(2) significantly exceeds the set value Ps = 0.85. The required value of Ps lies between the first approximation Ps(1) and the second approximation Ps(2). In this case, the following value of the probability approximation P1(3) is selected by linear interpolation:
\[
P_1^{(3)} = \frac{\alpha P_1^{(2)} + \beta P_1^{(1)}}{\alpha + \beta},
\qquad (14)
\]
where α = |Ps − Ps(1)| and β = |Ps − Ps(2)|. In the example, α = 0.85 − 0.68 = 0.17; β = 0.89 − 0.85 = 0.04; αP1(2) = 0.17 · 0.95 = 0.16 and βP1(1) = 0.04 · 0.866 = 0.035. Thus, P1(3) = 0.93.
Table 6. The second approximate values of the system level perfection

P1(3)  P2(3)  P3(3)  P4(3)  P5(3)  P6(3)  P7(3)  P8(3)  P9(3)  Ps(3)
0.93   0.83   0.91   0.90   0.97   0.97   0.91   0.89   0.96   0.86
The developed computational procedure is repeated for the selected value of P1(3); the data are recorded in Table 6. The value Ps(3) = 0.86 can be considered a fairly accurate approximation to Ps = 0.85; the accuracy of the calculations is about 1%. If such accuracy is insufficient, it is necessary to apply (12)–(14) and the computational procedure again to calculate Ps(k), where k is the ordinal number of the approximation. Note that Table 4, Table 5 and Table 6 can and should be combined into one; its recommended form is given in the example of Table 7. Analysis of Table 7 allows us to track the trends of changes in the probability distribution Pl(k) as a function of the value coefficients Al and the number Sl of parallel-connected units in each group. It can be argued that the achievement of the set value Ps is carried out by improving the subsystems with a lower value coefficient Al and with a greater degree of parallelism Sl.

Table 7. Summary values of level perfection

l       1     2     3     4     5     6     7     8     9     Ps(k)
Al      10    20    25    30    30    25    25    40    50    –
Sl      2     3     2     2     1     1     2     2     1     –
Pl(1)   0.86  0.74  0.83  0.81  0.93  0.95  0.83  0.79  0.90  0.68
Pl(2)   0.95  0.88  0.93  0.94  0.97  0.98  0.93  0.92  0.97  0.89
Pl(3)   0.93  0.83  0.91  0.90  0.97  0.97  0.91  0.89  0.96  0.86
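The iterative procedure summarized in Table 7 can be automated. The sketch below is one possible MATLAB implementation under the assumptions of this section: the inner loop solves the stationarity conditions (10) for a trial value of P1, and the outer loop adjusts P1 by bisection (instead of the manual selection and interpolation (14)) until the product constraint (5) reaches Ps. All function and variable names are illustrative.

```matlab
function [P, Ptot] = rationalizeSystem(A, S, Ps)
% A(l): value coefficients, S(l): number of parallel links in group l, Ps: target perfection.
    g = @(p, l) A(l) * (1 - (1 - p)^S(l)) / (1 - p)^(S(l) + 1);   % both sides of (10)
    lo = 1e-6; hi = 1 - 1e-6;
    for it = 1:60                              % outer bisection on P(1)
        P1 = (lo + hi) / 2;
        target = g(P1, 1);
        P = zeros(1, numel(A)); P(1) = P1;
        for l = 2:numel(A)                     % solve g(P(l), l) = g(P(1), 1) for each group
            a = 1e-6; b = 1 - 1e-6;
            for jt = 1:60                      % inner bisection; g is increasing in p
                p = (a + b) / 2;
                if g(p, l) < target, a = p; else, b = p; end
            end
            P(l) = p;
        end
        Ptot = prod(1 - (1 - P).^S);           % overall perfection, Eq. (5)
        if Ptot < Ps, lo = P1; else, hi = P1; end
    end
end
% Example with the data of Tables 1 and 3:
% A = [10 20 25 30 30 25 25 40 50]; S = [2 3 2 2 1 1 2 2 1];
% [P, Ptot] = rationalizeSystem(A, S, 0.85)
```

For these data the result is close to the third approximation obtained manually in Table 7.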
The second approach is to perform the iterative calculations with a PC and standard software, for example in the Mathcad environment. This editor allows working with systems of equations containing up to 50 variables; accordingly, the maximum number of equations should not exceed 50. The differences between the results of the iterative approach and those obtained in the Mathcad environment are explained by the different accuracy of the methods used to obtain them. All general regularities coincide: for parallel links (X1–X4, X7, X8) the requirements for perfection are lower than for those connected in series (X5, X6, X9); the most valuable links may have the lowest perfection (X9 < X5). The advantages of calculations using the Mathcad software environment are the increased accuracy of calculations and the smaller amount of time spent on obtaining results. Its disadvantages are the less “physical” interpretation of the results and the need for skills to work with the software.
The third approach is a graphical method of calculation. First, we note that the eight equations of system (11) consist of similar algebraic dependences:
\[
\begin{cases}
x^3 = D_l \dfrac{1}{x+1}, & S_l = 3,\\[4pt]
x^2 = D_l \dfrac{1}{x+1}, & S_l = 2,\\[4pt]
x^2 = D_l (1-x), & S_l = 1,
\end{cases}
\qquad (15)
\]
where Dl = f(Pl(k), A1, Al) is a constant coefficient and xl = 1 − Pl. Equations (15) have right parts φr(x) and left parts φl(x). A graphical dependence y = φr,l(x) can be constructed for each of them. For the eight equations, the abscissas of the points of intersection of both graphs in the quadrant (x, y > 0), which corresponds to physically existing probability values, give the sought values xl. The solution by the graphical method is facilitated by the fact that the left parts of the equations for Sl = 1, 2 coincide: seven of the eight left parts are described by the same expression y = x2, and the eighth by y = x3. The ninth equation of system (11) is used only to check the accuracy of the selection of P1(k). Figure 4 shows graphs that can be used for the functional-value calculations of system (11). The third and seventh, fourth and eighth equations coincide in the form of representation, so there are only seven graphs in Fig. 4, and seven answers are sought. The third and seventh, fourth and eighth solutions coincide, which confirms the results presented in Table 7.
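For one equation of (15) the graphical solution can be reproduced directly; the sketch below plots the left and right parts and also locates the intersection numerically with fzero. The value of D is purely illustrative.

```matlab
D = 0.02;                                   % illustrative constant D_l
x = linspace(0, 0.5, 500);
left  = x.^2;                               % left part shared by seven of the eight equations
right = D ./ (x + 1);                       % right part of an S_l = 2 equation
plot(x, left, x, right); grid on
xlabel('x = 1 - P_l'); legend('x^2', 'D_l/(x+1)');
x_root = fzero(@(t) t.^2 - D./(t + 1), [1e-9, 1])   % abscissa of the intersection point
```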
Fig. 4. Graphical representation of the approach to the implementation of functional-value calculations
The presentation of all the results on a single graph, however, does not compensate for the disadvantages of crowding the graphical dependencies together. In addition, an important component of the graphical approach is the choice of the most appropriate scale of the image: the functional imperfection (1 − Pj) changes from 0.05 to 0.185, i.e., almost fourfold, which causes difficulties in reading off the results of the calculations.
The developed method of functional-value modelling of a complex system with mixed subsystems connection allows finding out the directions and sequence of improvement of the subsystems under the condition of the minimum value of the whole system. The calculations allow drawing conclusions about the a priori database for which the functional-value rationalization of the system is appropriate. The initial data can be considered the following:
– the functional affiliation and tasks solved by the complex system (the mandatory functional composition of the system, which determines the number of required functional groups, units);
– the given level of functional perfection of the system (the given value of the probability Ps of task execution by the system);
– the possibilities for creating parallel operating parts of the system (determining the values of Sl);
– the value regularities of the creation of each link of the system (the chosen types of approximating dependences and the data concerning the numerical values of the value coefficients Al and Bl);
– the real capabilities of the parts of the system to perform their functions, which are characterized by the probabilities of the task (the presence of parametric dependencies).
The reduction of the required a priori information can be achieved by analyzing the following components:
1. The structural composition of the system, which is determined by the number l of required functional groups of subsystems (units) of a complex system. There are two options. First, the perfection of the structure of an existing system can be assessed; such a study will identify areas for improvement of the system, and the initial state of the system in this case is analysed by the existing functional groups and their number. Secondly, the rationality of developing a new system can be investigated; in this case, only the functional load of the functional groups is required.
2. The given level of functional suitability of the system. In both of the above cases, a level of 0.85 for the given value of perfection is appropriate when performing the calculations. As a result of rationalization we get a stable distribution of the functional perfection of the individual parts of the system: if at one value of Ps the i-th link has a certain place in the value of Pi among all the links of the system, then when the value of Ps changes, its place is preserved. Accordingly, by (5) it is necessary to begin the development of recommendations on achieving the level Ps from the subsystem with the smallest value of Pi. This statement is supported by the results of the iterative calculations.
3. The opportunities to create parallel parts of the system. The need to address this issue may arise only after preliminary calculations for the value rationalization of the system; preliminary information in this case is redundant.
4. The value patterns of creating each link of the system. As already noted, this requirement is also superfluous for preliminary calculations: exact information about such dependences is replaced by the available approximating dependences, whose coefficients are chosen using the method of pairwise comparisons.
5. The real capabilities of the parts of the system to perform their functions. In this case it is important to determine the main parameters of the link that affect the value of the corresponding probability. If this list is known, it is enough to choose an approximating expression, provided that it is limited by the maximum (1) and minimum (0) values with a gradual transition between them.
Carrying out any calculations increases the cost of the work and of the system. Therefore, the question arises about the feasibility of the value rationalization of a complex system. There is no unambiguous answer to this question. For a simple system (two or three subsystems) the value patterns can be obvious. As the system grows (both in the number of components and in the number and types of connections), such patterns become more complicated and ambiguous. Thus, with the complication of the system, the feasibility of preliminary value calculations increases. The same can be said about more expensive systems: the more expensive the system, the lower the relative cost of the value rationalization calculations and the more appropriate they are. It is possible to offer a quantitative assessment of the feasibility of value rationalization; it is a posteriori and provides such an assessment only for calculations already performed. Let the cost of the system without value calculations be B_val and the cost of the system with value calculations be B_val1. It can be argued that the following equation holds:
B_val1 = B_val − B + B_per,
(16)
where B is the share of the system cost that is saved as a result of the value calculations, which have their own cost B_per. If B_per < B, the value calculations are justified; if B_per ≥ B, the value calculations were superfluous, as they only caused a loss of time for the implementation. If we add to Eq. (16) the obligatory condition of functional perfection:
P ≥ P_val,
(17)
it will be necessary to compare the achieved level P_r with the previous value of P. The requirement P_r > P must be met. Then the value calculations may be appropriate if only because they increase P.
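For illustration only, the feasibility rule can be checked with a few lines of arithmetic; the numbers below are invented and do not come from the monitoring-system example:

# Hypothetical illustration of the feasibility rule around Eq. (16); all numbers invented.
B_val = 100.0    # cost of the system designed without value calculations
B_saved = 8.0    # share B of the system cost saved thanks to the calculations
B_per = 3.0      # own cost of performing the value calculations
B_val1 = B_val - B_saved + B_per          # Eq. (16)
print(B_val1, "justified" if B_per < B_saved else "superfluous")  # 95.0 justified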
4 Conclusions
The method of functional-value calculation of complex systems with a mixed connection of subsystems is
investigated using the example of an object monitoring system. The results allow determining the rational management of the functional perfection of a complex system under the condition of its minimum value and given perfection. A rule was formulated that makes it possible to assess the feasibility of the value rationalization of a complex system. The initial data for the functional-value analysis of complex systems with a mixed connection of subsystems are determined using the example of the study of the object monitoring system; this allowed us to justify ways to reduce the required a priori information when applying the method. The order of structural functional-value calculations of a complex system with a mixed connection of subsystems is based on the method of Lagrange multipliers and an iterative approach to solving equations above the third order. The rational distribution of the functional probabilities among the subsystems of the object monitoring system is substantiated. The studies show that it is necessary to start by improving the subsystems with a lower value ratio and a greater degree of parallelism. The approaches to calculations according to the developed method of functional-value modelling of a complex system with a mixed combination of subsystems are clarified. Compiling tables with the results of calculations obtained by the iterative approach is convenient for analysing trends in the influence of the subsystem parameters on the properties of the complex system. The use of standard computing programs is advisable when improving existing complex systems. Graphical calculations are useful when the calculation process needs to be visualized.
Current State of Methods, Models, and Information Technologies of Genes Expression Profiling Extraction: A Review

Lyudmyla Yasinska-Damri1, Ihor Liakh4, Sergii Babichev2,3(B), and Bohdan Durnyak1

1 Ukrainian Academy of Printing, Lviv, Ukraine
[email protected], [email protected]
2 Jan Evangelista Purkyně University in Ústí nad Labem, Ústí nad Labem, Czech Republic
3 Kherson State University, Kherson, Ukraine
[email protected], [email protected]
4 Uzhhorod National University, Uzhhorod, Ukraine
[email protected]
Abstract. The application of DNA microarray tests or RNA molecule sequencing experiments allows us to form a high-dimensional matrix of gene expressions, whose values are proportional to the number of genes of the appropriate type matched to the respective investigated sample. In the general case, the number of genes can reach tens of thousands. This fact creates the need to extract the genes which are able to distinguish the examined samples with a high level of resolution. In this review, we analyze the current state of works focused on gene expression profile extraction based on the application of both single methods and hybrid models. The conducted analysis has allowed us to identify the advantages and shortcomings of the existing techniques and to formulate the tasks which should be solved in this subject area to improve the objectivity of gene expression profile extraction considering the type of the investigated samples.

Keywords: Gene expression profiling extraction · Hybrid models · DNA microarray · RNA molecules sequencing · Clustering · Classification
1 Introduction
One of the current areas of bioinformatics is reverse engineering, or gene regulatory network reconstruction, using gene expression profiling data obtained from either DNA microarray tests or RNA molecule sequencing experiments. In both cases, the experimental data are represented as a high-dimensional
matrix of gene expression whose rows and columns correspond to the genes and the investigated samples, respectively. The complexity of solving the reverse engineering problem on the full dataset is determined by the huge number of genes. The analysis of experimental data has shown [1] that the human genome contains approximately 25000 active genes; about the same number of genes are in the inactive state (zero expression). Which genes are currently active depends on the characteristics of the biological organism and its current state. After deleting genes with a low expression value, the number of genes decreases to approximately 10000. The next step is to remove non-informative gene expression profiles, or to select the most informative ones, which allow identifying the objects with the best possible resolution. Currently, there are two groups of methods that allow us, based on statistical analysis, to select mutually correlated gene expression profiles that can distinguish classes of investigated objects: clustering and biclustering. The clustering technique divides the samples or genes into clusters depending on the level of their proximity according to the used metric. The result of the biclustering technique is clusters of mutually correlated gene expression profiles and samples, and the same gene and/or sample may belong to different clusters. Although clustering and biclustering techniques allow identifying mutually correlated gene expression profiles and/or samples, their application is accompanied by a high percentage of subjectivity due to the imperfection of the relevant quality criteria. Moreover, the successful implementation of these techniques needs verification of the appropriate model based on the use of existing data mining and/or machine learning techniques. In this instance, it is reasonable to use an ensemble of data mining and machine learning methods [18,23,29–33,37,40] to identify informative gene expression profiles. The pre-processing of gene expression experimental data to extract genes that allow us to distinguish the investigated samples with the maximal resolution is a very important step. The successful solution of this problem helps to improve both the reconstructed gene regulatory network and disease diagnostic systems. This fact indicates the topicality of the research in this subject area.
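To make the clustering branch concrete, the following sketch (Python; NumPy and SciPy assumed; the expression matrix and the number of clusters are arbitrary placeholders) groups gene expression profiles by a correlation-based distance, the kind of proximity metric discussed above. It illustrates the general idea only and is not the pipeline of any particular work cited in this review.

# Illustrative correlation-based clustering of gene expression profiles
# (rows = genes, columns = samples); data and cluster count are placeholders.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 30))            # 200 gene profiles, 30 samples

d = pdist(X, metric="correlation")        # 1 - Pearson correlation between profiles
Z = linkage(d, method="average")          # agglomerative hierarchical clustering
labels = fcluster(Z, t=8, criterion="maxclust")   # cut the tree into 8 clusters

for c in np.unique(labels):
    print(f"cluster {c}: {np.sum(labels == c)} genes")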
2 Materials and Methods
Recently, a large number of scientific works have been devoted to the problem of assessing the level of gene expression profiles informativeness in order to extract the most informative ones taking into account the state of the investigated samples. So, [44] presents the results of the research concerning the detection of genes expression profiling of miRNA molecules. Differentiated miRNAs were screened by pairwise comparison. During the simulation process, an average of 2744989 raw reads out of 9888123 was allocated from each library. A total of 2565 siRNAs were detected. In [34], the authors presented a comparative analysis of various methods of gene expression data classification to select the most informative profiles of gene expression using errors of both the first and second kinds. The principal disadvantage of this approach is the following: in real conditions, we
can not know exactly the class to which a gene belongs. In other words, this technique has a high percentage of subjectivity. In the review [8], the authors conducted a comparative analysis of current hybrid techniques for extracting informative genes expression profiling to solve the problem of further classification of objects, which was studied under different types of cancer. The analysis of efficiency of various models of genes expression profiling processing from filtration by an estimation of the corresponding gene informativeness level using methods of the statistical analysis to complex application of both the cluster analysis and data classification techniques was carried out. The main criterion in all cases has applied the criteria for assessing the classifier accuracy. Various combinations of data mining and machine learning methods were analyzed in this review. Hybrid methods for extracting informative genes based on the use of a genetic algorithm are presented in [17,19,26,28,38]. So, [28] presents the results of the research about the development of a hybrid algorithm for the selection of features based on the complex application of the method of mutual information maximization (MIM) and adaptive genetic algorithm (AGA). The proposed hybrid method was called MIMAGA-Selection. In beginning, genes expression profiling with a high cross-correlation value were selected using the MIM method. The maximum number of genes was 300 in this case. Then, the adaptive genetic algorithm AGA was applied to the selected genes. Six sets of gene expression data tested on the expression of multi- and binary cancer were used during the simulation process. The classification of the data was performed by a neural network of direct propagation (MEN - machine of extreme learning). The simulation procedure has assumed 30 repetitions of the classification process. To compare the accuracy of the MIMAGA-Selection algorithm, the authors have tested using the same datasets with the same number of target genes three algorithms: sequential direct selection (SFS), ReliefF and MIM with the ElM classifier. The simulation results showed higher efficiency of the proposed by the authors MIMAGA method in comparison with other applied methods. In addition, the authors have classified genes using the MIMAGA-Selection method using four different classifiers; backpropagation neural network (BP), support vector machine (SVM), ELM and regularized extreme learning machine (RELM). All classifiers have achieved accuracy more than 80%. The paper [26] proposed a hybrid model for selecting gene expression profiles based on the complex application of a genetic algorithm with the establishment of dynamic parameters (GADP) and a statistical homogeneity X2-test. This approach allows automating the selection of the genes expression profiling without human intervention. The evaluation of the efficiency of the proposed technique was performed by applying the SVM classifier. Six sets of cancer data were used to evaluate the proposed method effectiveness in comparison with other existing methods. The results of the simulation have shown the high accuracy of the proposed model. For some data, the accuracy achieved 100%. However, the principal shortcoming of the proposed technique very small quantity of the used genes used during the classification procedure.
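Several of the hybrid schemes described above start from a filter that ranks genes by their statistical dependence on the class labels before the evolutionary search is applied. A minimal sketch of such a mutual-information filter is given below (Python, scikit-learn assumed; the synthetic data, the number of retained genes and the choice of estimator are illustrative only, not the MIMAGA implementation itself):

# Illustrative mutual-information pre-filter for gene expression data;
# the dataset and the number of retained genes (300, as in the MIM step) are placeholders.
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 5000))      # 60 samples x 5000 genes (synthetic stand-in)
y = rng.integers(0, 2, size=60)      # binary class labels (e.g. tumor / normal)

mi = mutual_info_classif(X, y, random_state=1)   # MI of each gene with the labels
top = np.argsort(mi)[::-1][:300]                 # keep the 300 highest-ranked genes
X_filtered = X[:, top]
print(X_filtered.shape)                          # (60, 300)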
A hybrid model of genes expression profiling selection based on the complex use of correlation analysis (Correlation-based Feature Selection (CFS)) and genetic algorithm (Taguchi-Genetic Algorithm (TGA)) has been proposed in [17]. The implementation of the genes extraction procedure involved two stages. In beginning, the genes expression profiling with a small cross-correlation value in comparison with other genes were deleted using the CFS method. Then, the TGA algorithm with the corresponding functions, which were formed at the filtration stage, was applied to the remaining genes. The k-nearest neighbor classifier (KNN) was used to evaluate the effectiveness of the proposed method in terms of classification accuracy. Eleven sets of gene expression data examined on cancer were tested. The simulation results showed the high efficiency of the proposed by the authors approach, since high accuracy of classification was achieved in 10 data sets out of 11, while for six data sets the accuracy reached 100%. In [19], the authors present the results of the research concerning the development of a hybrid model of genes expression profiling selection from data obtained by DNA microarray experiment, based on the complex application of the genetic algorithm GA and the method of intelligent dynamic genetic algorithm (Intelligent Dynamic Genetic Algorithm (IDGA)). The implementation of the model involved two stages. In the first stage, Laplacian and Fisher filtration methods were used to select the 500 best genes independently of each other. In the second stage, the IDGA method was used. Five data sets of cancer patients were used to evaluate the effectiveness of the proposed model. The support vector machine (SVM), the naive Bayesian classifier (NBY), and the k-nearest neighbor (KNN) method were used as the classifiers within the framework of this model. The IDGA algorithm has been applied seven times. Analysis of the simulation results has shown that the proposed by the authors method allowed them to obtain 100% accuracy on four data sets. Moreover, the use of IDGA with the Fisher filtering method allowed to obtain better results in comparison with the Laplacian filtering method on four data sets. A hybrid technique for selecting genes expression profiling based on the complex application of Information Gain (IG) and the standard genetic algorithm (SGA) has been proposed in [38]. This method is called IG/SGA. In beginning, the IG method of information increment was used to reduce the genes. Then, a genetic algorithm was applied to the obtained data. The classification procedure was performed using the method of genetic programming GP. Seven data sets of microchips from cancer patients were used during the simulation process. The results showed that the use of the proposed technique allowed authors to achieve 100% accuracy of classification for two data sets. Hybrid models based on the application of the Ant Colony Optimization algorithm (ACO) are considered in [42,43]. The technique presented in [42] combines cellular learning automats and ant colony optimization (CLA-ACO). The practical implementation of this model involves three stages. In the first step, the data is filtered by the Fisher method. In the second stage, a subset of genes expression profiling with an optimal set of functions was formed by the complex application of cellular learning automats and the ACO algorithm. The third stage assumes the final formation of a subset of gene expression profiles based on the calcu-
lation and analysis of errors of the first and second kind using ROC analysis. The evaluation of the efficiency of the proposed model was carried out in two stages. In beginning, statistical analysis of genes expression profiling was performed using different methods, followed by ranking of data and selection of the most informative ones. Then, the CLA-ACO method was applied to the data, followed by the calculation of quantitative quality criteria of the corresponding model. In [43], a hybrid model for selecting genes expression profiling based on the complex application of ACO algorithms and an Adaptive Stem Cell Optimization model (ASCO) was proposed. The application of this model involved filtering gene expression profiles based on the level of their cross-correlation. The efficiency of the model was evaluated by using various classifiers. Gene expression data obtained by DNA microarray experiments were selected for testing. The test results showed the high efficiency of the proposed by the authors hybrid models. Table 1 presents the comparative analysis of current hybrid methods and models to form the subsets of informative genes expression profiling for the purpose of both further reconstruction of gene regulatory networks or creation of systems of diseases diagnostics [8]. Analysis of the literature resources and data of Table 1 allows us to conclude that the problem of objective choice of informative genes expression profiling in terms of the resolvability of the investigated diseases does not have a unique solution nowadays. In most cases, high accuracy of classification is achieved by using a small number of the most informative genes. As mentioned above, the initial quantity of active genes in the human genome is approximately 25000. Removal of genes with low expression values can reduce this number to 7–10 thousand. However, as can be seen from Table 1, only in a few cases, the high classifier accuracy is achieved if the selected number of genes exceeds 100 ones. The small number of genes does not allow us to reconstruct a high-quality gene regulatory network, which can allow us to investigate the particularities of the systems molecular elements interaction in its various states to allocate the group of genes that determine the state of patient’s health in the first stage and to determine the character of the impact of changes of these genes expression to other genes during further simulation of the reconstructed network. Significant shortcomings of the methods listed in Table 1 can include also the fact that in most instances the parameters of the appropriate algorithms were determined empirically during the simulation process implementation. In other words, the models are not self-organizing. This fact contributes to a large percentage of subjectivity to the final decision concerning the extraction of the informative genes. The solution to this problem can be achieved by applying an ensemble of appropriate methods with subsequent decision-making based on the analysis of quantitative quality criteria corresponding to this stage. A partial solution to this problem is presented in [14,15]. The authors proposed a hybrid model for the selection of genes expression profiling based on the complex application of quantitative statistical criteria and Shannon information entropy, the SOTA (Self Organizing Tree Algorithm) clustering algorithm and
Table 1. Comparative analysis of hybrid models to form subsets of informative genes expression profiling [8] Ref. Filtering method
Grouping
Classifier
Dataset
method
Accuracy, Number %
of genes
[28] Mutual Information Genetic Maximization Algorithm
SVM
Colon [9]
83,41
202
[26] X 2 - test
SVM
Colon [9]
100
8
[19] Laplacian and Fisher score
Genetic Algorithm
Genetic Algorithm
SVM
KNN
NB
[17] Correlation-based Feature Selection
[38] Information Gain
[42] Fisher Criterion
Genetic Algorithm
Genetic Algorithm
Ant Colony Optimization
KNN
GP
SVM KNN NB
[43] Mutual Information Ant Colony Optimization
[20] Fisher Criterion
Bat Algorithm
DLBCL [6]
100
6
SRBCT [25]
100
8
Leukemia1 [22] 100
5
SRBCT [25]
100
18
Leukemia1 [22] 100
15
Prostate [3]
96.3
14
Breast [4]
100
2
DLBCL [6]
100
9
SRBCT [25]
91.6
NA
Leukemia1 [22] 97.2
NA
Prostate [3]
95.6
NA
Breast [4]
95.5
NA
DLBCL [6]
97.9
NA
SRBCT [25]
98.2
NA
Leukemia1 [22] 93.1
NA
Prostate [3]
93.4
NA
Breast [4]
100
NA
DLBCL [6]
95.8
NA
SRBCT [25]
100
29
Prostate [3]
99.2
24
Lung [2]
98.42
195
Colon [9]
85.48
60
Leukemia1 [22] 97.06
3
Lung [2]
100
9
Prostate [3]
100
26
Leukemia1 [22] 95.5
3202
Prostate [3]
14
Leukemia1 [22] 94.30
3
Prostate [3]
15
KNN NB
99.25
Leukemia1 [22] 99.95
4
Prostate [3]
99.4
10
100
NA
Leukemia1 [22] 100
NA
Fuzzy classif. Colon [9]
SVM
98.35
Prostate [3]
90.85
NA
SRBCT [25]
85
6
Prostate [3]
94.1
6
SRBCT [25]
100
6
Prostate [3]
97.1
6
SRBCT [25]
100
6
Prostate [3]
97.1
6 (continued)
Table 1. (continued) Ref. Filtering method
Grouping
Classifier
[13] Independent Artificial Bee NB Component Analysi Colon
[10] Minimum redundancy Maximum Relevance
[35] Probabilistic Random Functio
[24] Correlation-based Feature Selection
Dataset
method
Artificial Bee SVM Colony
Particle KNN Swarm Optimization Particle NB Swarm Optimization
of genes
98.14
16
Leukemia1 [22] 98.68
12
Leukemia2 [12] 97.33
15
Lung [2]
92.45
24
Colon [9]
96.77
15
SRBCT [25]
100
10
Leukemia1 [22] 100
14
Leukemia2 [12] 100
20
Lung [2]
100
8
Lymphoma [7] 100
5
Colon [9]
84.38
60
SRBCT [25]
89.28
100
Colon [9]
Lymphoma [7] 87.71 94.89
4
SRBCT [25]
100
34
Leukemia1 [22] 100
4
Leukemia2 [12] 100
6
Lymphoma [7] 100
24
MLL [5]
100
30
Breast [4]
100
10
91.93
3
Black Hole Algorithm
Bagging Classif. Colon [9]
[27] Fisher-Markov selector
Biogeogr. algorithm
SVM
Grasshopp. Optim. Alg.
NN
[39] Symmetrical Uncertainty
Harmony Search Algorithm
NB
[21] Fast Correlation-Based Filter
Part. Swarm SVM Opt. and GA
[11] Minimum Genetic Bee redundancy Colony and Maximum Relevanc Genetic Algorithm
SVM
50
Colon [9]
[36] Random Forest Ranking
[41] Logarithmic Transformation
Accuracy, Number %
MLL [5]
98.61
5
SRBCT [25]
100
6
Prostate [3]
98.3
12
Lung [2]
98.4
16
Colon [9]
95
NA
Leukemia1 [22] 94
NA
Colon [9]
87.53
9
SRBCT [25]
99.89
37
Leukemia1 [22] 100
26
Leukemia2 [12] 100
24
Lymphoma [7] 100
10
MLL [5]
98.97
10
Colon [9]
96.3
1000
DLBCL [6]
100
3204
Colon [9]
98.38
10
SRBCT [25]
100
6
Leukemia1 [22] 100
4
Leukemia2 [12] 100
8
Lung [2]
100
4
Lymphoma [7] 100
4
an ensemble of binary classifiers. The final decision concerning the extraction of genes was made based on the analysis of the results of the fuzzy inference system operation. The optimal parameters of the SOTA clustering algorithm were determined in advance on the basis of the inductive model of objective clustering proposed by the authors in [16]. The structural block diagram of the proposed model is shown in Fig. 1 [15].
Fig. 1. A block-chart of step-by-step procedure of genes expression profiling subset formation
Application of the proposed technique assumed the following:
1. Removal of non-informative genes by statistical criteria and Shannon entropy (a minimal sketch of this step is given after the list). It was assumed that if the average of the absolute values of a gene expression profile and its variance are lower, and its Shannon information entropy is higher, than the corresponding boundary values, the gene is removed from the database as non-informative. The boundary values of the corresponding parameters were determined empirically during the simulation process. To do this, the set of studied samples was divided into groups depending on the class to which the appropriate sample belongs (disease or healthy state). At each step of the analysis and removal of genes, a criterion of the quality of the grouping of the samples was calculated, which took into account both the density of the sample allocation within the separated groups and the density of the distribution of the centers of the separated groups. The boundary values corresponded to the extremum of this quality criterion.
2. Stepwise hierarchical clustering of the gene expression profiles at hierarchical levels from 1 to N using the SOTA clustering algorithm with a correlation metric for estimating the distance between gene expression profiles. The simulation results showed that at one step of the SOTA algorithm with the correlation metric, the data are divided into two approximately identical subsets; in this instance, the number of clusters varies from 2 to 2^N at the first and the N-th hierarchical levels, respectively. The clustering quality criteria were calculated at each hierarchical level during the simulation procedure.
3. Selection of the best subsets of gene expression profiles at each hierarchical level, which match the extremum of the used clustering quality criteria.
4. Applying an ensemble of classifiers (GLM, SVM, CART and RF) to the selected clusters. Calculation of quantitative evaluations of the effectiveness of each of the classifiers (Accuracy (AC), F-measure (F), Matthews correlation coefficient (MCC)). Formation of intermediate decisions concerning the results of the classification of the samples at each hierarchical level for each of the applied classification methods.
5. Configuring the fuzzy inference system, choosing the inference algorithm, forming the membership functions of the fuzzy sets and the rule base.
6. Fuzzy classification of objects with the calculation of quantitative quality indexes. Analysis of the obtained results.
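A minimal sketch of the first (filtering) step is given below (Python, NumPy assumed). The entropy estimator and the boundary values are illustrative, since in [14,15] the thresholds were tuned empirically against a grouping-quality criterion rather than fixed in advance.

# Illustrative version of step 1: drop profiles with low mean absolute expression,
# low variance and high Shannon entropy; thresholds here are arbitrary placeholders.
import numpy as np

def shannon_entropy(profile, bins=16):
    """Shannon entropy (bits) of a gene expression profile via a histogram estimate."""
    hist, _ = np.histogram(profile, bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def filter_profiles(X, mean_min=0.5, var_min=0.2, entropy_max=3.5):
    """X: genes x samples. Keep genes that pass all three boundary conditions."""
    keep = []
    for g in X:
        keep.append(np.mean(np.abs(g)) >= mean_min
                    and np.var(g) >= var_min
                    and shannon_entropy(g) <= entropy_max)
    return X[np.array(keep)]

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 40))          # synthetic stand-in for a gene matrix
print(filter_profiles(X).shape)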
As the simulation result, 401 genes were selected which allow obtaining the maximum accuracy of classification by the ensemble of methods (96.5%). In the authors' opinion, these gene expression profiles can be further used to reconstruct the gene regulatory network in order to study the particularities of gene interactions depending on the patient's health. However, we would like to note that a significant disadvantage of the proposed model is that the data contain only two states (classes), which limits its application to multi-class data. Moreover, the model needs validation using other types of gene expression data. The first stage of the model also has a percentage of subjectivity, because dividing the objects into clusters known in advance does not allow us to objectively assess the result of the classification of objects in real conditions.
3 Conclusions
In this review, we have analyzed the current research in the area of gene expression profile selection based on the complex use of statistical techniques, cluster analysis and classification methods. The selection of qualitative genes in terms of their ability to recognize the investigated disease is a very important stage, since such genes allow creating an effective disease diagnostic system on the one hand and reconstructing the gene regulatory network to understand the particularities of gene interconnections under various conditions on the other hand. The analysis of the literature has shown that the problem of the objective choice of informative gene expression profiles in terms of the resolvability of the investigated diseases does not have a unique solution nowadays. In most cases, high accuracy of classification is achieved by using a small number of the most informative genes; only in a few cases is high classifier accuracy achieved when the selected number of genes exceeds 100. A small number of genes does not allow us to reconstruct a high-quality gene regulatory network, which could allow investigating the particularities of the interaction of the system's molecular elements in its various states in order to allocate the group of genes that determine the state of the patient's
health in the first stage and to determine the character of the impact of changes of these genes' expression on other genes during further simulation of the reconstructed network. Significant shortcomings of the existing methods also include the fact that in most instances the parameters of the appropriate algorithms were determined empirically during the simulation process. In other words, the models are not self-organizing. This fact contributes a large percentage of subjectivity to the final decision concerning the extraction of the informative genes. The solution to this problem can be achieved by applying an ensemble of appropriate methods with subsequent decision-making based on the analysis of quantitative quality criteria corresponding to this stage. This is a perspective direction of the authors' further research.
References 1. Arrayexpress – functional genomics data. https://www.ebi.ac.uk/arrayexpress/ 2. Gems: Gene expression model selector. http://www.gems-system.org/ 3. Gene expression correlates of clinical prostate cancer behavior: Cancer cell. www. cell.com/cancer-cell/fulltext/S1535-6108(02)00030--2 4. Microarray datasets http://csse.szu.edu.cn/staff/zhuzx/Datasets.html 5. Mldata: Repository leukemia mll. http://mldata.org/repository/data/viewslug/ leukemiamll/ 6. Uci machine learning repository: Data sets. https://archive.ics.uci.edu/ml/ datasets.html 7. Alizadeh, A., Elsen, M., Davis, R.: Distinct types of diffuse large b-cell lymphoma identified by gene expression profiling. Nature 403(6769), 503–511 (2000). https:// doi.org/10.1038/35000501 8. Almugren, N., Alshamlan, H.: A survey on hybrid feature selection methods in microarray gene expression data for cancer classification. IEEE Access 7, 78533– 78548 (2019). https://doi.org/10.1109/ACCESS.2019.2922987 9. Alon, U., Barka, N., Notterman, D., et al.: Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc. National Acad. Sci. United States Am. 96, 6745–6750 (1999). https://doi.org/10.1073/pnas.96.12.6745 10. Alshamlan, H., Badr, G., Alohali, Y.: Mrmr-abc: A hybrid gene selection algorithm for cancer classification using microarray gene expression profiling. BioMed Research International 2015, art. no. 604910 (2015). https://doi.org/10.1155/2015/ 604910 11. Alshamlan, H., Badr, G., Alohali, Y.: Genetic bee colony (GBC) algorithm: a new gene selection method for microarray cancer classification. Comput. Biol. Chem. 56, 49–60 (2015). https://doi.org/10.1016/j.compbiolchem.2015.03.001 12. Armstrong, S., Staunton, J., Silverman, L.: Mll translocations specify a distinct gene expression profile that distinguishes a unique leukemia. Nature Genet. 30(1), 41–47 (2002). https://doi.org/10.1038/ng765 13. Aziz, R., Verma, C., Srivastava, N.: A novel approach for dimension reduction of microarray. Comput. Biol. Chem. 71, 161–169 (2017). https://doi.org/10.1016/j. compbiolchem.2017.10.009
14. Babichev, S., Durnyak, B., Zhydetskyy, V., Pikh, I., Senkivskyy, V.: Application of optics density-based clustering algorithm using inductive methods of complex system analysis. In: IEEE 2019 14th International Scientific and Technical Conference on Computer Sciences and InformationTechnologies, CSIT 2019 - Proceedings pp. 169–172 (2019). https://doi.org/10.1109/STC-CSIT.2019.8929869 ˇ 15. Babichev, S., Skvor, J.: Technique of gene expression profiles extraction based on the complex use of clustering and classification methods. Diagnostics 10(8), art. no. 584 (2020). https://doi.org/10.3390/diagnostics10080584 16. Babichev, S., Gozhyj, A., Kornelyuk, A., Lytvynenko, V.: Objective clustering inductive technology of gene expression profiles based on sota clustering algorithm. Biopolymers Cell 33(5), 379–392 (2017). https://doi.org/10.7124/bc.000961 17. Chuang, L.Y., Yang, C.H., Wu, K.C., Yang, C.H.: A hybrid feature selection method for dna microarray data. Comput. Biol. Med. 41(4), 228–237 (2011). https://doi.org/10.1016/j.compbiomed.2011.02.004 18. Chyrun, L., Kravets, P., Garasym, O., Gozhyj, A., Kalinina, I.: Cryptographic information protection algorithm selection optimization for electronic governance it project management by the analytic hierarchy process based on nonlinear conclusion criteria. In: CEUR Workshop Proceedings, vol. 2565, pp. 205–220 (2020) 19. Dashtban, M., Balafar, M.: Gene selection for microarray cancer classification using a new evolutionary method employing artificial intelligence concepts. Genomics 109(2), 91–107 (2017). https://doi.org/10.1016/j.ygeno.2017.01.004 20. Dashtban, M., Balafar, M., Suravajhala, P.: Gene selection for tumor classification using a novel bio-inspired multi-objective approach. Genomics 110(1), 10–17 (2018). https://doi.org/10.1016/j.ygeno.2017.07.010 21. Djellali, H., Guessoum, S., Ghoualmi-Zine, N., Layachi, S.: Fast correlation based filter combined with genetic algorithm and particle swarm on feature selection. In: 2017 5th International Conference on Electrical Engineering - Boumerdes, ICEE-B 2017, pp. 1–6 (2017). https://doi.org/10.1109/ICEE-B.2017.8192090 22. Golub, T., Slonim, D., Tamayo, P., et al.: Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286(5439), 527–531 (1999). https://doi.org/10.1126/science.286.5439.531 23. Izonin, I., Tkachenko, R., Verhun, V., et al.: An approach towards missing data management using improved grnn-sgtm ensemble method. Int. J. Eng. Sci. Technol. in press (2020). https://doi.org/10.1016/j.jestch.2020.10.005 24. Jain, I., Jain, V., Jain, R.: Correlation feature selection based improved-binary particle swarm optimization for gene selection and cancer classification. Appl. Soft Comput. 62, 203–215 (2018). https://doi.org/0.1016/j.asoc.2017.09.038 25. Khan, J., Wei, J., Ringn´er, M., et al.: Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nature Med. 7(6), 673–679 (2001). https://doi.org/10.1038/89044 26. Lee, C.P., Leu, Y.: A novel hybrid feature selection method for microarray data analysis. Appl. Soft Comput. J. 11(1), 208–213 (2011). https://doi.org/10.1016/j. asoc.2009.11.010 27. Li, X., Yin, M.: Multiobjective binary biogeography based optimization for feature selection using gene expression data (2013). https://doi.org/10.1109/TNB.2013. 2294716 28. Lu, H., Chen, J., Yan, K., et al.: A hybrid feature selection algorithm for gene expression data classification. 
Neurocomputing 256, 56–62 (2017). https://doi. org/10.1016/j.neucom.2016.07.080
29. Lytvyn, V., Gozhyj, A., Kalinina, I., et al.: An intelligent system of the content relevance at the example of films according to user needs. In: CEUR Workshop Proceedings, vol. 2516, pp. 1–23 (2019) 30. Lytvyn, V., Salo, T., Vysotska, V., et al.: Identifying textual content based on thematic analysis of similar texts in big data. In: IEEE 2019 14th International Scientific and Technical Conference on Computer Sciences and Information Technologies, CSIT 2019 - Proceedings, vol. 2, pp. 84–91 (2019). https://doi.org/10. 1109/STC-CSIT.2019.8929808 31. Marasanov, V., Sharko, A., Sharko, A., Stepanchikov, D.: Modeling of energy spectrum of acoustic-emission signals in dynamic deformation processes of medium with microstructure. In: 2019 IEEE 39th International Conference on Electronics and Nanotechnology, ELNANO 2019 - Proceedings, pp. 718–723 (2019). https://doi. org/10.1109/ELNANO.2019.8783809 32. Marasanov, V., Stepanchikov, D., Sharko, A., Sharko, A.: Technique of system operator determination based on acoustic emission method. Adv. Intell. Syst. Comput. 1246, 3–22 (2021). https://doi.org/10.1007/978-3-030-54215-3 1 33. Marasanov, V., Sharko, A., Sharko, A.: Energy spectrum of acoustic emission signals in coupled continuous media. J. Nano- Electron. Phys. 11(3), art. no. 03027 (2019). https://doi.org/10.21272/jnep.11(3).03028 34. Marchetti, M., Coit, D., Dusza, S., et al.: Performance of gene expression profile tests for prognosis in patients with localized cutaneous melanoma: A systematic review and meta-analysis. JAMA Dermatology 156(9), 953–962 (2020). https:// doi.org/10.1001/jamadermatol.2020.1731 35. Moradi, P., Gholampour, M.: A hybrid particle swarm optimization for feature subset selection by integrating a novel local search strategy. Appl. Soft Comput. J. 43, 117–130 (2016). https://doi.org/10.1016/j.asoc.2016.01.044 36. Pashaei, E., Ozen, M., Aydin, N.: Gene selection and classification approach for microarray data based on random forest ranking and bbha. In: 3rd IEEE EMBS International Conference on Biomedical and Health Informatics, BHI 2016. p. art. no. 7455896 (2016). https://doi.org/10.1109/BHI.2016.7455896 37. Rzheuskyi, A., Kutyuk, O., Vysotska, V., et al.: The architecture of distant competencies analyzing system for it recruitment. In: IEEE 2019 14th International Scientific and Technical Conference on Computer Sciences and Information Technologies, CSIT 2019 - Proceedings, vol. 3, pp. 254–261 (2019). https://doi.org/10. 1109/STC-CSIT.2019.8929762 38. Salem, H., Attiya, G., El-Fishawy, N.: Classification of human cancer diseases by gene expression profiles. Appl. Soft Comput. J. 50, 124–134 (2017). https://doi. org/10.1016/j.asoc.2016.11.026 39. Shreem, S., Abdullah, S., Nazri, M.: Hybrid feature selection algorithm using symmetrical uncertainty and a harmony search algorithm (2016). https://doi.org/10. 1080/00207721.2014.924600 40. Tkachenko, R., Izonin, I., Kryvinska, N., et. al.: An approach towards increasing prediction accuracy for the recovery of missing iot data based on the grnn-sgtm ensemble. Sensors (Switzerland) 20(9), art. no. 2625 (2020). https://doi.org/10. 3390/s20092625 41. Tumuluru, P., Ravi, B.: Goa-based dbn: Grasshopper optimization algorithm-based deep belief neural networks for cancer classification (2017)
42. Vafaee Sharbaf, F., Mosafer, S., Moattar, M.: A hybrid gene selection approach for microarray data classification using cellular learning automata and ant colony optimization. Genomics 107(6), 231–238 (2016). https://doi.org/10.1016/j.ygeno. 2016.05.001 43. Vijay, S.A.A., GaneshKumar, P.: Fuzzy Expert System based on a Novel Hybrid Stem Cell (HSC) algorithm for classification of micro array data. J. Med. Syst. 42(4), 1–12 (2018). https://doi.org/10.1007/s10916-018-0910-0 44. Wang, L., Song, F., Yin, H., et al.: Comparative micrornas expression profiles analysis during embryonic development of common carp, cyprinus carpio. Comparative Biochemistry and Physiology - Part D: Genomics and Proteomics 37, art. no. 100754 (2021). https://doi.org/10.1016/j.cbd.2020.100754
A Method of Analytical Calculation of Dynamic Characteristics of Digital Adaptive Filters with Parallel-Sequential Weight Summation

Kostiantyn Semibalamut1, Volodymyr Moldovan1, Svitlana Lysenko1, Maksym Topolnytskyi2(B), and Sergiy Zhuk3

1 Eugene Bereznyak Military-Diplomatic Academy, Kyiv, Ukraine
[email protected], [email protected]
2 Research Institute of the Ministry of Defense of Ukraine, Kyiv, Ukraine
3 National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”, Kyiv, Ukraine
[email protected]
Abstract. In this article, we present the results of research concerning the development of a method for the analytical calculation of the dynamic characteristics of digital adaptive filters with parallel-sequential weight summation of signals for one of the variants of a multi-stage adaptive interference compensator with block orthogonalization of the compensation channel signals. The article also presents the results of a study of the degree of agreement between the theoretical calculation and the statistical modeling of the adaptive compensator (filter) for interference suppression. Each stage of the adaptive filter with parallel-sequential weight summation consists of modules, each of which is a scheme of a single-channel adaptive compensator. The complex envelope of the signal from the output of the digital antenna array (DAA) goes to the main channel of the adaptive filter. The compensation channels are formed on the basis of the antenna elements of the main channel. An equidistant antenna array with a distance between the array elements equal to 0.5 wavelength is selected as the DAA. The method consists in obtaining recurrent relations, according to which the transmission coefficients of the modules placed in the first stage (row) of the structural scheme of the adaptive filter with parallel-sequential weight summation of signals are calculated first. Then the transmission coefficients of the modules placed in the second stage (row) of the structural scheme are calculated, and so on. The calculated values of the transmission coefficients of all modules of the structural scheme of the adaptive filter with parallel-sequential weight summation are used to calculate the interference power at the output of the filter, depending on the parameters of the interference correlation matrix. Statistical simulation modeling and theoretical calculation of the dynamic characteristics of an adaptive filter with parallel-sequential weight summation of signals were also performed. The results of the modeling and the analytical calculation
of the interference power at the output of the adaptive filter with parallel-sequential weight summation show a satisfactory agreement of the dynamic characteristics under the action of two, three and four interference sources on a 5-channel adaptive filter. The modeling can be reproduced many times for different interference situations. The drawback of this method is that it is developed for a single variant of the known scheme of a multi-stage adaptive noise compensator with block orthogonalization of signals. The use of the method will allow, at the stage of designing the structure and characteristics of adaptive filters with parallel-sequential weight summation, choosing their parameters without the use of statistical modeling.

Keywords: Radio-electronic means · Digital antenna array · Spatial signal compensation · Digital adaptive filter with parallel-sequential weight summation
1 Introduction
The saturation of the air with radio emissions from a large number of radio-electronic means of various purposes (for example, radiolocation, radio communication, etc.), which are not interconnected but operate in the same radio frequency band, leads to mutual radio interference. This creates the need to search for ways to increase the effectiveness of radio-electronic protection means and to develop them constantly. For the spatial compensation of active noise interference in radio-electronic means of various purposes, automatic compensators based on digital adaptive antenna arrays are widely used [9,14,22]. For many radio-electronic means, the use of adaptive antenna arrays is one of the ways to suppress interference as a result of forming nulls in the radiation pattern towards the sources of interference [8,10,18,21]. The continuous improvement of signal processing technologies and the research and search for the most effective algorithms of digital spatial signal filtering are an important area in the development and modernization of radio-electronic means for various purposes [4,7,10,11,15,20]. The question raised is connected with the solution of topical scientific and practical problems concerning the constant improvement and increase of the efficiency of the noise protection of radio-electronic means and the research of digital signal filtering algorithms [1–3,8,16,19].
2 Literature Review
Adaptation to the radar environment has been carried out for more than half a century. Among the trends in the development of modern radio-electronic systems, one of the most difficult tasks is that of increasing noise protection. It can be solved using modern digital adaptation methods. Traditionally, spatial
filtering algorithms are characterized by high computational complexity, which complicates their implementation in real time [4,8,12,17,23,24]. Thus, multidimensional notch filters based on “Levinson block factorization” are considered in [4,10,17], but their similarity to parallel-sequential signal processing does not make it possible to obtain a theoretical calculation of the performance of digital adaptive filters with parallel-sequential weight summation (PSWS). The general elements of different systems of adaptive space-time signal processing and the place of adaptive lattice filters in them are considered in [7]. The speed of correlation interference auto-compensators, of quasi-Newtonian algorithms based on the maximum likelihood estimate of the noise correlation matrix (CM) and of their diagonally regularized variety, as well as of lattice filters, is compared in [4,9,11,16]. In [20], only some issues of the stabilization of the dynamic parameters of space-time adaptive filters with respect to the interference suppression index are revealed; the main attention is paid to the method of dynamic multipliers in non-deterministic interference situations. One of the main directions of increasing the speed of computing means in multifunctional radio-electronic means is the parallelization of algorithms with subsequent simultaneous processing in all parallel channels [8,13,23]. A promising approach is based on orthogonal transformations of signals [7,12,17]. A further development of such signal processing is the multi-stage adaptive compensation of active noise interference with block orthogonalization of the signals of the compensation channels [24]. The final parallel-sequential orthogonalization can be realized by the parallel-sequential multi-stage orthogonalization of the interference of groups of additional channels through the parallel-sequential integration of, in the general case, multi-input weight adders. A similar processing structure in the case of using two-input weight adders is one of the options for parallel-sequential weight summation of signals [4,9,14,24]. Despite the existence of works in which the characteristics of gradient adaptive filters with PSWS have been studied, so far no theoretical relations have been obtained on the basis of which the performance of such filters in different interference situations can be estimated without using statistical modeling methods [6,12,17,18,21,22,24]. That is, the issue of the analytical calculation of the performance of digital adaptive filters with PSWS of signals has not been resolved.
3 Problem Statement
It can be asserted that digital adaptive signal processing algorithms are relevant today. According to the analysis of research and publications [7,10–12,18,20,21,24], the performance of digital adaptive filters with PSWS is calculated by the method of statistical modeling, because methods of analytical calculation have not been developed. The method proposed below makes it possible to eliminate this drawback in the study of digital adaptive filters with PSWS of signals. The analysis of the speed and efficiency of noise suppression by digital adaptive filters with PSWS is associated with high computational costs [4,6,18,21,22,24] and the need for statistical modeling, so solving the problem in general for
multi-channel filter structures will eliminate the main shortcomings of statistical modeling. The aim of this article is to improve the design of digital adaptive filters with PSWS, which will allow solving the problem of increasing their efficiency on the basis of the developed method for the analytical calculation of the filter performance. Note that the article does not aim to obtain quantitative indicators of the reduction of the computational costs of the theoretical calculation in comparison with statistical modeling. Let the inputs of an N-channel (by the number of additional receive channels) digital adaptive filter (DAF) with PSWS, as shown in Fig. 1, receive signals from M sources of active noise and a useful signal, with M < N. For simplicity, we assume that the useful signal is present only in the (N + 1)-th receive channel.
Fig. 1. Structural scheme of DAF with PSWS [6, 18, 24]
It is necessary to establish an analytical relationship between the rate of change of the interference power at the output of the DAF with PSWS and the parameters that characterize the interference situation, the values of the step factor (μ_p) in the filter modules and the number of iterations (n).
4 Development of the Method of Analytical Calculation of Dynamic Characteristics of Digital Adaptive Filters with Parallel-Sequential Weight Summation
It can be shown that at the n-th iteration the interference power at the output of the DAF with PSWS is [6,7,12,14,17,21,24]:

P_{int.out}(n) = \alpha_{N+1}^{T} \cdot L(n-1) \cdot \overline{U(n) \cdot U^{T}(n)} \cdot L^{T}(n-1) \cdot \alpha_{N+1},   (1)

where \alpha_{N+1} is the (N+1)-dimensional column vector whose (N+1)-th element equals 1 and whose other elements equal 0; U(n) is the (N+1)-dimensional column vector of (in the general case complex) signals at the DAF inputs for the n-th sample;

L^{T}(n-1) = \prod_{p=1}^{N} L_{p}^{T}(n-1),

L_{p}^{T}(n-1) is the (N+1) \times (N+1) matrix that differs from the identity matrix only in its p-th row, whose elements are

l_{p,h} = \begin{cases} 0, & p > h, \\ 1, & p = h, \\ k_{p,h-1}(n-1), & p < h, \end{cases}

k_{p,h-1}(n-1) is the transmission coefficient (weight coefficient) of the module located at the intersection of the p-th row and the m-th (m = h-1) column of modules of the DAF with PSWS shown in Fig. 1; the overline in relation (1) denotes averaging over the set (ensemble) of realizations; “T” denotes Hermitian transposition.

Using the results obtained in [5], and assuming that the values of the step multiplier \mu_{p} in the filter modules are chosen so that the relative level of the gradient noise does not exceed 1 to 1.5 dB, relation (1) can be represented with acceptable accuracy as

P_{int.out}(n) \approx \alpha_{N+1}^{T} \cdot \prod_{p=1}^{N} L_{N+1-p}(n-1) \cdot R \cdot \prod_{p=1}^{N} L_{p}^{T}(n-1) \cdot \alpha_{N+1},   (2)

where R = \overline{U(n) \cdot U^{T}(n)} is the CM of the signals at the DAF inputs.

In the case of the quasi-gradient algorithm for calculating the weight coefficients [6,7,14], the transmission coefficient of the pm-th module of the DAF with PSWS (the module number coincides with the index of k) at the n-th adaptation step, taking into account the designations in Fig. 1, is

k_{pm}(n) = k_{pm}(n-1) + \mu_{p} \cdot u_{pp}(n) \cdot u_{p+1,m+1}^{*}(n),   (3)

where \mu_{p} is the step multiplier in the modules located in the p-th row of the structural scheme of the DAF with PSWS; u_{pp}(n) is the n-th signal sample at the output of the pp-th module (the diagonal modules of the DAF with PSWS); m = p, \ldots, N; “*” denotes complex conjugation.

For p > 1, under the assumption that \overline{k_{ij}(n) \cdot k_{pm}(n)} \approx \overline{k_{ij}(n)} \cdot \overline{k_{pm}(n)}, the signals at the inputs of the respective modules can be calculated by the following relations:

u_{pp}(n) = \alpha_{p}^{T} \cdot \prod_{i=1}^{p-1} L_{p-i}(n-1) \cdot U(n);   (4)

u_{pm}(n) = \alpha_{m}^{T} \cdot \prod_{i=1}^{p-1} L_{p-i}(n-1) \cdot U(n),   (5)

where \alpha_{m} is the (N+1)-dimensional column vector whose m-th element equals 1 and whose other elements equal 0.

After a series of transformations of relation (3), and taking into account (4) and (5), the average values of the weight coefficients of the corresponding modules of the filter with PSWS are calculated, for p = 1, m = 1, \ldots, N, by the expression

\overline{k_{1m}(n)} = \left[ (1 - \mu_{1} \cdot r_{11,11})^{n} - 1 \right] \cdot r_{11,11}^{-1} \cdot r_{11,1(m+1)},   (6)

where r_{11,1m} = \overline{u_{11}(n) \cdot u_{1m}^{*}(n)} is the average value of the correlation coefficient of the signals of the main and compensation channels of the corresponding module of the filter with PSWS.

For the following stages of the filter with PSWS (p \ge 2, m = p, \ldots, N), the average values of the weight coefficients of the respective filter modules are calculated by the expressions

\overline{k_{pm}(n)} = -\mu_{p} \cdot \left\{ \sum_{v=1}^{n-1} \left( \prod_{i=0}^{n-v-1} F_{p}(n-i) \right) \cdot r_{pp,p(m+1)}(v) + r_{pp,p(m+1)}(n) \right\},   (7)

where F_{p}(n) = 1 - \mu_{p} \cdot r_{pp,pp}(n);

r_{pp,pm}(n) = \beta_{2}^{T} \cdot Q_{p-1,p-1}(n-1) \cdot M_{p-1,m} \cdot Q_{p-1,m-1}^{T}(n-1) \cdot \beta_{2};
Q_{p-1,m-1}(n-1) = \begin{pmatrix} 1 & 0 \\ \overline{k_{(p-1)(m-1)}(n-1)} & 1 \end{pmatrix};   (8)
M_{p-1,m} = \begin{pmatrix} r_{(p-1)(p-1),(p-1)(p-1)}(n) & r_{(p-1)(p-1),(p-1)m}(n) \\ r_{(p-1)(p-1),(p-1)p}^{*}(n) & r_{(p-1)p,(p-1)m}(n) \end{pmatrix};
\beta_{2}^{T} = (0 \; 1); \quad r_{pp,pm} = \overline{u_{pp}(n) \cdot u_{pm}^{*}(n)}.

Relations (6) and (7) are recurrent. First, in accordance with (6), the transmission coefficients of the modules located in the first stage (row) of the structural scheme of the DAF with PSWS shown in Fig. 1 are calculated. Then, in accordance with (7) and taking into account relation (8), the transmission coefficients of the modules of the second stage (row) are calculated, and so on.
The calculated transmission coefficients of all modules of the DAF with PSWS are then used to compute the interference power at the filter output according to relation (2), depending on the parameters of the CM R (the interference situation variant) and the step factors μ_p.
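A compact sketch of this computation in Python/NumPy is given below. It assumes that the averaged weight coefficients k_{pm}(n−1) have already been obtained (for example, from relations (6)–(8)) and are stored in an N×N array K with K[p−1, m−1] = k_{p,m}(n−1); the function names are illustrative and do not come from the original derivation.

```python
import numpy as np

def lp_matrix(K, p, N):
    """Build L_p(n-1): the (N+1)x(N+1) identity matrix whose p-th row
    additionally contains the module weights, l_{p,h} = k_{p,h-1}(n-1)
    for p < h (rows and columns are numbered from 1, as in the text)."""
    L = np.eye(N + 1, dtype=complex)
    for h in range(p + 1, N + 2):          # h = p+1, ..., N+1
        L[p - 1, h - 1] = K[p - 1, h - 2]  # l_{p,h} = k_{p,h-1}(n-1)
    return L

def interference_power(R, K):
    """Relation (2): output interference power of the DAF with PSWS for
    a given correlation matrix R of the input signals."""
    N = K.shape[0]
    a = np.zeros(N + 1); a[-1] = 1.0       # alpha_{N+1}
    left = np.eye(N + 1, dtype=complex)    # L_N * L_{N-1} * ... * L_1
    right = np.eye(N + 1, dtype=complex)   # L_1^T * L_2^T * ... * L_N^T
    for p in range(1, N + 1):
        left = left @ lp_matrix(K, N + 1 - p, N)
        right = right @ lp_matrix(K, p, N).T
    return float(np.real(a @ left @ R @ right @ a))
```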
5 Results and Discussion
The results of the analytical calculation for a 5-channel DAF with PSWS under the action of two, three and four sources of active noise interference, with a relative total power P_int of 600 in each receiving channel, are shown as graphs in Fig. 2 and Fig. 3 (the corresponding dependences are denoted by the numbers 1 and 2).
Fig. 2. Dependence of interference power at the output of DAF with PSWS on the number of iterations (good conditionality of CM)
Figure 2 shows the dependence of the interference power at the output of the DAF with PSWS in the case of good conditionality (η) of the CM at the inputs of the additional reception channels (the ratio of the maximum and minimum signal eigenvalues of this matrix is approximately 5 dB), while Fig. 3 shows the same dependence in the case of bad conditionality (this ratio is almost 25 dB). To illustrate the agreement of the theoretical dependences with the actual ones, the results of statistical modeling are shown in the same figures (marked 1a and 2a). Comparing the dependences, we can conclude that the results of the analytical calculation and of the modeling are in
Fig. 3. Dependence of interference power at the output of DAF with PSWS on the number of iterations (bad conditionality of CM)
good agreement. For the considered interference situation, the calculation error caused by the assumptions made in deriving relations (2) and (7) does not exceed 2–3 dB for n > 10. This error is due to the presence of gradient noise, which was not taken into account when using (2) to calculate the interference power at the DAF output [5]. The corresponding calculations show that the relative level of gradient noise in the DAF with PSWS, in contrast to the DAF with parallel weight summation [6,12], increases with the number of interference sources and is approximately 0.5 × (M − 1) dB for the values of μ_p stipulated for relation (2). Figure 2 shows that the (albeit insignificant) increase in the adaptation speed of the DAF with PSWS with an increasing number of interference sources is explained by the fact that during tuning an "over-regulation" of the weighting coefficients k_pm of the modules with p > m occurs, which then leads to a slowdown of the transient processes in the DAF. This "over-regulation" is the larger, the greater the difference (N − M). It is clearly illustrated in Fig. 4 by the dependences of the normalized values of the modules of the weight coefficients calculated by relation (7); normalization is performed to the steady-state values of the modules of the weight coefficients. The number of sources of active interference (M) is shown in brackets next to the module number. The deterioration of the dynamic characteristics of the DAF with PSWS for both good and bad (for M > 2) conditionality of the CM is caused by the aforementioned "over-regulation" of the weight coefficient of the NN module, as shown in Fig. 5. The degree of "over-regulation" increases with increasing values of the step factor μ_N.
Fig. 4. Dependences of the normalized values of the modules of the weight coefficients of DAF with PSWS on the number of iterations (good conditionality of CM)
Fig. 5. Dependences of the normalized values of the module of the weight coefficients of DAF with PSWS on the number of iterations (good conditionality of CM)
6 Conclusions
Thus, based on the analysis of the results of statistical modeling and analytical calculations, the following conclusions can be made:
1. The aim of the article is achieved. The proposed method of analytical calculation of the performance characteristics of the DAF with PSWS of signals is workable, and the assumptions made in deriving the relations are correct.
2. The discrepancy between the theoretical calculations obtained by the developed method and the statistical modeling does not exceed the gradient noise level of 2–3 dB, which was not taken into account in order to simplify the mathematical transformations.
3. The developed method makes it possible, at the stage of designing the structure and characteristics of the DAF with PSWS, to choose the filter parameters without resorting to statistical modeling.
4. The developed method also makes it possible to analyze the causes of the decrease in adaptation speed in the case of bad conditionality of the CM of interference signals and to give recommendations on the parameters of the DAF with PSWS for preserving its characteristics, which will contribute to improving the efficiency of interference protection systems based on the DAF with PSWS.
A promising direction of further research is the development of a method for the analytical calculation of the dynamic characteristics of multistage digital adaptive interference compensators with block orthogonalization of the compensation channel signals.
References
1. Babichev, S., Škvor, J., Fišer, J., Lytvynenko, V.: Technology of gene expression profiles filtering based on wavelet analysis. Int. J. Intell. Syst. Appl. 16(4), 1–7 (2018). https://doi.org/10.5815/ijisa.2018.04.01
2. Babichev, S., Lytvynenko, V., Osypenko, V.: Implementation of the objective clustering inductive technology based on DBSCAN clustering algorithm. In: Proceedings of the 12th International Scientific and Technical Conference on Computer Sciences and Information Technologies, CSIT 2017, pp. 479–484 (2017). https://doi.org/10.1109/STC-CSIT.2017.8098832
3. Babichev, S., Taif, M., Lytvynenko, V.: Inductive model of data clustering based on the agglomerative hierarchical algorithm. In: Proceedings of the 2016 IEEE 1st International Conference on Data Stream Mining and Processing, DSMP 2016, pp. 19–22 (2016). https://doi.org/10.1109/DSMP.2016.7583499
4. Barlow, J.L.: Block modified Gram-Schmidt algorithms and their analysis. SIAM J. Matrix Anal. Appl. 40, 1257–1290 (2019). https://doi.org/10.1137/18M1197400
5. Bondarenko, B., Semibalamut, K.: Mochnost shumov gradient v tsifrovykh kompensatorakh pomekh [Gradient noise power in digital interference compensators]. Radioelectronics Commun. Syst. 41(11), 62–66 (1998)
6. Bondarenko, B., Semibalamut, K.: Organizatsiia obchyslennia u tsifrovomu adaptyvnomu filtri [Organization of computation in a digital adaptive filter]. Zbirnyk naukovykh prats KVIUZ 2, 3–13 (1998)
7. Haykin, S.: Adaptive Filter Theory, 5th edn. Pearson, Boston (2014)
8. Kuz'min, S.: Tsifrovaya radiolokatsiya [Digital radar]. KViTS (2000)
9. Lekhovytskiy, D.I.: Adaptive lattice filters for systems of space-time processing of non-stationary Gaussian processes. Radioelectron. Commun. Syst. 61(11), 477–514 (2018). https://doi.org/10.3103/S0735272718110018
10. Lekhovytskiy, D., Rachkov, D., Semeniaka, A., Ryabukha, V., Atamanskiy, D.: Adaptive lattice filters. Part I. Theory of lattice structures. Appl. Radio Electron. 10(4), 380–404 (2011)
11. Lekhovytskiy, D., et al.: Bystrodeystvie algoritmov adaptivnoy prostranstvenno-vremennoy obrabotki signalov na fone pomekh [Speed of adaptive space-time signal processing algorithms against an interference background]. In: Materialy mezhdunarodnoi nauchno-prakticheskoi konferentsii "Sovremennye informatsionnye i elektronnye tekhnologii", pp. 222–225 (2013). http://tkea.com.ua/siet/archive/2013-t1/222.pdf
12. Monzingo, R.A., Haupt, R.L., Miller, T.W.: Introduction to Adaptive Arrays, 2nd edn. Scitech Pub. Inc. (2011)
13. Piza, D., Romanenko, S., Moroz, G., Semenov, D.: Estimation of losses in jammers compensation at the training sample formation by the frequency method. Inf. Telecommun. Sci. 9(2), 5–9 (2018). https://doi.org/10.20535/2411-2976.22018.5-9
14. Plyushch, O., Rybydailo, A.: Adaptive antenna array adaptation algorithm that does not require reference signal presence. Collection of the scientific papers of the Centre for Military and Strategic Studies of the National Defence University 2(66), 108–114 (2019). https://doi.org/10.33099/2304-2745/2019-2-66/108-114
15. Riabukha, V., Semeniaka, A., Katiushyn, Y.: Selection of the number of compensation channels and location of receivers with non-identical amplitude-frequency and phase-frequency responses in adaptive antenna array under noise interference. In: Proceedings of the IEEE Ukrainian Microwave Week (UkrMW), pp. 7–10 (2020). https://doi.org/10.1109/UkrMW49653.2020.9252625
16. Riabukha, V., Semeniaka, A., Katiushyn, Y., Zarytskyi, V., Holovin, O.: Digital adaptive system of radar protection against masking clutter on the basis of adaptive lattice filter. Weapons Mil. Equipment 4(24), 32–40 (2019)
17. Safarian, C., Ogunfunmi, T., Kozacky, W.J., Mohanty, B.: FPGA implementation of LMS-based FIR adaptive filter for real time digital signal processing applications. In: Proceedings of the IEEE International Conference on Digital Signal Processing (DSP), pp. 1251–1255 (2015). https://doi.org/10.1109/ICDSP.2015.7252081
18. Semibalamut, K., Khamula, S., Zhuk, S., Litvintsev, S.: Determination of radiation pattern for linear antenna array with multistage adaptive compensator of interferences. Bulletin of NTUU KPI, pp. 17–24 (2018). https://doi.org/10.20535/RADAP.2018.74.17-24
19. Shengheng, L., Yahui, M., Yongming, H.: Sea clutter cancellation for passive radar sensor exploiting multi-channel adaptive filters. IEEE Sens. J. 19(3), 982–995 (2019). https://doi.org/10.1109/JSEN.2018.2879879
20. Skachkov, V., Efimchikov, A., Pavlovich, V., Kovalichin, S.: Otsenka vliyaniya dinamicheskikh parametrov gradientnykh algoritmov adaptatsii na kachestvo podavleniya shumovikh izlucheniy [Assessment of the influence of the dynamic parameters of gradient adaptation algorithms on the quality of noise emission suppression]. Zbirnyk naukovykh prats Odesskoi derzhavnoi akademii tekhnichnogo reguliuvannia ta yakosti 1(2), 81–87 (2013). http://nbuv.gov.ua/j-pdf/zbodatr_2013_1_15.pdf
21. Slyusar, V.: Origins of the digital antenna array theory. In: Proceedings of the XI International Conference on Antenna Theory and Techniques, ICATT 2017, pp. 199–201. Igor Sikorsky Kyiv Polytechnic Institute (2017). https://doi.org/10.1109/ICATT.2017.7972621
22. Slyusar, V.: Kluchevye napravleniya razvitiya radiolokatsionnoy tekhniki [Key directions of radar technology development]. In: Zbirnyk materialiv IX Vseukrainskoi naukovo-praktychnoi konferentsii "Aktualni pytannia zabezpechennia sluzhbovo-boiovoi diialnosti viiskovykh formuvan ta pravookhoronnykh organiv", pp. 291–294 (2020). https://doi.org/10.13140/RG.2.2.21329.97120
23. Zhua, Z., Gao, X., Cao, L., Pan, D., Cai, Y., Zhu, Y.: Analysis on the adaptive filter based on LMS algorithm. Optik 127, 4698–4704 (2016). https://doi.org/10.1016/j.ijleo.2016.02.005
24. Zhuk, S.Y., Semibalamut, K.M., Litvintsev, S.N.: Multistage adaptive compensation of active noise interferences using block orthogonalization of signals of compensation channels. Radioelectron. Commun. Syst. 60(6), 243–257 (2017). https://doi.org/10.3103/S0735272717060012
Simulation Modeling as a Means of Solving Professionally-Oriented Problems in Maritime Industry
Tatyana Zaytseva(B), Lyudmyla Kravtsova, Oksana Tereshchenkova, and Alona Yurzhenko
Kherson State Maritime Academy, Kherson, Ukraine
[email protected], {limonova,tereshoks17}@ukr.net, [email protected]
Abstract. Simulation modeling is one of the most commonly used methods of studying the functioning of complex systems to assess possible risks. The use of simulation methods allows us not only to conduct a detailed analysis of the situation, but also to assess the probability of avoiding or appearance of a threat. The ship’s crew must be staffed by qualified seafarers who have received a favorable decision from the management of crewing companies. Such commissions assess the applicant’s competence at the international level, as he must work on ships of the world fleet. Therefore, among the tasks facing higher maritime education, one of the main is the formation of key competencies, the degree of formation of which is assessed at the international level. Over the last 5–6 years, Kherson State Maritime Academy has radically changed the training programs for maritime professionals, implementing a competencybased approach to training. The program of “Information Technologies” course, which is studied by 1st-year cadets, contains “Information Technologies in Navigation” module, which examines the algorithms and methods of solving professionally-oriented problems. The specificity of such tasks involves the use of simulation methods, algorithmization, the ability to use modern computer programs. This content of the course contributes to improving the quality of training of future ship navigators, and implementing the focus of this course on their professional activities. In other words, we begin to form both subject and universal professionally-oriented competencies. This article proposes methodological techniques for the formation of subject professionally oriented competencies of future ship navigators. These techniques are shown on the example of solving an applied problem using an information-modeling approach to the analysis and risk assessment of a ship operating in conditions of changing or unpredictable external influences. These influences are one of the factors that can lead to emergencies. Keywords: Information technologies · Subject competencies Mathematical (simulation) modelling · True wind
1 Introduction
The main thing in the activity of any higher educational institution is the training of a specialist competent in his chosen profession. Only the “end user” the employer who hires the graduate - can assess the level of training. Further career of the graduate and prestige of educational institution depend on the following: how yesterday’s students will behave, what knowledge, skills and abilities they will demonstrate in the process of performing professional duties; how they are able to master and perceive everything new needed in work, to learn and self-educate. The last one is an important factor in the global development of the university, its material and technical equipment, the use of modern teaching methods, which, in turn, affects the ability to train in-demand professionals. Among higher education institutions, a special place is occupied by profession-oriented higher educational institutions that train specialists for a specific field. Such universities, of course, include the Kherson State Maritime Academy (KSM A). Its graduates work as ship navigators, engineers, electrical engineers on ships of maritime companies all over the world. At the same time, they start their professional activity as cadets, because the curriculum includes shipboard practice from the third year. This means that the authorities of the academy and its teachers, who directly train cadets, are faced with the task of training not just seafarers, but specialists who are competitive at the global level. It should be noted that crewing companies are very careful in the selection of crews, because the coherence of their work, the professionalism of each member of the team depends on the end result - the success of the voyage, preservation of cargo, life and health of seafarers. From the official website of the academy you can find out that “Navigation Faculty of KSMA conducts training of officers of sea and river vessels” and “successful implementation of the educational program allows to obtain all the necessary competencies of ship navigators, taking into account the current situation in the field of sea and river transport” [1]. In other words, a high level of training of maritime specialists is set. This task can be realized only by joint efforts of all departments, developing programs of courses taking into account competency-based approach to the training of cadets. Competency-based approach to learning means, first of all, that no course of the program is taught in isolation from professionally conditioned courses. For example, future ship navigator of Maritime Academy has following principal subjects: Navigation and Sailing Direction, Ship Handling, Basic Ship Theory etc. However, the navigator’s training program also includes Physics, Higher Mathematics, and Information Technologies. The content of these and other mandatory (regulatory) courses is aimed at comprehensive training of future seafarers.
2 Problem Statement
One of the sections of the course “Information Technologies”, which is taught in accordance with the curriculum for first-year cadets, is the section “The use
of Excel spreadsheets in the work of ship navigator”. It is obvious that all ship computers have special programs for ship management, as well as standard office programs for documentation, current calculations, error checking of ship equipment and other application packages. The ability to use office software is an integral part of the training of any specialist. However, for a navigation officer, computer knowledge at the user level is not enough. He must understand the whole process of modeling the real situation, calculate the risks of constant and random factors on the trajectory of the vessel, be able to apply theoretical knowledge to perform both simple and complex calculations in practice. The development of the “Information Technologies” subject program for the training of the navigators is based on this approach. Let’s have a look in more detail at some specific parameters of maritime industry. One of the most important factors influencing the characteristics of the vessel’s movement is the wind. Changing the speed and direction of the wind during the day in the area of sailing or work, respectively, leads to the adjustment of process parameters. One of the most difficult tasks is to determine the felt wind and the true wind on a moving vessel. The shipowner’s responsibilities include assessing the following: direction of the wind in the rhumbs; the strength of the wind in numbers according to Beaufort scale; the impact of this wind on the ship and its movement. And a navigator provides recommendations based on the analysis to reduce this impact. According to research [4], the wind affects the drift of the vessel, leads to a loss of speed, creates rolling. Wave components depend on force, directions and duration of wind’s action. The wind creates various sea currents, in coastal and shallow areas causes significant positive and negative surges, as a result of which shelves may appear and navigation of vessels will be difficult. Or, conversely, there may be the possibility of free passage through overfalls. As the wind increases, the stability and controllability of the vessel, crew environment and the operation of the vessel’s equipment deteriorate. Strong stormy winds can make it impossible for some ships to continue sailing, and then ships are forced to seek where to hide in ports, bays and harbors. In addition, the wind is a variable element in both speed and direction, so the averaging of measured values of the parameters is used. It should be noted that one of the profile departments of academy, according to the curriculum, following courses are studied: “Hydrological and Meteorological Instruments and Appliances” and “Meteorology and Oceanography”. Cadets do laboratory-based work, the purpose of which is to acquire practical skills in determining the wind felt by the signs and the true wind on a moving ship. During the work cadets define a wind direction, measure by means of a hand anemometer MS-13 or M-61 its speed. They determine the true wind graphically and using the so-called wind speed indicator. Then the direction of wind is determinated in rhumbs, and the strength of the wind is put according to the Beaufort scale. The last stage of the work is to assess the impact of wind on the ship and formulate recommendations to reduce its impact on the ship [8].
To perform this work, the cadets used to use, in addition to special equipment, ordinary school tools - ruler, drawing compass, notebook and pencil. Therefore, the final conclusions could have human errors, which reduced the probability of making the right decision in this situation. But the availability of modern digital technologies fundamentally changes the approach to solving any technical or technological problems. In addition, knowledge of the simulation-model method and its use in solving problems similar to those considered in this paper corresponds to a scientifically-based approach to solving problem situations. It also provides a basis for knowledge system necessary for future professional activity. Model-based method can be used on ships to pre-assess the safe maneuvering of the ship in difficult weather conditions, where you can clearly assess the potential threats [11]. Modeling of wind influence at changing initial data (wind speed, its direction, distance from coast, etc.) with the subsequent implementation of calculations of management decision parameters in Excel spreadsheets is one of the tasks solved within the course “Information technologies” for ship navigators. It should be noted that the impact of wind on handling of the ship have been, is and will continue to be important. The experience gained over decades allows to develop the strategy of the actions directed on acceptance of the optimum administrative decision in everyday and emergency situations on the ship. Improvements in computer technology lead this process to a new level, and the main task of the maritime education institution is to teach the future seafarer to make the most of the opportunities provided.
3 Literature Review
Monitoring of recent research and publications related to this topic indicates that a large number of works by both foreign and Ukrainian experts is devoted to handling and safety of the vessel under the influence of currents, wind and turbulence [9]. Moreover, the vast majority of authors are experienced seafarers who reflect in their work personal experience in managing ships in extreme conditions. Thus, in the work of authors [17] the evaluation and construction of structural modular mathematical model to forecast ship’s maneuverability is performed. It was also performed for modeling real-time maneuvering depending on the influence of natural factors, among which one of the main is the wind. The work of author [15] is devoted to stability of movement and definition of criteria of controllability of the vessel at wind influence. One of the most interesting works in terms of a competency-based approach to the study of information technologies for ship navigators is the work of the following authors [10]. It is devoted to account of the effect of wind in the mathematical model of the ship in order to assess its impact on the maneuverability of the vessel. The work presents a method of constructing a mathematical model, describes the effect of wind on the parameters of the ship’s maneuvering, which makes it possible to practically quickly recalculate the parameters of the ship’s movement at variable wind readings.
The following problems are considered in dissertation [3] study of the controllability of the vessel due to the influence of wind; features of wind influence on the ship with different dynamic qualities and hull architecture; study of the influence of wind on the maneuverability of ships. The author emphasizes the importance of computational approaches that take into account the real conditions of navigation. He also concludes that the use of methods of simulation mathematical modeling in solving the problem of improving the accuracy of forecasting is vital to achieve high-performance of equipment and safety of navigation. Fundamentals of numerical methods for solving problems are published in scientific and educational publications of authors [2,14]. Modern applications of numerical methods are associated with the use of information technologies, so many scientists use a spreadsheet MS Excel as a computer environment for modeling, for example, authors [5,7]. After analyzing the publications in which the solution of this problem is initiated, the authors of the article propose a method of applying simulation models in solving professional problems of navigation using digital technologies.
4 The Statement of Basic Materials
A separate module of “Information technologies for ship navigators” course is devoted to a series of professionally-oriented tasks. This block is offered by the course program after the cadet has received or strengthened the skills of using MS Excel spreadsheets, knows how to develop a mathematical model on the physical content of the problem, understands the purpose of calculation algorithm, uses methods of using the built-in capabilities of these spreadsheets. The block of professionally oriented tasks includes such tasks as plotting the ship’s route, if the coordinates of its departure and arrival are known, determination of parameters of wave height dependence on wind speed and distance from shore, calculation of errors of onboard instruments and many other tasks. The cadet learns to perceive the calculated mathematical formulas as a guide to action, to build a sequence of calculations, to check their correctness in various ways, including based on the physical content of the results. In order to do laboratory work “Calculation of the true wind speed and direction during the movement of the vessel”, the cadet must not only formally perform calculations according to the proposed algorithm, but also design a simulation model of the technological process. Wind at sea is a source of change in the chosen management strategy of the vessel. Determining the impact of force, wind direction on the ship and obtaining effective recommendations for management decisions is the task of simulating this process. Taking into account the competency-based approach to the training of ship navigator, the cadet should be acquainted with theoretical foundations of the research topic before work. The main elements of the theory that are directly related to the calculations are the following [16]: 1. the influence of wind on the vessel is determined by its direction and force, shape and size of the wind area, values of draught, rolling and listing;
2. the action of wind within the course angles of 0–110° causes a loss of speed, but at large course angles and a wind strength of not more than 3–4 points the action of the wind causes some increase of speed;
3. the action of wind in the range of 30–120° is usually accompanied by drift and wind roll;
4. the relative wind (the one which is observed) acts on a moving vessel. This wind is connected with the true one by the following relations (a short numerical sketch of these calculations is given after this list):

$$V_i = \sqrt{V_y^2 + V_0^2 - 2 V_y V_0 \cos(Y_y + \beta_0)} \qquad (1)$$

$$Y_i = Y_y + \beta_0 + \arccos \frac{V_i^2 + V_y^2 - V_0^2}{2 V_i V_y} \qquad (2)$$

where V_i is the true wind speed, m/s; V_y is the relative wind speed, m/s; V_0 is the rate of sailing, m/s; β_0 is the angle of drift, deg; Y_y is the relative wind angle; Y_i is the true wind angle. The specific wind pressure is also calculated as

$$\rho = 0.08 W^2 \qquad (3)$$

where W is the wind speed, m/s;
5. sudden changes in wind speed are called gusts, and especially strong ones – squalls. During a squall the wind suddenly and briefly (up to several minutes) increases, often to a storm, and then weakens; the direction of the wind in a gust usually changes;
6. it is difficult to determine the exact effect of the wind on the ship. With a weak headwind the ship loses little speed, and slightly increases it when such a wind is from the stern. At a strong wind the speed of the vessel decreases both with a headwind and with a wind astern; the reason for this is the action of the swell that develops with the wind and increases the resistance of the vessel;
7. a headwind can reduce the speed of a vessel with a large sailing area by 40%; at side wind angles the losses are 1.5–2.0 times less than with a headwind of the same force;
8. the wind also causes the vessel to deviate from the course. The magnitude of the wind drift depends on the speed of the vessel, the lateral plane of its sail, and the strength and heading angle of the wind;
9. the wind not only changes the elements of the ship's movement, but also affects its controllability. When ships are at full speed during a sea crossing, the wind has little effect on their controllability; when mooring, moving in narrow straits or in the open sea at low speed, strong winds can cause a significant deterioration, or even loss, of control of the vessel;
10. for some vessels, gusts of wind and stormy conditions are accompanied by significant dynamic deviations with onboard sway and can lead to dangerous roll angles. It should be noted that with the increase of the size of vessels the influence of wind on them weakens.
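Although the laboratory work itself is performed in MS Excel spreadsheets, the same calculation can be expressed compactly in code. The sketch below is only an illustration of formulas (1)–(3); the function names and the example input values are assumptions and are not part of the laboratory assignment.

```python
import math

def true_wind(v_rel, y_rel_deg, v_ship, drift_deg=0.0):
    """Formulas (1)-(2): true wind speed and angle computed from the
    relative (observed) wind on a moving vessel.
    v_rel     - relative wind speed, m/s
    y_rel_deg - relative wind angle, degrees from the bow
    v_ship    - rate of sailing, m/s
    drift_deg - drift angle beta_0, degrees
    """
    a = math.radians(y_rel_deg + drift_deg)
    v_true = math.sqrt(v_rel**2 + v_ship**2 - 2.0 * v_rel * v_ship * math.cos(a))
    # clamp the cosine against rounding errors before taking arccos
    c = max(-1.0, min(1.0, (v_true**2 + v_rel**2 - v_ship**2) / (2.0 * v_true * v_rel)))
    y_true = y_rel_deg + drift_deg + math.degrees(math.acos(c))
    return v_true, y_true

def wind_pressure(w):
    """Formula (3): specific wind pressure for a wind speed w, m/s."""
    return 0.08 * w**2

# example: a 12 m/s relative wind at 60 degrees off the bow, vessel speed 7 m/s
v_i, y_i = true_wind(12.0, 60.0, 7.0)
print(round(v_i, 1), round(y_i, 1), round(wind_pressure(v_i), 1))
```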
The presented items are only a part of the information provided to the cadet to understand the process of wind influence on the parameters of the vessel. But this information is enough for the cadet to start doing laboratory work, and most importantly, to experience the professional orientation of the tasks he performs. In addition, this experience will be very useful to him, both when performing calculations in other courses, course projects, and in future professional work. The transition from technical information and indicators of devices to practical results is carried out with the help of simulation, which is one of the most important tools of research. The stages of creating a working model can be represented by the following scheme (Fig. 1).
Fig. 1. Stages of creating a working model
In this work at the first stage the logical-mathematical model of the influence of wind on the movement of the vessel is checked, the criteria of influence are defined, and the parameters (results of the indications of onboard devices) are fixed. Then a block diagram (modeling algorithm) is created, according to the processes that affect the movement of the vessel, and the trajectory is adjusted to take into account the influence of wind. The algorithm takes into account all possible scenarios depending on the superposition of indicators. In our case, the variable parameters are the speed and direction of the wind, as well as the speed of the vessel. The size, purpose of the vessel and sailing area are considered specified. Thus, the sequence of executing the laboratory tasks by a cadet is the following:
– technical formulation of the task;
– theoretical substantiation;
– comparison of the technical formulation of the problem and its mathematical (simulation) model;
– construction of the algorithm for model implementation (for example, in the form of a block diagram) taking into account all possible variants of technical problems;
– calculations according to the algorithm (in MS Excel spreadsheets);
– graphical study of the process with variable input data;
– analysis of the obtained results;
– analysis of opportunities for optimization of the technological process and justification of the choice of management decision.
So, let us set the task of assessing the impact of wind and high waves on the ship. There is a proven method of determining the initial data - the direction of
the wind and its strength, taking into account the shape and size of the sailing area of the vessel, the location of the sailing center, and the values of draft, roll and trim. According to the definition [13], simulation modeling is a method of research in which the system under study is replaced by a model that describes the real system with sufficient accuracy. The constructed model describes the processes as they would take place in reality, and experiments are performed on this model in order to obtain information about the real system. A simulation model is a logical-mathematical description of an object that can be used for computer experimentation to design, analyze, and evaluate system performance. That is the model analyzed in this practical problem. Determining the vectors of the wind speed direction in relation to the course of the vessel allows us to describe the interaction of the corresponding vectors in the form of a mathematical relationship. The forces and torque acting on the vessel from the wind depend on the speed of the wind relative to the vessel, its configuration, as well as the magnitude of the angle between the diametrical plane of the vessel and the direction of the wind flow. Mathematically, this is expressed by the following formulas:

$$F_{AX} = 0.5\, C_{AX}\, \rho_A V_R^2 S_M; \quad F_{AY} = 0.5\, C_{AY}\, \rho_A V_R^2 S_{CA}; \quad M_{AZ} = 0.5\, C_{AM}\, \rho_A V_R^2 S_{CA} L, \qquad (4)$$
where FAX , FAY , MAZ are the tangential, normal aerodynamic, forces and moment respectively; CAX , CAY , CAZ are the aerodynamic coefficients of forces and moment; ρA is the density of the air; VR is the observed wind speed; SM , SCA are the area of projection of the above-water body over midstation section and centerline area. In our case, the cadet must understand the calculation formula, be able to make a structured input to the table, make calculations and assess their accuracy. In addition, the total wind pressure on the vessel should be calculated; the magnitude of the rolling moment and some other indicators (Fig. 2). In the laboratory work “Calculation of true wind speed and direction during ship movement” a cadet, in addition to standard methods of work in MS Excel spreadsheets, uses such elements as creating lists, checking input values, conditional formatting, macros, data protection. In other words, the performance of navigator’s calculations is a complex work that covers the technical capabilities of spreadsheets, which are most often used in real-life conditions. And the skills acquired by the cadets will be useful to the future ship navigators in the performance of their professional duties. The following Figs. 3, 4 show fragments of laboratory work. The use of switches allows us to quickly change the input parameters (depending on the objective situation). It also allows to change in-parameters, apply corrections according to features of equipment, evaluate the results of calculating the true wind speed and direction during the movement of the vessel and make optimal management decisions to adjust the vessel and its speed.
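A rough numerical reading of formula (4) is sketched below. The air density value, the aerodynamic coefficients and the windage areas in the example are purely illustrative assumptions: in practice the coefficients C_AX, C_AY, C_AM depend on the particular ship and on the wind heading angle and must be taken from reference tables or model tests.

```python
RHO_AIR = 1.225  # air density, kg/m^3 (standard value; an assumption here)

def wind_loads(v_rel, s_mid, s_center, length, c_ax, c_ay, c_am):
    """Formula (4): tangential force F_AX, normal force F_AY and yaw
    moment M_AZ produced by the observed wind of speed v_rel (m/s),
    for a ship with midship windage area s_mid, centerline windage
    area s_center and length `length` (m)."""
    q = 0.5 * RHO_AIR * v_rel**2         # dynamic pressure of the air flow
    f_ax = q * c_ax * s_mid              # along the diametrical plane, N
    f_ay = q * c_ay * s_center           # across the diametrical plane, N
    m_az = q * c_am * s_center * length  # moment about the vertical axis, N*m
    return f_ax, f_ay, m_az

# purely illustrative numbers: 15 m/s wind, 300/900 m^2 windage areas, 120 m ship
print(wind_loads(15.0, 300.0, 900.0, 120.0, c_ax=0.6, c_ay=0.9, c_am=0.1))
```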
Fig. 2. Mathematical model of wind influence on the ship
Fig. 3. Simulation model of wind influence on the ship
Fig. 4. Graphic interpretation of the result
5 Results of the Research
In order to substantiate the adequacy of the constructed model to actual calculations, an expert evaluation of the studied material was conducted. The method of expert evaluations is performed according to the following algorithm:
– a group of experts checks the work of the simulation model and evaluates its professional direction;
– a special questionnaire is developed (in our case a questionnaire with 16 questions was used);
– experts answer the questionnaire;
– conclusions are made regarding the adequacy of the model to the real problem, the relevance of the proposed study and the use of the results in practice.
The method of expert assessments is used to obtain quantitative assessments of qualitative characteristics, parameters and properties. A 5-point Likert scale was used for that. Each expert examining the model answers the questions of the questionnaire independently of the others. This procedure allows us to get an objective analysis of the problem and develop possible ways to solve it [6]. Maritime professionals were involved as experts, namely Chief Officers or First Mates who work at the academy as teachers or as master's students of the correspondence department. They all have long experience of work at sea. A total of 14 people were involved in the expert assessment. The creation of the expert group took into account such factors as competence, constructive thinking and sufficient experience of relevant professional activities. The purpose of the expert evaluation is to establish how reliably the simulation method reproduces the real situation. To confirm or refute the findings of the research, the following stages of processing the expert assessments were used:
1. Construction of weight coefficients of quality ranking.
2. Parameterization of quality indicators.
3. Conducting expert quality assessment.
4. Research of adequacy of the examination results.
When evaluating the objects of study, experts usually have different opinions. In this regard, there is a need to quantify the degree of agreement among the experts. Obtaining a quantitative measure of agreement allows the reasons for differences of opinion to be interpreted more reasonably. Expert evaluation of the effectiveness of the method is reliable only if there is mutual agreement among the experts, which can be assessed with the help of the concordance method [12]. For the processing of the individual evaluations of the experts, a table was constructed (see Table 1). It summarizes and organizes the objects based on the averaging of their estimates, that is, the rank order method was used. The correctness of the matrix is checked by calculating the checksum shown below:
Table 1. Table of ranks

Act/Exp | Ranks given by experts 1–14 | Sum of ranks | d | d²
x1 | 3.5 2 3 3 4 2 3 2 3 4 2 2 2 2 | 37.5 | −81.5 | 6642.25
x2 | 7.5 11 10 10 7 9 9 12 8 10 5 11 5 9 | 123.5 | 4.5 | 20.25
... | ... | ... | ... | ...
x16 | 1 1 1 1 1 1 1 1 1 1 1 1 1 1 | 14 | −105 | 11025
Sum | 136 136 136 136 136 136 136 136 136 136 136 136 136 136 | 1904 | | 57202
$$\sum_{i=1}^{n} x_{ij} = \frac{(1+n)n}{2} = \frac{(1+16) \cdot 16}{2} = 136 \qquad (5)$$

The sums of the columns of the matrix are equal and coincide with the value of the checksum. These data indicate that the matrix is compiled correctly. The concordance coefficient is calculated according to the formula below:
$$W = \frac{12 S}{m^2 (n^3 - n)} \qquad (6)$$
where m = 14 is the number of experts; n = 16 is the number of questions in the questionnaire; S = 57202 is the cumulative sum of the squared deviations d². The concordance coefficient can range from 0 to 1: it equals 1 if all expert rankings coincide and 0 if they are completely different. Calculated according to formula (6), the coefficient is W = 0.86. Since this value is close to one, we can assume that the answers of the experts are coherent. In general, the concordance coefficient is a random variable; to assess its significance it is necessary to know its frequency distribution for different values of the number of experts m and the number of objects n, which can be found in aggregate tables. When the number of objects n > 7, the significance of the concordance coefficient can be assessed by the χ² criterion: the calculated χ² value is compared with the tabular value for n − 1 degrees of freedom and a specified significance level α = 0.05. Since the calculated χ² is larger than the tabular value (24.99579), the concordance coefficient is not a random variable. It can be concluded that the results make sense and the proposed simulation model corresponds to the real phenomenon.
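For verification, the concordance coefficient and the corresponding χ² statistic can be recomputed directly from the reported totals. The short sketch below only reproduces the arithmetic of formula (6) and the standard significance check for the concordance coefficient; the critical value 24.996 is the table value for n − 1 = 15 degrees of freedom at α = 0.05.

```python
# totals reported above: 14 experts, 16 questionnaire items, S = sum of d^2
m, n, S = 14, 16, 57202

W = 12 * S / (m**2 * (n**3 - n))   # formula (6)
chi2 = m * (n - 1) * W             # chi-squared statistic for the concordance test
chi2_critical = 24.996             # table value, df = n - 1 = 15, alpha = 0.05

print(round(W, 2), round(chi2, 1), chi2 > chi2_critical)  # 0.86 180.3 True
```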
6 Conclusions
A professional ship navigator, a specialist with a high level of training, who applies for an officer’s position on the ship, needs to know not only all the intricacies of navigation and sailing directions, but also ship’s construction, able to
work in team and perform all other direct duties and tasks. Ship navigator is responsible for the safety of navigation, security of the ship, crew and cargo. This means that he must not only clearly follow the instructions of the shore services, which control the movement of the vessel, but also be able to analyze the current situation, if necessary, quickly make management decisions and predict the consequences of these decisions. One of the areas of “Information Technologies” course is the study of the method of mathematical (computer) modeling. The material of this course involves cadets to solve problems formulated in their subject area, related to the formalization, construction of mathematical models and the use of information technologies for further research. Such problems usually require a lot of time to be solved, a systematic approach to the development of tasks and are computer-intensive. In the process of working with information technologies, cadets practice skills of constructing simulation (computer) models, developing algorithms for solving, evaluating the results, experience a qualitatively new socially significant level of competence, develop professional qualities of personality. A significant number of navigation and engineering tasks are reduced to the solution of equations (inequalities), set of equations (set of inequalities), differential equations or systems, evaluation of the integrals describing objects or phenomena. Application of methods of mathematical (information) modeling, forecasting the results of decision-making in various fields of activity require specialists to have the appropriate mathematical apparatus. The purpose of the study was to test the hypothesis whether the curriculum of Information Technologies” course requires simulation modeling methodology and the use of Excel spreadsheets for the purposes of the models. The block of the simulation model of wind influence on the ship was designed to get the cadet interested in the study of this topic, to motivate him to train, analyze the problem and find ways to solve it. The evaluation of indicator of coherence between experts by the indicator framework is equally important issue for scientific perspective. To assess the generalized degree of agreement of opinions in all areas (factors, parameters), the concordance coefficient is used.
References 1. Site of Kherson State Maritime Academy. http://kma.ks.ua/ua 2. Andrunik, V., Visotska, V.A., Pasichnyk, V., Chirun, L., Chirun, L.: Numerical methods in computer sciences: a master book. Lviv: Noviy Svit-2000 (2018) 3. Aung, N.: Estimated study of the controllability and elements of seaworthiness of vessels under the influence of currents, wind and waves. Doctoral dissertation (2011). https://www.dissercat.com/content/ 4. Dmitriev, V., Grigoryan, V., Katenin, V.: Navigation and Pilotage: Tutorial. Akademkniga, Moscow (2004) 5. El-Gebeily, M., Yushau, B.: Linear system of equations, matrix inversion, and linear programming using MS excel. Int. J. Math. Educ. Sci. Technol. 52, 83–94 (2008). https://doi.org/10.1080/00207390600741710
6. Kravtsov, H.: Methods and technologies for the quality monitoring of electronic educational resources. In: CEUR-WS.org, vol. 1356, pp. 311–325 (2015). http:// ceur-ws.org/Vol-1356/ 7. Kravtsova, L., Zaytseva, T., Puliaieva, A.: Choice of ship management strategy based on wind wave forecasting. In: CEUR-WS.org, vol. 2732, pp. 839–853 (2016). http://ceur-ws.org/Vol-2732/ 8. Kuznetsov, Y., Bushuyev, P.: Hydrometeorological support of navigation: Methodical recommendations for conducting practical classes in the discipline. HDMA, Kherson (2019) 9. Martelli, M., Viviani, M., Altosole, M., Figari, M., Vignolo, S.: Numerical analysis of the ship propulsion control system effect on the maneuverability characteristics in model and full scale. Proc. Inst. Mech. Eng. Part M J. Eng. Maritime Environ. 228(4), 373–397 (2014). https://doi.org/10.1177/2F1475090214544181 10. Martyuk, G., Yudina, Y., Yudina, A.: Consideration of wind in the mathematical model of the ship in order to assess its impact on maneuvering characteristics. Vestnik MSTU 7(3), 375–380 (2004). http://vestnik.mstu.edu.ru/v07 3 n18/ 11. Matokhin, A.: A systematic approach to risk analysis when maneuvering tankers in port waters. (2016). https://www.dissercat.com/content/sistemnyi-podkhod-kanalizu-riskov-pri-manevrirovanii-tankerov-v-portovykh-vodakh 12. Prokhorov, Y., Frolov, V.: Management Decisions: Tutorial, 2nd edn. St. Petersburg State University ITMO, St. Petersburg (2011). http://books.ifmo.ru/book/ 664/book 664.htm 13. Ryzhikov, Y.I.: Simulation Modeling. Theory and Technology. Moscow, Altex (2004) 14. Semerikov, S., Striuk, Y., Striuk, L., Striuk, M., Shalatska, H.: Sustainability in software engineering education: a case of general professional competencies. In: E3S Web of Conferences, vol. 166, pp. 10036–10048 (2020). https://doi.org/10. 1051/e3sconf/202016610036 15. Sobolev, G.: Stability of movement and criteria of controllability of the vessel at a wind. Works LKI 115, 77–85 (1977) 16. Vagushchenko, L., Vagushchenko, A., Aleksishin, A.: Criterion of the effectiveness of divergence maneuvers. Navigation 21, 51–57 (2012). http://nav-eks.org. ua/Nayka-na-site/Kriterii-Effektivnosti.pdf 17. Zhang, X., Xiong, W., Xiang, X., Wang, Z.: Real-time simulation of a rescue ship maneuvering in short-crested irregular waves. IEEE Access 3–9 (2017). https:// doi.org/10.1109/ACCESS.2019.2941591
Data Mining Methods, Models and Solutions for Big Data Cases in Telecommunication Industry
Nataliia Kuznietsova1(B), Peter Bidyuk1, and Maryna Kuznietsova2
1 Institute for Applied System Analysis of the National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute", Kyiv, Ukraine
{natalia-kpi,pbidyuke 00}@ukr.net
2 Institute for Information Recording of the National Academy of Sciences of Ukraine, Kyiv, Ukraine
[email protected]
Abstract. This paper is concentrated on the applications of main data mining tools, methods, models and technologies for solving the basic tasks of data processing for telecommunication industry. The main forecasting task in this industry is to predict the class and volume of services needed for the subscribers as well as for predicting the capacity of all needed engineering equipment. It is proposed to develop regression and forecasting models based upon the Facebook Prophet module to take into account seasonal effects in data and to compare the results with the ones received by using the LSTM network. The classification task was related to the problem of churn prediction. The authors identified such promising methods as decision trees, random forest, logistic regression, neural networks, support vector machines and gradient boosting to solve problems of subscribers classification by their certain preferences and services, as well as the tendency to outflow. The dynamic approach based on dynamic models of survival theory for churn time prediction is proposed. Next the task of forecasting the volume and class of services which subscribers are going to use in roaming is solved. The results of using regression models and data mining methods were also shown. All the methods proposed were compared and evaluated by necessary statistical characteristics, and interpretation of the model application results for practical solutions were proposed. Finally in the paper the clientservice information technology with all needed functionality for solving all these tasks is proposed. Keywords: Big Data · Data mining · Churn prediction · Regression models · Facebook Prophet · Gradient boosting · LSTM network · Dynamic risk assessment · Mobile Internet · Telecommunication company
1 Introduction
Forced quarantine in 2020–2021, and the necessity for remote work with modern information tools has allowed many people to feel independent and learn to work directly from home using available modern information technologies. From the first days of lockdown there was a significant increase in the load on telecommunications networks, increase in usage of special services for audio and video presentations, usage of additional programs for conferences, meetings, discussions, most online payments, and orders from online stores. The increase in subscriber’s traffic in Ukraine has led to necessity for competing mobile companies in installing new and upgrading existing equipment, connecting more and more regions to the 4G standard. According to the information from Ministry of Digital Transformation of Ukraine in February 2021 the mobile operators connected 1,789 country settlements to high-speed mobile Internet [1]. The project launched in July 2020 involved development of the new standard LTE 900, and allowed to connect more than 7.7 million Ukrainians to high-speed 4G Internet. Today it is observed a further increase in mobile communication usage instead of landlines, increasing the time and traffic of mobile Internet usage, online TV, the emergence of peak loads at certain hours. Supporting the uninterrupted services to mobile subscribers is becoming a priority in conditions of growing competition and challenges associated with increasing the usage of services. An image loss for operators who are unable to avoid sudden overloading and their equipment failure are extremely critical [16]. Every subscriber of a telecommunication company is “worth its weight in gold”. It becomes extremely necessary for operators to keep their subscriber, make him interested in new services, technologies, applications and services at the highest speeds in the Internet and fulfill all requests almost instantly. Therefore, the traffic amount forecasting for each user, user group, cell, region as well as the mobile towers and additional technical means installation for providing high quality services became relevant for operators.
2 Problem Statement
For the last several years the authors conducted research on current issues in the telecommunications industry. Since the subscriber’s base and the amount of subscriber information that operators need to store and process is quite large, there is a necessity to use special modern data mining tools for Big Data analysis. Among the main tasks are the subscribers preferences and services analysis; identification of subscribers and customer groups that are prone to churn; the customers churn probability assessment and dynamic assessment of financial risks associated with the subscribers’ outflow; classification of the basic subscribers types which are going to use services in roaming and forecasting the volume of such services; assessing and forecasting the load of the telecommunication equipment and identifying “vulnerabilities” that need urgent modernization.
3 Literature Review
Application of the data mining techniques to solve the problem of assessing the client’s churn propensity has been considered in several papers. The use of decision tree and neural network methods for the Singaporean mobile operator was proposed by V. Umayaparvathi, K. Iyakutti in work [30]. The data of the wireless network service provider in paper [22] was investigated by the methods of logistic regression and neural networks. Forecasting the risk of churn for telecommunication company subscribers in USA was made in paper [3] by application K-nearest neighbor, logistic regression and bayesian networks algorithms. The Taiwanese mobile operator customer base was analyzed in [6] by making use of the decision tree method. In the study [14] it was suggested the Taiwanese telecommunication company initial data, which have a large number of characteristics, first were processed using PCA (principal component analysis). Then it was proposed to apply the methods of decision trees, neural networks, K-means clustering, as well as their combination to predict possible churn. The possible churn of Belgian company paper editions clients was analyzed using logistic regression by the authors in the paper [9]. Investigation and forecasting the clients outflow was also performed by authors of the study in [17]. To predict the load amount and mobile Internet usage the authors proposed to use time series analysis, neural networks and the new technology of regression models Facebook Prophet [2,5,24,25,27–29]. Since this task involves predicting the absolute value as well as traffic variation it is advisable to use regression models and models with variable volatility. The dynamic approach proposed by the authors [18,19] was used for predicting the possible period of churn. It allows for mobile operators to calculate the probability and predict the time in which the churn may occur.
4 Challenges for Telecommunication Companies and Main Features of the Subject Area
The services quality provided by the telecommunication companies is directly infused into subscribers. Mobile Internet quality, the possibility for carrying out videoconferences, the usage of social networks, usage of smartphones and mobile Internet to remote work from home are really important for clients now. So the important task is to predict the mobile traffic volume on the highest quality which will be required soon. Another solved task was the traffic volume prediction and detection the subscribers’ group prone to churn. The third task for the company is to predict the moment when the most probable churn can happen. It is just that moment that is needed for mobile operators to correct any shortcomings, modernize technology and provide special offers to subscribers who think about the mobile operator change. The fourth task of the research was the prediction the services which subscriber would use abroad based on the historical statistical data of its services in Ukraine. In such a logical order we will study the main challenges and tasks that are arising before the mobile operator.
5 The Task of the Base Stations Mobile Traffic Prediction
Today telecommunication companies need to analyze mobile Internet traffic not only at the level of a particular technology (2G/3G/4G) in order to ensure effective usage of the established base stations and provision of high-speed Internet to their users. It is also necessary to forecast the demand for a region, a country or a particular base station. Such detail makes it possible to provide a capacity reserve for a particular region and to determine the safety margin of the installed equipment. It allows mobile operators to identify the critical regions where it is necessary to introduce innovations and upgrade the base station technologies in the future. LTE (Long Term Evolution) is the name of the mobile data transfer protocol. The project is the evolution of UMTS technology and a standard of high-speed wireless data transmission developed by the 3GPP group. The marketing name of the technology is 4G. All Ukrainian mobile operators are working on its development. The main LTE advantage is the data transmission speed, which directly depends on the frequency range width and on MIMO (Multiple Input – Multiple Output). This technology is used to improve quality through spatial time/frequency coding and/or beam forming and to increase the transmission rate when spatial multiplexing is used [1,11]. According to mobile operators, the cost of re-equipping one base station ranges from 200 to 500 thousand UAH and can reach nearly 3 million UAH. So the task is to predict the mobile Internet traffic for the base stations and to identify the stations and sectors that do not have enough resources for serving the subscribers and need to be upgraded or supplemented with new equipment. In this paper various data mining methods are used to predict the mobile traffic volume. It should be noted that data on 4G technology usage are available only after its launch in the summer of 2018 in major Ukrainian cities.
5.1 Regression Models
Here we used daily statistics that reflected the consumption of mobile Internet traffic from the beginning 2019 to March 2020 (presented in Fig. 1) for building all models. It is clear that the attempt to use the averaged method for forecasting based on statistical information does not allow to accurately predict the actual values of the load at the station; the time series are visually non-stationary (in the following stages it was confirmed by statistical tests for stationarity check [21]). The speed of spread and usage of mobile Internet in Ukraine corresponded to a certain trend. That’s why we chose models based on time series [2,26], we used autoregression models with integrated moving average (ARIMA) which allowed us to take into account the trend. The analysis of autocorrelation and partial autocorrelation functions for the series and residuals was performed to select the ARIMA model parameters. A lot of models were built and the ARIMA model (11, 2, 1) showed the best results. The high order of autoregression can be explained by the fact that it uses daily data on the mobile Internet traffic volume which have a longer time relationship with previous values. Data from the beginning of 2019 to the middle
Fig. 1. The volume of 4G mobile Internet consumption of the telecommunication company
of January 2020 were used for model training, and the data sample till April 20, 2020 was used for testing. The model quality characteristics are as follows: RMSE = 7,2; MSE = 51,85; MAE = 5,74; MAPE = 17,94. It has been suggested that there exists a certain seasonality in the input data related to working days and weekends, holidays, vacations, etc. The seasonal model SARIMA(20, 2, 1)(3, 1, 3)[5] has been constructed. The seasonal model allowed us to slightly improve the quality of forecasting, as evidenced by the following metrics: RMSE = 7,07; MSE = 50,02; MAE = 5,69; MAPE = 18,43. As can be seen from Fig. 1, the input data on mobile Internet traffic usage have very significant variations that reflect changes in load, and there are peak loads on the equipment of the telecommunications company that need to be predicted. Therefore, it was decided to construct models for the changing variance of the input series. This makes it possible to predict the mobile traffic variance and to calculate the "margin of safety" of the equipment under possible network loads. Statistical tests indicated the presence of heteroscedasticity in the mobile Internet traffic data. A study was conducted to determine the orders of heteroscedastic models for the residuals, aiming at the selection of the best conditional variance model type. Several model structures were estimated with statistical correlation methods using the available data and tested with statistical tests. Finally, the GARCH(1, 1) model was selected as the best model for the residuals for possible further use. In spite of the low order and simple model structure, the following, quite acceptable, model characteristics were obtained: RMSE = 6,35; MSE = 40,32; MAE = 4,86; MAPE = 12,62.
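For illustration, the modeling step described above can be reproduced with open-source tooling. The following minimal sketch (not the authors' implementation) uses the Python statsmodels and arch packages; the file name, column name and exact split dates are assumptions, while the model orders follow those reported in the text.

```python
# Sketch only: SARIMA fitting, error metrics and a GARCH(1,1) model for the residuals.
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import kpss
from statsmodels.tsa.statespace.sarimax import SARIMAX
from arch import arch_model

df = pd.read_csv("daily_4g_traffic.csv", parse_dates=["date"], index_col="date")  # assumed file
y = df["traffic_gb"]                                   # daily mobile Internet traffic volume
train, test = y[:"2020-01-15"], y["2020-01-16":"2020-04-20"]

print("KPSS p-value:", kpss(train, regression="c")[1])  # stationarity check, cf. [21]

# SARIMA(20,2,1)(3,1,3)[5]; plain ARIMA(11,2,1) is the special case seasonal_order=(0,0,0,0).
# Fitting a model of this order can take a while.
model = SARIMAX(train, order=(20, 2, 1), seasonal_order=(3, 1, 3, 5)).fit(disp=False)
pred = model.forecast(steps=len(test))

err = test.values - pred.values
print("RMSE", np.sqrt(np.mean(err ** 2)),
      "MAE", np.mean(np.abs(err)),
      "MAPE", 100 * np.mean(np.abs(err / test.values)))

# GARCH(1,1) on the residuals to model the conditional variance ("margin of safety").
garch = arch_model(model.resid.dropna(), vol="GARCH", p=1, q=1).fit(disp="off")
print(garch.forecast(horizon=7).variance.iloc[-1])      # 7-day-ahead variance forecast
```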
5.2 Forecasting with the Prophet Module
Next, it was decided to compare the regression model forecasts with the capabilities of the Facebook Prophet module [28]. This module was designed to analyze the web traffic of social network users [27,28]. Due to the similar nature of mobile Internet traffic data, we expected that the module should give good
results in forecasting the mobile Internet volume. The advantage of the Prophet module is the speed of model setup and the simplicity of forecasting. The model generated by the Prophet module is decomposed into three components: trend, seasonality, and a component that takes holiday periods into account. After performing the appropriate analysis for our data series, we can see that our data indeed depend significantly on the day of the week: the load on the network becomes greater on weekends and holidays and falls on Mondays (see Fig. 2). After the return from the holidays there is an active usage of mobile traffic in September–October, when work issues are actively being resolved and more time is spent on getting back to work. A lot of models of different structures were constructed using the module, and the best model showed the following statistical characteristics: RMSE = 14,06; MSE = 197,70; MAE = 12,21; MAPE = 41,51. Obviously, the forecasting quality results are not the best, i.e. our expectations were not met. This can be explained by the fact that the sample length for 4G usage is too small (only 2 years), but in the future this model should become more efficient.
Fig. 2. Dependencies of the consumed mobile traffic detected by the Prophet model
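A minimal sketch of such a Prophet-based forecast is shown below. It assumes the `prophet` Python package (formerly `fbprophet`); the input file, the column names and the holiday handling are assumptions for illustration only.

```python
# Sketch only: trend / weekly seasonality / holiday decomposition with Prophet (cf. Fig. 2).
import pandas as pd
from prophet import Prophet

df = pd.read_csv("daily_4g_traffic.csv")                 # assumed file
df = df.rename(columns={"date": "ds", "traffic_gb": "y"})  # Prophet expects columns ds and y

m = Prophet(weekly_seasonality=True, yearly_seasonality=True)
m.add_country_holidays(country_name="UA")                # account for Ukrainian holidays
m.fit(df)

future = m.make_future_dataframe(periods=90, freq="D")
forecast = m.predict(future)                             # columns: yhat, trend, weekly, ...
m.plot_components(forecast)                              # trend / weekly / holiday components
```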
5.3 Forecasting Using LSTM Neural Networks
It was necessary to transform the input data in order to use LSTM neural networks [5,24]. One needs to create data sequences that consist of n historical values and m future values. A sequence format that stores 5 historical values
and 1 next value was chosen for the mobile Internet traffic data. The sample was divided into training and test parts, and it was necessary to determine the optimal number of epochs in order to avoid overfitting of the neural network and deterioration of the statistical characteristics. The results of forecasting using the LSTM network are presented in Fig. 3. The best model achieved the following statistical characteristics: RMSE = 6,69; MSE = 44,80; MAE = 5,14; MAPE = 16,32. Even visually the graph (Fig. 3) shows that the model is very close to the real values. It is a little more conservative and therefore lower in absolute value and variance, but it reflects the main trends in traffic consumption. We summarize the obtained forecast quality estimates for all models in a joint table for easy comparison and selection of the best model (Table 1). Thus, of all the models constructed for predicting the absolute values of mobile Internet traffic, the best one was the LSTM model. It allows tracking longer dependencies over time. This is the main advantage of this approach over the others, and it allowed achieving better values of the statistical metrics used. The next model in terms of quality is the seasonal model SARIMA(20,2,1)(3,1,3)[5], which also takes into account the specificity of the data related to working days. The best accuracy of forecasting estimates was
Fig. 3. Graphical representation of the results (the real data graph, with larger variance, is shown in red; the result of the LSTM network forecasting is shown in blue)

Table 1. Comparative analysis of all models for mobile Internet traffic forecasting

No | Model                       | RMSE  | MSE    | MAE   | MAPE
1  | ARIMA (11,2,1)              | 7,2   | 51,85  | 5,74  | 17,94
2  | SARIMA (20,2,1) (3,1,3) [5] | 7,07  | 50,02  | 5,69  | 18,43
3  | GARCH (1,1)                 | 6,35  | 40,32  | 4,86  | 12,62
4  | Prophet                     | 14,06 | 197,70 | 12,21 | 41,51
5  | LSTM                        | 6,69  | 44,8   | 5,14  | 16,32
shown by the GARCH(1,1) model, which was used for modeling the residuals and predicting the mobile Internet traffic variance.
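The windowing scheme described above (5 past values predicting 1 next value) can be sketched as follows. This is an illustrative Keras setup, not the authors' exact network; the layer size, epoch count and the pre-scaled input file are assumptions.

```python
# Sketch only: sequence preparation and a small LSTM for one-step-ahead forecasting.
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

series = np.loadtxt("daily_traffic_scaled.csv")          # assumed: pre-scaled 1-D series

def make_sequences(values, n_in=5):
    X, y = [], []
    for i in range(len(values) - n_in):
        X.append(values[i:i + n_in])                     # 5 historical values
        y.append(values[i + n_in])                       # 1 next value
    return np.array(X)[..., np.newaxis], np.array(y)     # shape (samples, 5, 1)

X, y = make_sequences(series)
split = int(0.8 * len(X))
X_tr, X_te, y_tr, y_te = X[:split], X[split:], y[:split], y[split:]

model = Sequential([LSTM(64, input_shape=(5, 1)), Dense(1)])
model.compile(optimizer="adam", loss="mse")
# validation data helps choose the epoch count at which the test error starts to grow
model.fit(X_tr, y_tr, epochs=50, validation_data=(X_te, y_te), verbose=0)
pred = model.predict(X_te).ravel()
```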
6 The Task of Analyzing and Forecasting the Subscribers Churn
According to the national mobile operators' data, the annual customer churn in the telecommunications industry is about 20%, which leads to significant financial losses. Attracting new subscribers is much more expensive for a mobile operator than retaining existing ones. Therefore, a relevant task is forecasting the individuals and groups of customers who are prone to churn. This is a classification task where subscribers are divided into those who continue to be served by the company and those who are prone to churn. Analysis and forecasting of the probability and the class of subscribers subject to churn is carried out on the basis of the available statistical information and machine learning methods (logistic regression, neural networks, support vector machines) [4,7,8,10]. Subscribers who are at risk of churn are sent special offers: discounts and bonuses that increase the likelihood of retaining them. It is necessary that the classifier indicates these subscribers precisely. Otherwise the mobile operator may incur losses by providing discounts to subscribers who do not plan to move to competitors. At the same time, such a task as updating the subscriber database can be solved, since in Ukraine most subscribers did not provide their passport data when purchasing the tariff. Therefore there is a need to solve the tasks of predicting the subscriber's age, gender and propensity for consuming specific services. It is also possible to single out the task of segmentation of the subscriber base in order to find user groups that are similar to each other: they have family or friendly relations, cooperate on a project, communicate with each other, form a group of interests and so on. It will then be possible to make them joint offers ("favorite number", "family", "music lovers", "social networks"), thus attracting even more subscribers to use the services of this mobile operator. Based on the analysis of publications and research on the topic of user churn, such machine learning methods as logistic regression, neural networks, random forest and gradient boosting were identified.
6.1 Description of Input Data
The problem of subscriber churn forecasting was solved on the basis of actual input data of a Ukrainian telecommunication company, whose name is not given due to the authors' obligation not to disclose confidential information. The input sample consisted of 150 thousand subscribers and, accordingly, information about their activity in the mobile network for 15 months (2014 and the beginning of 2015) described by 84 characteristics [20]. The main variables are shown in Table 2. A sample of 96460 examples was obtained after the data pre-processing step. It was divided into three parts: training (60%), test (20%) and cross-validation sample (20%). The best model was selected based on the
Table 2. Description of the main characteristics of the clients and their behavior in using mobile services

Name of characteristic | Characteristic's description
INCOMING         | Quantity of incoming calls minutes
PSTN             | Quantity of outgoing calls minutes to landline numbers
ALIEN            | Quantity of outgoing calls minutes to mobile numbers of other operators
REGION           | Quantity of outgoing calls minutes to mobile numbers of the same operator in the same region
AREA             | Quantity of outgoing calls minutes to mobile numbers of the same operator in another region
OMO MINS         | Quantity of outgoing calls minutes to mobile numbers of other mobile operators
ONNET MINS       | Quantity of outgoing calls minutes to mobile numbers within the network
INTERN MINS      | Quantity of outgoing calls minutes to international numbers
GPR USG MB       | Quantity of consumed Internet traffic megabytes
SMS              | Quantity of sent SMS
Oblast Activated | Subscriber's activation date
Gender           | Gender of subscriber
AGE              | Age of subscriber
COMPANY          | Indicator of whether the subscriber is a corporate client
TYPE             | Device model (mobile phone, tablet, etc.)
results of predictions on the test sample; the cross-validation sample was used for independent evaluation of the models.
6.2 Model Training
The available sample was randomly split into three parts (60%, 20%, 20%) ten times; after each split the models were trained and then evaluated on the test sample. Moreover, for the models based on logistic regression and the neural network, the characteristics of the obtained sample were scaled to the interval [0; 1]. The simulation results are shown in Table 3.
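A minimal sketch of this experimental setup with scikit-learn is given below (not the authors' code); the DataFrame, the target column name and the metrics reported are assumptions used only to illustrate the split, the scaling and one of the compared models.

```python
# Sketch only: 60/20/20 split, [0, 1] scaling for LR/NN, gradient-boosting baseline.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import precision_score, recall_score, f1_score

subscribers = pd.read_csv("subscribers.csv")             # assumed file with 84 characteristics
X, y = subscribers.drop(columns=["CHURN"]), subscribers["CHURN"]

X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, stratify=y)
X_test, X_val, y_test, y_val = train_test_split(X_rest, y_rest, test_size=0.5, stratify=y_rest)

scaler = MinMaxScaler().fit(X_train)                     # scaling needed for LR / neural network
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)
logreg = LogisticRegression(max_iter=1000).fit(X_train_s, y_train)

gb = GradientBoostingClassifier().fit(X_train, y_train)  # tree ensembles do not require scaling
pred = gb.predict(X_test)
print(precision_score(y_test, pred), recall_score(y_test, pred), f1_score(y_test, pred))
```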
Table 3. Comparison of models in the test sample
Model               | Precision | Recall | F1   | F0,5 | GINI
Gradient Boosting   | 0,70      | 0,64   | 0,66 | 0,68 | 0,684
Random Forest       | 0,72      | 0,60   | 0,65 | 0,69 | 0,664
Neural Network      | 0,69      | 0,59   | 0,63 | 0,66 | 0,65
Logistic Regression | 0,63      | 0,37   | 0,46 | 0,55 | 0,684
6.3 Interpretation of the Results
The best results were shown by gradient boosting over decision trees, with results similar to those generated by the random forest method [20]. Overall accuracy is also an important metric in the subscriber churn task. Predicting the fact of subscriber churn is done in order to offer the customer more favorable conditions and bonuses and thus keep him with the current operator. Accuracy shows how successful such offers to subscribers are. With low classification accuracy the mobile operator will often send offers to customers who do not plan to move from one operator to another, which can lead to additional financial loss.
6.4 The Characteristics Importance
The next stage of the study was to identify the characteristics that most affect the probability of subscriber churn. Consider the importance of all 84 characteristics of the input data sample using the example of a classification model based on gradient boosting over decision trees. The most significant characteristics were the activation date, the quantity of incoming minutes per month, the subscriber age and the amount of Internet traffic consumed per month. For several characteristics (INCOMING, SMS, GPRS, OTHER), the time series estimates, such as mathematical expectation, variance and skewness, that were added to the sample are almost as important, and sometimes even more important, than the initial activity indicators. It is an important fact that the use of such additional time series characteristics gives better estimates of the forecasting quality. In the initial data sample this can be explained by looking at the matrix of covariance estimates of the initial indicators of subscriber activity for 3 consecutive months (Fig. 4). Indeed, the same activity indicators for different months correlate with each other, which confirms our idea of the similarity of the subscriber's behavior in the following months to his behavior in previous months. This conclusion about the behavior of subscribers is generalized to solve the problem of predicting the period of possible subscriber churn, as well as the choice of services that are most typically used by the subscriber, including roaming. The most correlated indicators across all months were as follows: INTERNATIONAL (0.85 average covariance for all months), REGION (0.83), INCOMING (0.82), OTHER (0.77), SMS (0.75). Some characteristics (type of the subscriber's communication device, subscriber gender, some indicators of outgoing calls) did not have a significant impact on the results generated by the classifier.
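Continuing the sketch above, the importance ranking and the month-to-month covariance inspection could look as follows; the per-month column names are hypothetical.

```python
# Sketch only: gradient-boosting feature importance and monthly covariance of activity indicators.
import pandas as pd

importance = pd.Series(gb.feature_importances_, index=X_train.columns)
print(importance.sort_values(ascending=False).head(10))   # e.g. activation date, INCOMING, AGE, GPRS

monthly_cols = ["INCOMING_M1", "INCOMING_M2", "INCOMING_M3"]  # hypothetical per-month columns
print(subscribers[monthly_cols].cov())                        # cf. the matrix in Fig. 4
```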
Fig. 4. Covariance matrix of subscriber’s activity indicators
6.5 Clustering
Next, a study was performed on how customer churn is related to the assignment of the subscriber to a particular group (cluster) and how the classification accuracy depends on how the input sample was divided into clusters. Such information provides an opportunity to develop targeted marketing proposals not only for individual subscribers, but also for the groups of subscribers most prone to churn. The following characteristics were selected for clustering:
1. Gender (Male, Female, Entity).
2. Date of activation (more than 2 years, more than 1 year, more than 3 months, less than 3 months).
3. Uses only Internet (phone, modem).
As a result, the sample was divided according to these characteristics into 24 clusters, and in each of them an independent classifier – gradient boosting over decision trees – was trained. At the previous stage gradient boosting had shown the highest accuracy of subscriber churn forecasting. The results of classification on the individual subscriber clusters (groups) are shown in Table 4. The best classification result was obtained for the cluster "less than 3 months, telephone, legal entity". The worst result was shown by the classifier in the cluster "less than 3 months, modem, male". The total results obtained for all metrics: the classifier based on clustering of characteristics showed higher results than the previously considered machine learning models. This result once again confirms the feasibility of such clustering, because in different clusters the same
characteristics and services have different importance; in other words, different clusters have different reasons for subscriber churn. A generalized comparison table for all the methods used is shown in Table 5.
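The per-cluster training scheme described above can be sketched as follows; the cluster keys and column names are hypothetical, and the minimum-size check is only an illustrative safeguard.

```python
# Sketch only: fit an independent gradient-boosting classifier per subscriber cluster.
from sklearn.ensemble import GradientBoostingClassifier

cluster_models = {}
for key, part in subscribers.groupby(["GENDER", "ACTIVATION_GROUP", "DEVICE_TYPE"]):
    if len(part) < 100:                       # skip clusters that are too small to train on
        continue
    Xc, yc = part.drop(columns=["CHURN"]), part["CHURN"]
    cluster_models[key] = GradientBoostingClassifier().fit(Xc, yc)
```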
Table 4. The results of classification into clusters

Date     | Internet | Man: Prec. Recall AUC | Woman: Prec. Recall AUC | Entity: Prec. Recall AUC
3 mon.   | Phone    | 0,73  0,52  0,84      | 0,76  0,58  0,87        | 0,59  0,61  0,86
         | Modem    | 0,66  0,52  0,75      | 0,66  0,66  0,83        | 0,70  0,63  0,80
>1 year  | Phone    | 0,63  0,60  0,88      | 0,78  0,65  0,86        | 0,75  0,69  0,85
         | Modem    | 0,80  0,53  0,77      | 0,74  0,74  0,84        | 0,63  0,70  0,79
>2 years | Phone    | 0,83  0,52  0,92      | 0,86  0,57  0,94        | 0,86  0,60  0,92
         | Modem    | 0,66  0,53  0,85      | 0,72  0,71  0,85        | 0,84  0,64  0,92
Table 5. Comparative analysis of all the used methods

Model               | Precision | Recall | F1   | F0,5 | GINI  | AUC
Set of classifiers  | 0,75      | 0,66   | 0,70 | 0,73 | 0,808 | 0,904 ± 0,004
Gradient Boosting   | 0,70      | 0,64   | 0,66 | 0,68 | 0,684 | 0,842 ± 0,005
Random Forest       | 0,72      | 0,60   | 0,65 | 0,69 | 0,664 | 0,832 ± 0,008
Neural Network      | 0,69      | 0,59   | 0,63 | 0,66 | 0,65  | 0,825 ± 0,007
Logistic Regression | 0,63      | 0,37   | 0,46 | 0,55 | 0,684 | 0,842 ± 0,002
6.6 Forecasting the Moment of Subscribers Churn
The next important point for the telecommunication company is the ability to develop a strategy to prevent subscriber churn in a timely manner. It is necessary to predict the moment of time of possible customer churn and to determine the category of customers who have a tendency to change the operator, in order to optimize the costs of the telecommunications company, in particular to send promotional offers only to target customers. We use the dynamic approach proposed by the authors in [12,17–19] to estimate the time of possible customer churn. The dynamic approach involves the construction of different types of survival models. Initially, modeling was performed using the nonparametric conditional distribution for individuals and selecting the best model using LogRank and Wilcoxon estimates. Such modeling is quite subjective and predicts risk but does not provide an answer on how to reduce it, i.e. which point in time is the most
critical and which characteristics may affect the churn. Therefore, the study was continued using the dynamic method [18,19]. Survival models (in this context survival is understood as the continuation of customer service) are constructed to predict the probability of continuing customer service depending on the customer type (corporate or private). In Fig. 5 one can track the potential customer churn within 1–2 months [18,19], when prepaid amounts or promotions end, or which comes from temporary subscribers who bought numbers to use communication services in Ukraine while on vacation or on a business trip. It was also important to predict when the probability of customer churn exceeds 50%. The algorithm for calculating the critical moment of time through a given 50% degree of risk, proposed in [18], was used. This critical moment occurs after 3 months of using the services for men, after 4 months for corporate clients, and after 5 months for women [17]. If at the initial stage it is determined that the telecommunication company considers a level of risk acceptable when the risk function is not more than λ(t1) = 0,15, critical when it is not more than λ(t1) = 0,3, and catastrophic when it is greater than λ(t1) = 0,4 [18], then the described time calculation algorithm can determine for each cohort (group) of customers whether there was a transition to the zone of critical or catastrophic risk. As can be seen from Fig. 5, the highest level of losses was observed among the groups of men and corporate clients after 1 month and 1.5 months.
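For illustration, this survival-analysis step could be reproduced with the Python lifelines package; this is not necessarily the authors' implementation, and the duration, event and group columns are assumptions. The moment when the churn probability exceeds 50% corresponds to the median survival time of the fitted curve.

```python
# Sketch only: Kaplan-Meier survival curves per customer group (cf. Fig. 5).
from lifelines import KaplanMeierFitter

kmf = KaplanMeierFitter()
for group, part in subscribers.groupby("CLIENT_TYPE"):    # e.g. man / woman / corporate (assumed)
    kmf.fit(part["MONTHS_OF_SERVICE"], event_observed=part["CHURN"], label=str(group))
    # the moment when the churn probability exceeds 50% is the median survival time
    print(group, "critical moment (months):", kmf.median_survival_time_)
    kmf.plot_survival_function()
```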
Fig. 5. Graphs of survival functions for grouped data
Thus, the analyzed main subscriber characteristics allow the telecommunications company to draw conclusions and predict the scope and types of services that subscribers will use in future periods, to predict the likelihood of possible subscriber churn and to take precautions. The information available to the mobile operator allows drawing conclusions about the subscriber's behavior and preferences, as well as trying to transfer the existing correlations to the services that subscribers will use in roaming. Therefore, the next task that we
consider is forecasting the types and volumes of services that will be used by subscribers abroad.
7 The Task of Predicting Services in Roaming
Having information on the services that the subscriber uses in Ukraine, it is necessary to predict which of the subscribers will use services abroad and which services they will use. For the analysis and forecasting of subscribers' usage of roaming services, a sample of 120,000 records was obtained – data on subscribers who went abroad from August 2017 till July 2018, i.e. for twelve consecutive months. It is necessary to predict, first of all, whether a subscriber traveling abroad will use communication services at all, as well as to determine which services (outgoing minutes or mobile Internet) the subscriber will use. Similarly to the task of forecasting the mobile traffic volume in Ukraine described above, the volume of voice calls and Internet traffic that will be used by the subscriber abroad was predicted using regression models. The best models for both types of services were ARIMA models. However, for the mobile operator the task of forecasting the volume of services used abroad is auxiliary to the development of special package offers in roaming. Here the task is to classify subscribers into those who do not use services abroad at all (and this is still a significant proportion of subscribers) and those who use the services, and to determine in which way. Several packages and offers could be developed to stimulate subscribers to spend money and to use the more convenient services of calls and mobile Internet abroad. Here the task is to predict the following classes of services:
– outgoing minutes in roaming;
– mobile data internet.

Table 6. Input variables and their description
Name of characteristic | Characteristic's description
MONTH ID ROAM    | The month in which the subscriber went abroad
TARIFF NAME      | Internal subscriber tariff plan
COUNTRYCODE      | Tariff group code of the country to which the subscriber left
ROAMING COUNTRY  | The name of the country to which the subscriber left
MONTH ID TRAFF   | Previous month, taken to reflect the use of services in Ukraine
MO UKR           | Amount of outgoing minutes used in Ukraine
SMS UKR          | Quantity of short messages used in Ukraine
HOME GPRS USG    | Amount of GPRS traffic used in Ukraine
AMOUNT           | Amount spent by the subscriber on the services in Ukraine
PAYM HRN         | Amount of account replenishment in the specified month
PAYM COUNT       | Quantity of replenishments
OBLAST 90D       | Administrative region of Ukraine in which the subscriber stayed for more than 90 days before leaving abroad
The input data contain the characteristics listed in Table 6. In total, the sample contains 15 characteristics that describe the behavior of a subscriber in Ukraine and abroad. An experimental study was conducted, and the behavioral characteristics in Ukraine were generalized and selected for further modeling. The task is to consider the target variables – the package of services for voice calls and the package of services for mobile Internet in roaming – to build classification models. The target variable Y_calls for predicting outgoing calls in roaming can take the following values:
– gr 0 – the subscriber did not use the service;
– gr 10 – the subscriber used up to 10 min;
– gr 60 – the subscriber used from 10 to 60 min;
– gr over60 – the subscriber used more than 60 min.
The target variable Y_mi for predicting GPRS traffic in roaming can take the following values:
– gr 0 – the subscriber did not use the service;
– gr 100 – the subscriber used up to 100 MB;
– gr 500 – the subscriber used from 100 to 500 MB;
– gr over500 – the subscriber used more than 500 MB.
After preliminary data processing the data sample was divided into training (75%) and test (25%) parts. Since the data sample is unbalanced, the class stratification method was used to ensure that sufficient values were obtained from each group. Next, the sample was balanced by increasing the number of instances of the smaller classes towards the number of the majority class, increasing their weight in the training sample. A modified method of over-sampling – SMOTE (Synthetic Minority Over-sampling Technique) [23] – was used. Here the records are not simply duplicated but are artificially generated based on examples of real class representatives with minor deviations. The task of multilabel classification [13] of services was solved by such methods as logistic regression, neural networks, random forest and gradient boosting. The results of the comparison of the main classification models of the service packages that the subscriber will use are given in Table 7 and Table 8. The gradient boosting method was the best classification method for both types of services. Based on it, it was possible to identify quite accurately the subscribers who will use the services abroad. These models help the telecommunication company to form and determine the target audience qualitatively. The decision to provide special offers or to inform subscribers is made by the marketing department depending on the available budget.
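A minimal sketch of this pipeline with scikit-learn and imbalanced-learn is shown below (not the authors' code). The file, column names and feature subset are assumptions; the target binning follows the gr_* class definitions above, and SMOTE is used for the over-sampling step.

```python
# Sketch only: class binning, stratified split, SMOTE balancing, gradient boosting.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier
from imblearn.over_sampling import SMOTE

roam = pd.read_csv("roaming_sample.csv")                  # assumed file
roam["Y_calls"] = pd.cut(roam["ROAM_MO_MIN"],             # hypothetical roaming-minutes column
                         bins=[-1, 0, 10, 60, float("inf")],
                         labels=["gr_0", "gr_10", "gr_60", "gr_over60"]).astype(str)

X = roam[["MO_UKR", "SMS_UKR", "HOME_GPRS_USG", "AMOUNT", "PAYM_HRN", "PAYM_COUNT"]]
X_tr, X_te, y_tr, y_te = train_test_split(X, roam["Y_calls"], test_size=0.25,
                                          stratify=roam["Y_calls"])

X_bal, y_bal = SMOTE().fit_resample(X_tr, y_tr)           # synthetic minority-class examples
clf = GradientBoostingClassifier().fit(X_bal, y_bal)
print(clf.score(X_te, y_te))
```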
Table 7. Classification models quality for usage of outgoing calls in roaming

Class     | Method              | Precision | Recall | F1
gr 0      | Logistic Regression | 0,83      | 0,67   | 0,74
          | Neural Network      | 0,86      | 0,78   | 0,82
          | Random Forest       | 0,82      | 0,93   | 0,87
          | Gradient Boosting   | 0,86      | 0,87   | 0,86
gr 10     | Logistic Regression | 0,43      | 0,45   | 0,44
          | Neural Network      | 0,56      | 0,64   | 0,60
          | Random Forest       | 0,71      | 0,56   | 0,63
          | Gradient Boosting   | 0,68      | 0,63   | 0,65
gr 60     | Logistic Regression | 0,22      | 0,43   | 0,29
          | Neural Network      | 0,40      | 0,52   | 0,46
          | Random Forest       | 0,70      | 0,40   | 0,51
          | Gradient Boosting   | 0,57      | 0,62   | 0,59
gr over60 | Logistic Regression | 0,07      | 0,70   | 0,12
          | Neural Network      | 0,35      | 0,38   | 0,37
          | Random Forest       | 0,59      | 0,18   | 0,27
          | Gradient Boosting   | 0,47      | 0,63   | 0,54
Table 8. Classification models quality for usage of the GPRS service in roaming

Class      | Model               | Precision | Recall | F1
gr 0       | Logistic Regression | 0,68      | 0,79   | 0,73
           | Neural Network      | 0,78      | 0,78   | 0,78
           | Random Forest       | 0,78      | 0,90   | 0,83
           | Gradient Boosting   | 0,76      | 0,89   | 0,82
gr 100     | Logistic Regression | 0,62      | 0,34   | 0,44
           | Neural Network      | 0,66      | 0,61   | 0,64
           | Random Forest       | 0,72      | 0,67   | 0,69
           | Gradient Boosting   | 0,74      | 0,60   | 0,66
gr 500     | Logistic Regression | 0,36      | 0,37   | 0,36
           | Neural Network      | 0,50      | 0,56   | 0,53
           | Random Forest       | 0,62      | 0,56   | 0,59
           | Gradient Boosting   | 0,63      | 0,60   | 0,61
gr over500 | Logistic Regression | 0,29      | 0,69   | 0,41
           | Neural Network      | 0,46      | 0,49   | 0,47
           | Random Forest       | 0,63      | 0,38   | 0,48
           | Gradient Boosting   | 0,53      | 0,59   | 0,56
8 The Information Technology for Big Data Analysis of the Telecommunication Company
Ukrainian mobile operators collect an extremely large amount of data about each subscriber. This concerns not only personal data (gender, age), but also the location of the subscriber, his typical place and schedule of work/study, location and time of rest, travels abroad, the most frequently contacted subscribers, groups, mobile applications, methods of account replenishment, mobile Internet usage, communication quality, devices/smartphones, etc. This is a huge database of subscribers, and a wide range of machine learning methods is involved in processing it. In order to solve each individual task facing each department, it is necessary to develop and use special information technologies and tools. Consider in some detail the structure of the information technology (Fig. 6).
Fig. 6. Client-server architecture of a telecommunication company information technology
8.1 Client-Server Architecture for Data Processing and Risk Assessment for a Telecommunication Company
Today the advanced information technology (Fig. 6) includes in its applications the main methods and models and the implementation of data mining methods for the analysis of non-stationary processes of arbitrary nature on the basis of system analysis; it provides and implements the possibility of hierarchical analysis, modeling and forecasting. The IT takes into account structural, parametric and statistical uncertainties [15]. Since the telecommunication company faces a significant number of tasks every day that need to be automated, it was proposed to implement them as separate
applications. Specialists of the relevant departments have access to the data and receive the results of these applications. They analyze the consumption of services by subscribers, forecast possible churn, develop marketing proposals and form new tariff plans. In fact, all these applications use the basic subscriber database, which contains detailed information not only about the subscriber, his behavior and service usage, but also a daily report on calls and services, location, and the subscribers to whom calls were made. All this information should be aggregated and collected in accordance with the required granularity: daily, weekly, monthly, quarterly, etc. In the proposed information technology the following basic applications were implemented on the basis of the application server:
– preliminary processing and preparation for the analysis of customer data, their behavior and use of services (where, when and how);
– development of models for customer churn prediction;
– development of probability-statistical models that allow predicting a probability (of churn, of response to an offer, of ordering and using a particular service, etc.);
– assessment of the quality of models and forecasts and selection of the best model for the appropriate problem statement;
– development of dynamic behavioral models that allow predicting the dynamics of customer behavior and the duration of service usage;
– dynamic assessment and forecasting of the period of possible churn;
– development of models for forecasting the volume of services that the client will use in roaming;
– development of classification models of the corresponding tariff packages in roaming;
– monitoring of the usage of basic services in Ukraine: calls and mobile Internet (duration, time, peak loads);
– usage of additional services and generation of reports on used packages and services (sending warnings about the approaching exhaustion of services, expiration of funds, validity period);
– other applications.
The proposed information technology ensures the successful implementation of the necessary functionality of the developed algorithms and methods in the form of separate software applications, which are called when necessary and provide effective support for the business activities of the telecommunication company.
9 Conclusions
According to the available statistical information, the number of mobile users in Ukraine, i.e. those who have a mobile phone number, is more than twice the active population of Ukraine. Subscribers choose for themselves the operator that provides the most advantageous offers. However, users often have their own financial mobile number, which the subscriber uses for certain financial services
(delivery, ordering goods, loyalty programs) and for business. Therefore, even if the user does not make outgoing calls and does not pay a monthly subscription fee, he remains a client of the mobile operator. It is clear that the operator is interested not only in keeping this customer from churning, but also in making him an active user by subscribing him to new services, encouraging the use of this mobile number as the main one and the payment of monthly tariffs. The tasks considered in this paper are among the most relevant for a telecommunication company. Providing constant financial income and maintaining its own subscriber database is a source of success and financial stability. Providing quality services, especially mobile Internet, with high speeds and an even distribution of load across base stations is a guarantee of stability and customer satisfaction, as well as of the ability to attract new subscribers. The task of mobile traffic forecasting is relevant for telecommunications companies for several reasons. Firstly, forecasting the Internet traffic volume allows a better understanding of how the network will develop in the future. Secondly, it allows using resources more efficiently to build and optimize base stations and to avoid situations when subscriber traffic accumulates but the network resources are not sufficient. Modern regression models such as ARIMA, GARCH, the Facebook Prophet models and the LSTM neural network were proposed for use and studied. They provided adequate results of Internet traffic forecasting. The development of roaming services and encouraging subscribers to use the wide possibilities of the Internet and calls abroad serve the task of keeping a subscriber from temporary churn and provide the opportunity to receive additional revenues through well-thought-out tariffs and agreements for the provision of services abroad. A subscriber who continues to use the services of his own operator in any part of the world is a stable source of revenue for the mobile operator. Therefore, the application of modern artificial intelligence and machine learning methods and techniques to the processing and analysis of big data of particular mobile operators allows both increasing the profit of the operator and improving the provision of relevant services and the information development of the country as a whole.
References 1. Ministry of digital transformation of Ukraine (2021). https://thedigital.gov.ua/ news/mintsifra-bilshe-126-tis-ukraintsiv-otrimali-4g-vpershe-v-lyutomu 2. Beran, J.: Mathematical Foundations of Time Series Analysis, p. 309. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-74380-6 3. Brandusoiu, I., Toderean, G.: Predicting churn in mobile telecommunications industry. Electron. Telecommun. 51, 1–6 (2010) 4. Breiman, L.: Random forests. Mach. Learn. 45, 5–32 (2001). https://doi.org/10. 1023/A:1010933404324 5. Brownlee, J.: How to develop LSTM models for time series forecasting (2018). https://machinelearningmastery.com/how-to-develop-lstm-modelsfor-time-series-forecasting/
6. Chih-Ping, W., I-Tang, C.: Turning telecommunications call details to churn prediction: a data mining approach. Expert Syst. Appl. 23(1), 103–112 (2002). https://doi.org/10.1016/S09574174(02)000301 7. Chistyakov, S.: Random forests: an overview. Works of Karelian scientist center of RAS 1, 117–136 (2013). (in Russian) 8. Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20(3), 273–297 (1995). https://doi.org/10.1007/BF00994018 9. Coussement, K., Poel, D.: Churn prediction in subscription services: an application of support vector machines while comparing two parameter-selection techniques. Expert Syst. Appl. 34, 313–327 (2008) 10. Friedman, J.H.: Greedy function approximation: a gradient boosting machine. Ann. Stat. 29(5), 1189–1232 (2001). http://www.jstor.org/stable/2699986 11. Hanzo, L., Akhtman, Y., Wang, L., Jiang, M.: MIMO-OFDM for LTE, WiFi and WiMAX: Coherent Versus Non-Coherent and Cooperative Turbo Transceivers, p. 692. Wiley-IEEE Press (2020) 12. Havrylovych M., Kuznietsova, N.: Survival analysis methods for churn prevention in telecommunications industry. In: CEUR Workshop Proceeding, vol. 2577, pp. 47–58. CEUR (2020). http://ceur-ws.org/Vol-2577/paper5.pdf 13. Herrera, F., Charte, F., Rivera, A., Jesus, M.: Multilabel Classification. Problem Analysis, Metrics and Techniques, p. 194. Springer, Cham (2016). https://doi.org/ 10.1007/978-3-319-41111-8 14. Hung, S., Yen, D., Wang, H.: Applying data mining to telecom churn management. Expert Syst. Appl. 31, 515–524 (2004) 15. Kuznietsova, N., Bidyuk, P.: Intelligence information technologies for financial data processing in risk management. In: Babichev, S., Peleshko, D., Vynokurova, O. (eds.) Data Stream Mining & Processing. Communications in Computer and Information Science, vol. 1158, pp. 539–558 (2020). https://doi.org/10.1007/978-3-03061656-4 36 16. Kuznietsova, N., Kuznietsova, M.: Data mining methods application for increasing the data storage systems fault-tolerance. In: 2020 IEEE 2nd International Conference on System Analysis & Intelligent Computing (SAIC), pp. 315–318 (2020). https://doi.org/10.1109/SAIC51296.2020.9239222 17. Kuznietsova, N.V.: Information technologies for clients’ database analysis and behaviour forecasting. In: CEUR Workshop Proceeding, vol. 2067, pp. 56–62. CEUR (2017). http://ceur-ws.org/Vol-2067/ 18. Kuznietsova, N.V.: Dynamic method of risk assessment in financial management system. Registration Storage Data Process. 21(3), 85–98 (2019). https://doi.org/ 10.35681/1560-9189.2019.21.3.183724 19. Kuznietsova, N.V., Bidyuk, P.I.: Dynamic modeling of financial risks. Inductive Model. Complex Syst. 9, 122–137 (2017). http://nbuv.gov.ua/UJRN/Imss 2017 9 15 20. Kuznietsova, N.V., Bidyuk, P.I.: Modeling of financial risk in the telecommunications field. Sci. News NTUU “KPI” 5, 51–58 (2017). https://doi.org/10.20535/ 1810-0546.2017.5.110338 21. Kwiatkowski, D., Phillips, P., Schmidt, P., Shin, Y.: Testing the null hypothesis of stationarity against the alternative of a unit root: how sure are we that economic time series are nonstationary? J. Econ. 54, 159–178 (1992) 22. Mozer, M.C., Wolniewicz, R., Grimes, D.B., Johnson, E., Kaushansky, H.: Predicting subscriber dissatisfaction and improving retention in the wireless telecommunications industry. IEEE Trans. Neural Networks 11(3), 690–696 (2000). https:// doi.org/10.1109/72.846740
23. Nikulin, V.N., Kanishchev, I.S., Bagaev, I.: Methods of balancing and normalization of data to improve the quality of classification. Comput. Tools Educ. 3, 16–24 (2016) 24. Orac, R.: LSTM for time series prediction (2019). https://towardsdatascience.com/ lstm-for-time-series-prediction-de8aeb26f2ca 25. Osovsky, S.: Neural networks for information processing (translation in Russian by I.D. Rudinsky), p. 344. Finance and statistics (2002) 26. Papageorgiou, N.S., Radulescu, V.D., Repovs, D.D.: Nonlinear Analysis – Theory and Methods, p. 586. Springer, Cham (2019). https://doi.org/10.1007/978-3-03003430-6 27. Robson, W.: The math of Prophet (2019). https://medium.com/future-vision/themath-of-prophet-46864fa9c55a 28. Taylor, S., Letham, B.: Forecasting at scale. PeerJ Preprints 5, 1–25, e3190v2 (2017). https://doi.org/10.7287/peerj.preprints.3190v2 29. Tsay, R.: Analysis of Financial Time Series, p. 720. Wiley, New York (2010) 30. Umayaparvathi, V., Iyakutti, K.: Applications of data mining techniques in telecom churn prediction. Int. J. Comput. Appl. 42, 5–9 (2012). https://doi.org/10.5120/ 5814-8122
Agile Architectural Model for Development of Time-Series Forecasting as a Service Applications

Illia Uzun1(B), Ivan Lobachev2, Luke Gall3, and Vyacheslav Kharchenko4

1 Odessa Polytechnic State University, Odessa, Ukraine
[email protected]
2 Intel Corporation, Hillsboro, USA
[email protected]
3 Keen Foresight LTD, Vancouver, Canada
[email protected]
4 National Aerospace University "Kharkiv Aviation Institute", Kharkiv, Ukraine
[email protected]

Abstract. Time-series data analysis and forecasting have become increasingly important due to their massive application and production. Working with time series – preparing and manipulating data, predicting future values, and analyzing forecasting results – is becoming a more natural task in people's everyday lives. The goal of this article is the modelling of an architectural design for convenient and user-friendly applications that provide a range of functionality for the collection, management, processing, analysis, and forecasting of time-series data. The system's technical requirements are maintainability and the ability to be expanded with new data manipulation methods and predictive models in the future (scalability). As a result of this paper an architectural model was built, field-testing and profiling of which determined that applications developed on its basis allow users to get the desired results faster than with alternative solutions. The reduction of the required level of user technical skills was achieved by the presence of the front-end component. The possibility of providing precise time-series predictions regardless of the data domain was accomplished by the creation of a dynamically extendable predictive-model management service and tools. The improvement of system maintenance and extension time-costs was the result of using the microservices architecture pattern. As a result, this work demonstrates an informational system for an end-to-end workflow on time-series forecasting. An application implementation prototype (proof of concept) built on the basis of the described architectural model demonstrated the advantages of this design over existing analogues. During testing, scalability improvements and an overall efficiency increase in terms of time and resource costs were recorded.

Keywords: Time-series forecasting · Software architecture · Scalability · Microservices · Big data · Data collection · Time series analysis
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022
S. Babichev and V. Lytvynenko (Eds.): ISDMCI 2021, LNDECT 77, pp. 128–147, 2022.
https://doi.org/10.1007/978-3-030-82014-5_9
1 Introduction
1.1 Motivation
The impact of time-series data on people's everyday lives has significantly increased in recent years. The reasons for this are the massive collection of such data, the release of new predictive models with higher prediction precision and robustness, as well as the general evolution of data science. Consequently, this has increased people's awareness of all of the benefits that their business or field of science can receive from collecting, analyzing, and predicting time-series data. Many software-development companies reflected on this and started working on services whose primary goal is to provide the tooling necessary to cover people's needs. For example, Amazon has recently launched a time series forecasting service [35], and it is not the only company that did so. However, most of the available products still have many obstacles. Such solutions are often too complicated and are targeted at advanced users, data scientists, or developers, preventing ordinary users from receiving all of the system's benefits or even from deploying or installing them. Most of these apps operate with a vast amount of tools and segregated features, blocking users from working with their data in an end-to-end fashion. Many of them use only one standard machine learning model under the hood, causing significant differences in prediction precision depending on the time-series data characteristics. The lack of proper forecasting result analysis makes it impossible to localize issues and improve prediction results. This paper's authors underline one root problem of such products – the lack of architectural scalability. Therefore, it becomes necessary to study this problem and search for possible solutions: in this case, to provide an architectural design and a stack of technologies while describing an example of such a system that aims to cover all of the described problems. Data, in general, can be separated into three main categories:
– Time-series data – a set of observations of the values a variable takes at different periods.
– Cross-sectional data – one or more variables, collected at the same point in time.
– Pooled data – a combination of time-series data and cross-sectional data.
Time series data is the topic of discussion in this article because of the historically defined importance of any data connected to time and its continuing growth in everyday lives. Moreover, working with time-series data is becoming more popular as people become more aware of the benefits that data prediction and analysis can bring to them and their businesses. A very relevant approach is to provide proper tooling for the management, processing, analysis, and forecasting of time-series data in one final application. With such a product, users can receive a smooth and stable end-to-end workflow with their data. Time series [13] data has its unique set of processing, analysis, and forecasting features that continue to be developed and invented by machine-learning engineers and data scientists. It is relevant to build a system capable of dynamic,
fast, and easy extension with new methods and predictive models. More information about time-series data and its main characteristics, as well as its collection, processing, analysis, and forecasting techniques, can be found in the Literature review below. The quality of time-series data prediction rises as new models become available. However, prediction precision strongly depends on the data characteristics and on the models on which they were trained. So, systems that use only one predictive model for data with all sets of different characteristics are doomed to problems with prediction accuracy. The relevant point here is to build a service that can support more than one predictive model and a data analysis service that can choose between them depending on the data sent to it. Systems with a high level of maintainability and scalability are not new to the world of enterprise software solutions. These characteristics became requirements for many modern software products. There are many software architecture patterns that can help us, as developers, receive all the benefits of dynamic systems; however, we will talk about a microservices architecture in this paper. The microservice architecture [5,6] style is a way to build a single system as a bundle of tiny services, each running in its own process and interacting through lightweight mechanisms. More information about microservices is provided in the Literature review section. There is also a set of libraries and frameworks developed to provide developers with all the tooling needed to build extendable and dynamic services which expose an API (Application Programming Interface) for external usage. For example, the Spring Framework [36] is an application framework and inversion-of-control container for the Java platform, and Django [15] is a high-level Python Web framework that encourages rapid development and a clean, pragmatic design. Additional problems that often go together with the development of systems that work with big data are storing, processing, and transferring it. However, there are many technologies developed to resolve such issues. For example, Apache Hadoop [39] is an open-source framework used to store and process large-scale data sets from terabytes to petabytes of data. Instead of using one powerful computer to store and process data, Hadoop allows developers to integrate multiple computers to analyze large data sets with the same speed. As another example, we can consider Apache Spark [8]. This data processing framework can quickly perform processing tasks on substantial data sets and distribute data processing tasks across multiple computers, either on its own or in tandem with other distributed computing tools. A set of databases, such as Cassandra [10], MongoDB [21], HBase [9], and others, is recommended for storing big data in a structured way. In the current article, we did not consider relational databases because dynamic scaling of non-relational databases has more advantages for working with time-series data. We use MongoDB as our database system and Hadoop for big-data file storage in the described application. One of the most common issues in systems that operate with big data is the time needed to perform any operations on large datasets. These operations can vary a fair bit – from simple data set normalization or data parsing into a
database to complex operations such as model training based on the aforementioned data set. It is inappropriate to force users to wait until some operation finishes before allowing them to continue their journey through the application, especially if completing the operation takes a significant amount of time. The Fire and Forget interaction model used in the described application is intended to run all of the heavy operations in the background while enabling users to continue performing other tasks. Under the hood, these operations can be performed asynchronously and return the operation's status to keep users notified.
1.2 Problem Statement
A time-series data management tool is a desirable component of the day-to-day operations of different businesses. Proper tooling for time-series data manipulation, collection, analysis, processing, and forecasting will help people start receiving the benefits of their data. In the context of the constant development of new time series analysis, processing, and prediction methods and techniques, the new system should be scalable and able to be extended with new features and functions quickly and with minimal impact on the existing infrastructure. Hence, the problem to be solved in the paper is the inaccessibility of the benefits of time-series data forecasting to ordinary users. The article aims at solving the problems carried by the static architectural approaches used in all existing public methods for time-series forecasting, such as the impossibility of scaling applications with new methods and of maintaining existing functionality without essential developer effort and time costs. The significant consumption of users' time during the collection, management, analysis, processing, and forecasting of time-series data with manual methods, or by using products that are based on alternative architectures, is the issue that is going to be addressed in the course of this work.
1.3 Literature Review
Time series data [1,12] is a collection of quantities assembled over time and ordered chronologically. The collection of data at different points in time distinguishes time-series data from cross-sectional data, which observes several independent objects at a single point in time. Another feature that differentiates time-series data from cross-sectional data is the possibility of a correlation between observations in time-series data, where data points are collected at close intervals. In the terminology of computer science, time-series data can be presented as a map (dictionary) where time is a key with a strict mapping to its value. Assuming that the time-series data has equal time intervals between points, it can appear as a one-dimensional array with retained order. Time series data are historically associated with, and can be found in, economics, social sciences, finance, epidemiology, medicine, physical sciences, and many other fields and scientific branches. However, the list of areas where time-series data is being collected and used expands exponentially. The period with which the data is collected is usually called the time series frequency. Time series data characteristics make its analysis different from other data types.
The most important characteristics are the following:
– The trend refers to increasing or decreasing values over time in a given time series.
– Seasonality refers to a repeating cycle over a specific period, such as a week or a month.
– Serial correlation is the correlation between subsequent observations.
– The irregular component refers to white noise – a random variation not explained by any factor.
The mathematical characteristics of time series data often violate the assumptions of conventional statistical methods. Because of this, time-series data analysis requires a unique set of tools and methods, collectively known as time series analysis. Time series analysis [12,26] is a technique that deals with time-series data or trend analysis. Time series analysis methods can be divided into frequency-domain methods and time-domain methods. The former include spectral analysis and wavelet analysis; the latter include auto-correlation and cross-correlation analysis. Data preprocessing [25] is a technique that involves transforming raw data into an understandable format. Real-world data is often incomplete, inconsistent, and lacking in certain behaviors or trends, and is likely to contain many errors. Data preprocessing is a proven method of resolving such issues. There are four basic methods of data preprocessing: data cleaning, data integration, data transformation, and data reduction. Time series data forecasting [1,12,14,24] is an important area of machine learning and can be cast as a supervised learning problem. A wide range of time-series forecasting methods is often developed within specific disciplines for specific purposes. Each method has its properties, accuracy, and costs that must be considered when choosing a specific method. Machine learning methods such as regression, neural networks, support vector machines, random forests, XGBoost, and others can be applied. The appropriate forecasting methods depend largely on what data are available. Big data [22] refers to the large, diverse sets of information that grow at ever-increasing rates. It encompasses the volume of information, the velocity or speed at which it is created and collected, and the variety or scope of the covered data points. Big data often comes from data mining and arrives in multiple formats. Big data storage [32,37] is a storage infrastructure designed specifically to store, manage and retrieve massive amounts of data, or big data. Big data storage enables the storage and sorting of big data so that it can easily be accessed, used, and processed by applications and services working on big data. Big data storage is also able to scale flexibly as required. Big data collection methods refer to calling some source of information for a long or "infinite" amount of time. The source of information can be an external API or a set of data-collection methods, depending on the data type. Data visualization [2] is the graphical representation of information and data. Using visual elements like charts, graphs, and maps, data visualization tools provide an accessible way to see and understand trends, outliers, and patterns in data. In the world of big data, data visualization tools and technologies are essential to analyze massive amounts of information and make data-driven decisions.
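The characteristics listed above (trend, seasonality, irregular component) and the stationarity property can be inspected with standard tooling. The sketch below uses the Python statsmodels package; the file and column names, the additive model and the weekly period are assumptions for illustration.

```python
# Sketch only: decomposing a daily series into trend / seasonal / residual parts
# and checking stationarity before preprocessing.
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.tsa.stattools import adfuller

series = pd.read_csv("series.csv", parse_dates=["ds"], index_col="ds")["y"]   # assumed columns

parts = seasonal_decompose(series, model="additive", period=7)   # weekly seasonality assumed
trend, seasonal, residual = parts.trend, parts.seasonal, parts.resid

print("ADF p-value:", adfuller(series.dropna())[1])               # stationarity test
scaled = (series - series.min()) / (series.max() - series.min())  # simple [0, 1] normalization
```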
An architectural pattern [3,4,18,23] is a general, reusable solution to a commonly occurring problem in software architecture within a given context. Architectural patterns are often documented as software design patterns. There are many recognized architectural patterns and styles: Blackboard, Client-server, Component-based, Data-centric, Event-driven, Layered, Model-view-controller, Reactive architecture, Service-oriented, Microservices architecture, Monolithic application, and many others. A monolithic architecture [27] is built as a single and indivisible unit. Usually, such a solution comprises a client-side user interface, a server-side application, and a database. It is unified, and all the functions are managed and served in one place. The strengths of a monolithic architecture are fewer cross-cutting concerns that affect the whole application and ease of debugging, testing, and deployment. On the other hand, the weaknesses of a monolithic architecture are:
– It can become too complicated to understand.
– It is harder to implement changes.
– Inability to scale components independently.
It is problematic to apply a new technology in a monolithic application because then the entire application has to be rewritten. In a microservices architecture [5,6,27,31], the entire functionality is split up into independently deployable modules which communicate with each other through defined methods called APIs. Each service covers its own scope and can be updated, deployed, and scaled independently. The strengths of the microservices architecture [19] are as follows:
– All the services can be deployed and updated independently.
– Microservice applications are easier to understand and manage.
– Each element can be scaled independently.
– Flexibility in choosing the technology.
– Any fault in a microservices application affects only a particular service and not the whole solution.
The weaknesses of the microservices architecture are as follows:
– Extra complexity.
– System distribution and deployment.
– Several cross-cutting concerns.
– Testing.
Scalability [11] measures a system’s ability to increase or decrease in performance and cost in response to changes in application and system processing demands. Examples would include how well a system performs when the number of users is increased, how well the database withstands growing numbers of queries. In software - scalability also determines the system’s ability to be quickly and easily extended with new features without making any changes. Scalability is also one of the SOLID [29,30] principles.
Maintainability [20] is defined as the probability of performing a successful repair action within a given time. In other words, maintainability measures the ease and speed with which a system can be restored to operational status after a failure occurs. The behavior of the Fire and Forget interaction model [40] can be summarized as follows: an interaction where a user shares "multiple intents" with a digital assistant; the results of the "actions" generated to meet the user intent(s) are presented across the relevant channels as and when they are completed. Another approach to handling long-running operations in a user-friendly way is to allow users to manually trigger refreshes and to notify them when the new data is ready. In terms of software engineering, this can be implemented as a set of operations that are processed asynchronously in the background while continuously providing the user with the current state of the task. Such big operations can be data collection, data processing, model training, and others.
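A minimal sketch of such background processing with Python asyncio is shown below; it is an illustration of the pattern, not the project's actual code, and the task names and sleep-based stand-in job are assumptions.

```python
# Sketch only: "fire and forget" background job with a pollable status.
import asyncio
import uuid

task_status = {}                        # task_id -> "RUNNING" | "DONE" | "FAILED"

async def train_model(dataset_id):      # stand-in for a heavy operation (e.g. model training)
    await asyncio.sleep(5)

def submit(job):
    """Start a background job and return its id immediately; the caller polls task_status."""
    task_id = str(uuid.uuid4())

    async def wrapper():
        task_status[task_id] = "RUNNING"
        try:
            await job
            task_status[task_id] = "DONE"
        except Exception:
            task_status[task_id] = "FAILED"

    asyncio.create_task(wrapper())      # must be called from a running event loop (e.g. a web handler)
    return task_id

async def main():
    tid = submit(train_model("dataset-1"))
    await asyncio.sleep(0.1)
    print(tid, task_status[tid])        # "RUNNING" while the job continues in the background
    await asyncio.sleep(6)
    print(tid, task_status[tid])        # "DONE"

asyncio.run(main())
```

In a web setting, a request handler would call submit(), return the id to the client at once, and expose a status endpoint that simply reads task_status.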
1.4 The Paper Goal and Structure
The primary goal of the research is to find an architectural design which can be used for the development of extendable and maintainable systems for the collection, processing, management, analysis and prediction of time-series data. The desired design should reduce the time costs needed for an end-to-end workflow on data forecasting. To describe the practical benefits of using an agile architectural model for tasks related to time series, the goal is also to provide an example of a time-series-forecasting-as-a-service application based on the provided pattern and to compare it with manual efforts and market competitors. The paper is structured as follows. Section 2 of the article describes the history of the project, the problems that were faced, and why and how the final architectural model was designed. The resulting architectural design, the general schema of its services and their dependencies are presented at the end of Sect. 2. Section 3 provides a detailed view of each microservice of the model with a list of responsibilities for each service, a proposed technology stack taken from the prototypical application, and a description of communications with the other services. The section consists of five subsections, one for each microservice in the architectural design. In Sect. 4 of the paper the authors compare applications built on the basis of the provided architectural design with manual workflow methods for time-series forecasting and with existing services available online. Section 5 concludes the paper and describes directions of future research and development.
2 Architecture Design
The history of the project described in this article started with a monolithic design. The system that was built initially contained everything needed for the end-to-end workflow with time-series data. It was possible to process data, analyze it, and eventually pass it through a neural network to receive forecasting results. It was possible to test a trained model with different data and data parameters. Grid-search tooling for searching best-fitted data parameters
helped during the training of new models. The predictions and analysis results were satisfying, and it was possible to receive benefits from future predicted values. However, we faced significant problems after the product started to grow. The main issue was that the architectural approach was too static and complex. As time-series data processing, analysis, and model training methods evolve at a tremendous rate, there was always a need to add new techniques and methods. Striving for it would enlarge the codebase, making it even more complex and even harder to support. With the lack of a proper user interface, it was almost impossible for users without technical skills to use our application. The lack of proper data storage made it impossible to work with multiple datasets at a time.

The solution for the above-listed problems was to redesign the whole product into a microservice architecture-based system [27]. The initial plan was to split the application into small services with a set of responsibilities attached to each. The main feature of the whole system was time-series data forecasting. The user's ability to receive benefits from different types of time-series data prediction was essential, and the model-management service was intended to be the application's core. This service's main responsibility was to train the model on the data sent to it. A required feature was the model testing functionality to test the trained model on its data. The ability to have a proper presentation of training and testing results was needed, so the users could receive graphical and statistical information about the model quality and get the list of predicted values that they can use afterwards. The model was intended to run with different parameters. This set of parameters had to be stored in the service, so the users (internal or external) would not have to reenter all of them for each model training or testing interaction. The main technical requirement for the model-management service was proper scalability. As long as new prediction methods were being developed, our system had to provide an opportunity to add and test them quickly and easily.

As explained in the Literature section, time-series data has its unique set of characteristics, making its analysis significantly different from other data types. The dependency between predictive data and data features is high. There is still no predictive model available that returns satisfying prediction results for a dataset with all different levels of trend, seasonality, serial correlation, or irregular component. For example, some models can be better at predicting seasonal data but worse on datasets with a high irregularity level, while other models can be better on noised data and worse on trended data. The dataset-analysis service was intended to get insight information about data characteristics and decide which predictive model to use.

In the legacy (monolithic) system, we had a large number of dataset preprocessing techniques and data normalization, smoothing, and differentiation methods. The general use case of dataset processing is described in the theory section. It is essential to have the ability to add new data processing functions to the system dynamically, so a dataset-processing service was developed. Its main responsibility is to build and store data processing pipelines (sets of processing
methods that will be executed one after another during dataset processing), to choose a pipeline, and to pass data through it, returning the processed dataset.

The source of time-series data can vary depending on its field of usage. For example, many companies collect time-series information about their employees or customers, some Internet of Things systems collect time series about their devices, and so on. The users themselves define the general source of information. However, there is also a set of general and common data sources that can be collected from, for example external APIs with stock price information or data from weather forecasting websites. To help users collect such general-purpose time series, a dataset-collector service was created. The main requirement is to give common users the ability to collect their datasets using "tickers" that collect data from a specified API with a user-defined frequency.

Dataset storage and management functionality was almost the first required component in the system. Time-series data collected for a long enough period can be considered a big-data dataset, so we had to develop an approach to receive, store, and work with such data while retaining the stability of the whole system. A dataset-management service was developed; it is mainly responsible for receiving data from the user or the dataset-collector service, persisting it, and sending data to other services, such as the dataset-processing, dataset-analysis, and model-management services. This service had to be the one from which the user begins his journey through the entire application.

The technology stack and external service dependencies of each microservice had to be chosen individually to suit its requirements independently. For instance, dataset-management is built fully using the Java programming language [28] and the Spring framework, while the other services use Python and the Django framework. In the current paper, we consider the front end as a black box: we know about its existence, but we will not cover the description of the front-end services and their architecture. However, all the back-end functions are designed to work with the user interface.

A large amount of data made it necessary to dive deeply into the analysis of streaming protocols [34,38] to choose the best-suited set for big-data transportation. Our system's services communicate via HTTPS (HyperText Transfer Protocol Secure) and MQTT (Message Queuing Telemetry Transport) for binary data. Such technology options as RSocket [33] and Apache Kafka [7] were also considered. For ease of deployment, all services had to be Dockerized [17]. Each service had to provide an independent external API to retain the loose coupling between services. Inter-service communication had to proceed via a service registry. Details on each service and the inter-service communication between them can be found in the "Microservices and interservice communication overview" section. The general schema of services and their dependencies is presented in Fig. 1.
Fig. 1. General schema of services, their dependencies and inter-service communication
3 Microservices and Interservice Communication Overview
The overall system consists of five back-end microservices and a front end. In this section, we dive more deeply into the internal logic, technology stack, external dependencies, and set of responsibilities of each microservice.
3.1 Dataset-Processing Service
Main responsibilities:

– Create, manage, modify and store dataset processing pipelines
– Receive data and a processing pipeline id via the API to start processing, which passes the dataset through the processing steps defined in the pipeline
– Return the processed dataset as a result
Dependencies and technology stack: The dataset-processing service is written in Python. The service is built on the base of the Django framework and opens an API for other services using DjangoRestFramework [16]. The service uses MongoDB as storage for processing pipelines, and this is its only dependency.

Communication with other services: The service API is used from two places in the system:

– The frontend can get the list of available processing methods and the list of prepared pipelines.
– The frontend sends requests for the creation and management of data-processing pipelines. Each request contains the id of the pipeline, the id of the processing method from the list of available methods, and the order of the current processing step. As a result, this provides a pipeline builder mechanism for adding, deleting, and reordering steps within a pipeline.
– The dataset-management service sends a dataset for processing on the user's demand as a one-dimensional array with the id of the processing pipeline and retrieves the processed dataset.
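The pipeline builder mechanism can be pictured with the following minimal sketch (illustrative only; the method names and the in-memory storage are our assumptions, not the service's actual code, which persists pipelines in MongoDB):

from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Registry of available processing methods; the names are illustrative.
METHODS: Dict[str, Callable[[List[float]], List[float]]] = {
    "normalize": lambda xs: [(x - min(xs)) / ((max(xs) - min(xs)) or 1.0) for x in xs],
    "difference": lambda xs: [b - a for a, b in zip(xs, xs[1:])],
    "smooth3": lambda xs: [sum(xs[i:i + 3]) / 3 for i in range(len(xs) - 2)],
}

@dataclass
class Pipeline:
    """An ordered set of processing steps, as stored by the dataset-processing service."""
    pipeline_id: str
    steps: List[str] = field(default_factory=list)

    def add_step(self, method_id: str, position: Optional[int] = None) -> None:
        if method_id not in METHODS:
            raise ValueError(f"unknown processing method: {method_id}")
        self.steps.insert(len(self.steps) if position is None else position, method_id)

    def run(self, dataset: List[float]) -> List[float]:
        # Pass the one-dimensional dataset through the steps one after another.
        for method_id in self.steps:
            dataset = METHODS[method_id](dataset)
        return dataset

if __name__ == "__main__":
    p = Pipeline("demo")
    p.add_step("smooth3")
    p.add_step("normalize")
    print(p.run([3.0, 5.0, 4.0, 8.0, 7.0, 9.0]))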
3.2 Data-Collection Service
Main responsibilities:

– Retrieve user requests for starting a new dataset collection task from a specified external API and with a specified frequency.
– Collect data from the external API and store the collected data locally in CSV (Comma-Separated Values) format.
– Send the collected data to the dataset-management service on the user's demand.

Dependencies and technology stack: The dataset-collection service is written in Python. The service is built on the base of the Django framework and opens an API for other services using DjangoRestFramework. The service uses MongoDB as storage for each collector's metadata.

Communication with other services:

– The frontend receives the list of available collectors, can send requests to delete or add new collection requests, and can download collected data from a collector.
– The service pushes data to the dataset-management service on the user's demand.
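How such a "ticker" can work is sketched below (an illustration under our own assumptions: the external API is expected to return JSON, and the URL, field name and frequency are user-supplied placeholders, not endpoints used by the described system):

import csv
import json
import threading
import time
import urllib.request

class Collector:
    """Periodically pulls values from an external API and appends them to a local CSV file."""

    def __init__(self, url: str, value_field: str, period_s: float, csv_path: str):
        self.url = url
        self.value_field = value_field
        self.period_s = period_s
        self.csv_path = csv_path
        self._stop = threading.Event()

    def _tick(self) -> None:
        with urllib.request.urlopen(self.url, timeout=10) as resp:
            payload = json.loads(resp.read().decode("utf-8"))
        with open(self.csv_path, "a", newline="") as f:
            csv.writer(f).writerow([time.time(), payload[self.value_field]])

    def run(self) -> None:
        while not self._stop.is_set():
            try:
                self._tick()
            except Exception as exc:  # keep collecting even if one pull fails
                print(f"collection tick failed: {exc}")
            self._stop.wait(self.period_s)

    def start(self) -> threading.Thread:
        t = threading.Thread(target=self.run, daemon=True)
        t.start()
        return t

    def stop(self) -> None:
        self._stop.set()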
3.3 Dataset-Management Service
Main responsibilities:

– Store user-uploaded files (CSVs) in dataset storage.
– Store parsed datasets in the database (processed datasets as well as raw datasets).
– Delete or restructure datasets.
– Transfer data to other services on the user's demand.
Dependencies and technology stack: The dataset-management service is written in Java and uses the Spring Framework as its base. The Hadoop [39] infrastructure is used for file storage. The service uses MongoDB as storage for parsed data.

Communication with other services:

– The frontend uploads and manages files in dataset storage.
– The frontend sends a request to parse a dataset stored in dataset storage and persist it into the database.
– The frontend receives the list of CSV files stored in Hadoop and of parsed datasets stored in MongoDB.
– The service sends data to the dataset-processing service.
– The service receives data from the dataset-collection service.
– The service sends data to the dataset-analysis service.
– The service sends data with a training request id to the model-management service to start training.
– The service sends data with a testing request id to the model-management service to start testing.
3.4 Model-Management Service
Main responsibilities:

– Store model-training, model-testing, and grid-searching requests.
– Train and test models on received data.
– Run grid-searching to find best-fitted parameters for the model.
– Store results of training and testing.
Dependencies and technology stack: The model-management service is written in Python. The service is built on the base of the Django framework and opens an API for other services using DjangoRestFramework. The service uses MongoDB as storage for training, testing, and grid-search requests.

Communication with other services:

– The frontend creates, deletes, and modifies training, testing, and grid-searching requests.
– The frontend calls the API to get training, testing, or grid-search results.
– The dataset-management service sends the data and request id to start testing, training, or grid-searching on a defined model.
– The dataset-analysis service sends the data with the selected model and best-fitted parameters to start training or testing.
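A stored grid-search request can be thought of as a parameter grid evaluated against a walk-forward split. The sketch below only illustrates that idea; the moving-average "model" and the MAE metric are stand-ins and are not the predictive models actually registered in the service:

from itertools import product
from statistics import mean
from typing import Dict, List, Sequence

def moving_average_forecast(history: Sequence[float], window: int) -> float:
    """A deliberately simple stand-in for the service's predictive models."""
    window = min(window, len(history))
    return mean(history[-window:])

def grid_search(series: List[float], param_grid: Dict[str, List[int]]) -> Dict:
    """Walk-forward evaluation of every parameter combination in a stored request."""
    best = {"params": None, "mae": float("inf")}
    keys = list(param_grid)
    for combo in product(*(param_grid[k] for k in keys)):
        params = dict(zip(keys, combo))
        errors = []
        for t in range(len(series) // 2, len(series)):
            pred = moving_average_forecast(series[:t], **params)
            errors.append(abs(series[t] - pred))
        mae = mean(errors)
        if mae < best["mae"]:
            best = {"params": params, "mae": mae}
    return best

if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 2.0, 3.0, 4.0, 3.0, 4.0, 5.0]
    print(grid_search(data, {"window": [1, 2, 3]}))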
3.5 Data-Analysis Service
Main responsibilities:

– Provide insight information and basic dataset characteristics.
– Choose the best-fitted model from the list of models available in the model-management service and send the dataset with a training request to the model-management service.

Dependencies and technology stack: The dataset-analysis service is written in Python. The service is built on the base of the Django framework and opens an API for other services using DjangoRestFramework. The service uses MongoDB as storage for training, testing, and grid-search requests.

Communication with other services:

– Retrieves data from the dataset-management service.
– Sends data with a set of insight information about the dataset and the chosen predictive model id to the model-management service.
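The decision logic of the dataset-analysis service can be illustrated as follows (a simplified sketch; the insight indicators, thresholds and model ids are invented for the example and are not the ones used in the described system):

import numpy as np

def dataset_insights(series):
    """Very rough indicators of trend and serial correlation used to pick a model."""
    y = np.asarray(series, dtype=float)
    x = np.arange(len(y), dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    detrended = y - (slope * x + intercept)
    # lag-1 autocorrelation of the detrended series as a crude regularity measure
    ac1 = float(np.corrcoef(detrended[:-1], detrended[1:])[0, 1]) if len(y) > 2 else 0.0
    return {"trend": float(slope), "autocorr": ac1}

def choose_model(insights):
    """Map insight values to the id of a model registered in model-management."""
    if abs(insights["trend"]) > 0.1:
        return "trend-model"
    if insights["autocorr"] > 0.5:
        return "seasonal-model"
    return "baseline-model"

if __name__ == "__main__":
    series = np.sin(np.linspace(0, 12, 120)) + 0.01 * np.arange(120)
    print(choose_model(dataset_insights(series)))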
4 Case of the Study and Results Analysis
To evaluate the results of the described project, we will use the peer review method. The following indicators will be used as assessment criteria.

General metrics:

– User-entry threshold (UST) - general complexity of main goal achievement and skill requirements of the user (1 - the user is required to have neither engineering skills nor data-science skills; 2 - the user is required to have either engineering skills or data-science skills; 3 - the user is required to have both engineering skills and data-science skills).
– Predictions quality (PQ) - preciseness of time-series prediction considering all time domains (1 - can support only one data domain; 2 - can support more than one data domain).
– Time metric (T) - average time (in hours) required for a user to finish an operation + 1.
– Multi-threaded processing (MTP) - ability to run a few operations of the same type at the same time (1 - there is no possibility to run more than one operation at a time; 2 - a few operations of the same type can be executed at the same time).

User-operation metrics (UOM):

– Data collection (DC) - gathering new datasets from external sources.
– Data preparation (DP) - processing and preparation of a dataset before training.
– Data management (DM) - loading, removing, splitting, updating and other dataset manipulations.
– Data analysis (DA) - analysis of a dataset to choose the best-fitted predictive model.
– Data forecasting (DF) - forecasting future values of time-series data.
Developer-operation metrics:

– Existing methods support (EMS) - supporting existing functionality within the system, fixing issues and tuning methods.
– Extension of functionality (EF) - adding new dataset collection, preparation, management, analysis or predictive tooling to the platform.

To compare the practical benefits of using informational systems based on the proposed architecture with manual methods of time-series data forecasting and with other services and market competitors such as Amazon Forecast Service (AFS) [35], we will calculate a total efficiency score for each method. The total efficiency score has the following shape:

Score = PQ / UDE    (1)

where PQ is the prediction quality of the method and UDE stands for an indicator of the average effort of the user and developer, calculated by the formula

UDE = USM + DEM    (2)

where USM stands for the user effort metric and DEM is the developer effort metric. Here, the total USM metric is equal to the sum of the user effort metrics for all five user-operation metrics (DC, DM, DA, DP, DF):

USM = USM_DC + USM_DM + USM_DA + USM_DP + USM_DF    (3)

The user effort metric for each user-operation metric is calculated by the formula

USM_UOM = (T_UOM / MTP_UOM)^(UST_UOM)    (4)

where T is the time metric, MTP stands for the multi-threaded processing indicator and UST is the user-entry threshold. The developer effort metric is calculated by the following formula:

DEM = T_EMS / T_EF    (5)
where EF is the functionality extension operation and EMS stands for support of existing (old) features.

Manual dataset collection is required in both the manual workflow and forecasting-as-a-service systems such as AFS. Here we consider only situations when users have no dataset ready and need to collect data from scratch. The data collection operation consists of a few steps: finding the data source and organizing data within some period using an API. Searching for a proper API can take from 0.25 to 2 h and is required for all compared methods. To collect data from the API, the user needs to create a data pulling mechanism. To do so, a user must have development experience and spend up to 6 h covering this step. In this paper's system, the user does not need to search for an API and handle the collection
step manually. With the help of the data-collection service, the dataset collection steps are automated, and all the user needs to do is choose an API from the list. Neither the manual dataset collection method nor analog systems can provide collection of a few datasets simultaneously. The dataset-collection service from this article can support a practically unlimited number of dataset collectors at a time (Table 1).

Table 1. Dataset collection metrics

          Manual forecasting   Alternative systems   Described systems
T_DC      9                    9                     1
MTP_DC    1                    1                     2
UST_DC    2                    2                     1
Manual creation of dataset-processing methods is required in both the manual workflow and forecasting-as-a-service systems such as AFS, and the implementation of such tooling requires engineering and data-science skills. Assuming that the user knows which dataset-processing methods are needed for the current dataset, this can take 6 to 12 h. There are no systems available that provide multithreaded dataset-processing mechanisms. This article's dataset-processing service offers all the tooling needed for dataset processing before model training. Working together with the dataset-analysis service, the set of required dataset preprocessing methods will be sent automatically without any user interaction. The dataset-processing service allows running processing on different datasets at the same time (Table 2).

Table 2. Dataset processing metrics

          Manual forecasting   Alternative systems   Described systems
T_DP      13                   13                    1
MTP_DP    1                    1                     2
UST_DP    3                    3                     1
Manual execution of dataset management operations is required for the manual workflow and for forecasting-as-a-service systems such as AFS, and, assuming that users will not automate any dataset manipulation operations, it does not require any development or data-science knowledge. Depending on the type of manipulations that may be necessary for a particular dataset, this can take from 0.5 up to 2 h of the user's time. There is no way to multithread the process of manual dataset amendment. The dataset-management service provides a set of functions for all types of data manipulations needed to transfer data between services and eventually send it to training. This process can be executed in a multithreaded way using the system design described (Table 3).

Table 3. Dataset management metrics

          Manual forecasting   Alternative systems   Described systems
T_DM      3                    3                     1
MTP_DM    1                    1                     2
UST_DM    1                    1                     1
Creation of dataset-analysis methods is required for both the manual workflow and forecasting-as-a-service systems such as AFS; the implementation of such tooling requires engineering and data-science skills and can take from 6 to 12 h. There are no systems available that provide multithreaded dataset-analysis mechanisms. This article's dataset-analysis service provides all the tooling needed for dataset analysis before dataset processing or model training. The dataset-analysis service allows running analysis on different datasets at the same time (Table 4).

Table 4. Dataset analysis metrics

          Manual forecasting   Alternative systems   Described systems
T_DA      13                   13                    1
MTP_DA    1                    1                     2
UST_DA    3                    3                     1
Implementation of forecasting methods requires engineering and data-science skills and can take from 6 to 12 h. Other forecasting-as-a-service systems such as AFS provide ready-made models for predictions via an API. In this case, users only need to call the provided methods from their systems, which requires some engineering skills. There are no systems available that offer multithreaded time-series forecasting mechanisms. This article's model-management service provides all the tooling needed for time-series forecasting and can train a few models simultaneously on different datasets (Table 5).

Table 5. Dataset forecasting metrics

          Manual forecasting   Alternative systems   Described systems
T_DF      13                   2                     1
MTP_DF    1                    1                     2
UST_DF    3                    2                     1
Unlike other systems, the system described is maintainable and scalable by design, making it easy to extend existing functionality and to support already existing functionality. The microservices architecture allows extending functionality individually for each service. Field-testing of the application built using the provided architecture showed that it takes up to 0.5 h to add a processing or data-analysis method and up to 1 h to integrate a newly created predictive model. Manual prediction mechanisms require overriding the whole project to change the predictive model, taking a minimum of 12 h of development time. Alternative systems such as AFS provide the ability to choose from a list of available predictive models without the need for system overriding; however, the peripheral functions have to be amended, taking up to 3 h of development time. The microservices architecture allowed reducing the time needed to tune, fix or refactor existing functionality to an average of 1 h. Alternative designs, on the contrary, are hard to maintain because of static configuration and tight coupling of functionality and can take on average 3–5 h. For alternative solutions such as AFS, this can be reduced to 3 h because there is no control over predictive models in client code (Table 6).

Table 6. Developer effort metrics

          Manual forecasting   Alternative systems   Described systems
T_EMS     13                   4                     2
T_EF      6                    3                     2
Systems built based on the described architecture can support a theoretically unlimited number of predictive models that can be used for any data domain and be chosen automatically by the dataset-analysis service. In contrast, manual prediction mechanisms and alternative systems can support only one model at a time, establishing a tight coupling between the data domain and the predictive model (Table 7).

Table 7. Prediction quality metric

     Manual forecasting   Alternative systems   Described systems
PQ   1                    1                     2
Evaluating the final efficiency score for all three types of forecasting systems by the formulas mentioned above, we received a total score of 7.66 · 10^−11 for the usage of manual forecasting methods, 2.22 · 10^−4 for the alternative systems, and 3.07 · 10^−1 for systems built based on the described architecture. Comparison of these three approaches revealed that applications built using the proposed architecture are approximately 1380 times more efficient than alternative systems such as AFS and almost 4 billion times more efficient than manual forecasting efforts.
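For reference, formulas (1)–(5) can be encoded as shown below. This is only a sketch of the scoring procedure for plugging in values from Tables 1–7, not the authors' evaluation script; formula (5) is applied literally here, and rounding may differ from the figures quoted above.

def usm_uom(t, mtp, ust):
    """User effort for one operation, formula (4): (T / MTP) ** UST."""
    return (t / mtp) ** ust

def total_score(pq, operations, t_ems, t_ef):
    """Total efficiency score, formulas (1)-(3) and (5).

    `operations` maps DC/DP/DM/DA/DF to (T, MTP, UST) tuples from Tables 1-5;
    pq, t_ems and t_ef come from Tables 6-7 for the same approach.
    """
    usm = sum(usm_uom(*metrics) for metrics in operations.values())  # formula (3)
    dem = t_ems / t_ef                                               # formula (5)
    ude = usm + dem                                                  # formula (2)
    return pq / ude                                                  # formula (1)

# Example call shape (values to be taken from the tables for a chosen approach):
# total_score(pq=2, operations={"DC": (1, 2, 1), "DP": (1, 2, 1), "DM": (1, 2, 1),
#                               "DA": (1, 2, 1), "DF": (1, 2, 1)}, t_ems=2, t_ef=2)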
5 Conclusions

5.1 Discussion
As a result of this research, a system based on the microservices architecture was built. The final product consists of five backend microservices and a frontend. The combination of the dataset-processing, dataset-management, dataset-collecting, dataset-analysis, and model-management services creates a fine-grained and well-decoupled architectural design pattern. Using the provided application makes it possible for clients to have a smooth and stable end-to-end workflow with their data. Combining all needed functionality and a broad choice of tooling for proper management, analysis, and prediction of time-series data in one place solves the use-case problem when users need more than one application to solve their tasks and receive benefits from their data. The fire-and-forget interaction pattern applied in all the services that communicate with the front end makes the user experience with big data much more comfortable. The ability to start collecting data in our service unblocks its usage for clients with no datasets of their own and makes it possible to explore the system using the collected data as test data. Having two different profiles and workflows for different users makes it possible for both common users and advanced clients to work on the platform. The proper way of presenting the model training and testing results solves the "service as a black box" problem and makes it possible for users and developers to analyze prediction results and maintain the system properly. A deep analysis of the technologies available for solving such technical problems as big-data storage and transfer between devices eventually helped us find a set of external services and techniques that solve them in the most elegant way. Analysis of the application's analogs on the market shows that we have solved most of their problems. The main advantage of the system is the ability to scale dynamically. The services described in this article can be quickly and easily extended with new functionality, which is essential given the consistent development of new time-series analysis, processing, and prediction methods and techniques.
5.2 Future Research
The future steps of this research are to increase the performance of applications based on the described model when working with extremely large datasets by looking into using streaming protocols instead of discrete data transfer for inter-service communication. One of the future plans is to extend the architectural model with new services that could accumulate other frequently used methods for working with time-series data. For example, a results-centre could be added to provide an extendable set of methods for prediction results analysis and visualisation, which would eventually help users save time when searching for insight information about their trained models. One more research task is the search for solutions to assure the required level of fault- and intrusion-tolerance of the microservice architecture, considering its benefits in comparison with the monolithic one [27].
References

1. Time series: Arima methods. International Encyclopedia of the Social and Behavioral Sciences, pp. 15704–15709. https://doi.org/10.1016/B0-08-043076-7/00520-9
2. What is data visualization? A definition, examples. https://www.tableau.com/learn/articles/data-visualization
3. Documenting Software Architectures: Views and Beyond, Second Edition (2010)
4. Software Architecture in Practice, Third Edition (2012)
5. Building Microservices, p. 473. O'Reilly Media, Sebastopol (2016)
6. Microservice Architecture: Aligning Principles, Practices, and Culture, p. 146. O'Reilly Media, Sebastopol (2016)
7. Apache: Apache Kafka. https://kafka.apache.org/
8. Apache: Spark - lightning-fast unified analytics engine. http://spark.apache.org/
9. Apache: Welcome to Apache HBase. https://hbase.apache.org/
10. Apache: What is Cassandra? https://cassandra.apache.org/
11. Bondi, A.B.: Characteristics of scalability and their impact on performance. WOSP 2000, pp. 195–203 (2000). https://doi.org/10.1145/350391.350432
12. Box, G.E.P., Jenkins, G.: Time Series Analysis. Forecasting and Control. Holden-Day Inc, USA (1990)
13. Cherkassky, V., Mulier, F.: Learning from Data: Concepts, Theory, and Methods, p. 560. Wiley-IEEE Press (2007)
14. Deb, C., Zhang, F., Yang, J., Lee, S., Kwok Wei, S.: A review on time series forecasting techniques for building energy consumption. Renew. Sustain. Energy Rev. 74 (2017). https://doi.org/10.1016/j.rser.2017.02.085
15. Django: Django - the web framework for perfectionists with deadlines. https://www.djangoproject.com/
16. Django: Django REST framework. https://www.django-rest-framework.org/
17. Experience, D.: Dockerizing. https://developerexperience.io/practices/dockerizing
18. Gorbenko, A., Kharchenko, V., Tarasyuk, O., Furmanov, A.: F(I)MEA-technique of web services analysis and dependability ensuring. Rigorous Development of Complex Fault-Tolerant Systems, pp. 153–167 (2006). https://doi.org/10.1007/11916246-8
19. Gorbenko, A., Romanovsky, A., Kharchenko, V., Mikhaylichenko, A.: Experimenting with exception propagation mechanisms in service-oriented architecture. In: Proceedings of the 4th International Workshop on Exception Handling, pp. 1–7. WEH 2008. Association for Computing Machinery, New York, NY, USA (2008). https://doi.org/10.1145/1454268.1454269
20. Gu, X.: The impact of maintainability on the manufacturing system architecture. Int. J. Prod. Res. 55(15), 4392–4410 (2017). https://doi.org/10.1080/00207543.2016.1254356
21. Hawkins, T., Plugge, E., Membrey, P.: The Definitive Guide to MongoDB: The NoSQL Database for Cloud and Desktop Computing. Apress, N.Y. (2010)
22. Hofmann, E.: Big data and supply chain decisions: the impact of volume, variety and velocity properties on the bullwhip effect. Int. J. Prod. Res. 55, 5108–5126 (2015). https://doi.org/10.1080/00207543.2015.1061222
23. Hofmeister, C., Kruchten, P., Nord, R.L., Obbink, H., Ran, A., America, P.: J. Syst. Softw. Int. J. Prod. Res. 80, 106–126 (2007). https://doi.org/10.1016/j.jss.2006.05.024
24. Hyndman, R.J., Athanasopoulos, G.: Forecasting: Principles and Practice, chap. 1.4: Forecasting data and methods. OTexts
25. Joshi, N.: Data preprocessing. https://medium.com/analytics-vidhya/datapreprocessing-5f2dc4789fa4
26. Kirchner, J.W.: Data analysis toolkit 10: Simple linear regression - derivation of linear regression equations (2001)
27. Klapchuk, R., Kharchenko, V.: Monolith web-services and microservices: Comparation and selection. Radioelectron. Comput. Syst. (2017). http://nbuv.gov.ua/UJRN/recs 2017 1 8
28. Little, M., Maron, J., Pavlik, G.: Java Transaction Processing. Prentice Hall, Hoboken (2004)
29. Martin, R.C.: Clean Architecture: A Craftsman's Guide to Software Structure and Design. Prentice Hall, Hoboken, Robert C. Martin Series (2017)
30. Oloruntoba, S.: SOLID: The first 5 principles of object oriented design. https://www.digitalocean.com/community/conceptual articles/s-o-l-i-d-the-first-fiveprinciples-of-object-oriented-design
31. Ortiz, G., Caravaca Diosdado, J.A., Garcia-de Prado, A., Chávez, F., Boubeta-Puig, J.: Real-time context-aware microservice architecture for predictive analytics and smart decision-making. IEEE Access (2019). https://doi.org/10.1109/ACCESS.2019.2960516
32. Pang, K.W., Chan, H.L.: Data mining-based algorithm for storage location assignment in a randomised warehouse. Int. J. Prod. Res. 55, 4035–4052 (2017). https://doi.org/10.1080/00207543.2016.1244615
33. RSocket: RSocket - an application protocol providing reactive streams semantics. https://rsocket.io/
34. Se-Young, Y., Brownlee, N., Mahanti, A.: Characterizing performance and fairness of big data transfer protocols on long-haul networks. In: 2015 IEEE 40th Conference on Local Computer Networks, vol. 40, pp. 213–216 (2015). https://doi.org/10.1109/LCN.2015.7366309
35. Services, A.W.: Accurate time-series forecasting service, based on the same technology used at amazon.com, no machine learning experience required. https://aws.amazon.com/forecast/
36. Spring: Spring framework. https://spring.io/projects/spring-framework
37. Strohbach, M., Daubert, J., Ravkin, H., Lischka, M.: Big data storage. New Horizons for a Data-Driven Economy, pp. 119–141 (2016). https://doi.org/10.1007/978-3-319-21569-3-7
38. Tierney, B., Berkeley, L., Kissel, E., Swany, M., Pouyoul, E.: Efficient data transfer protocols for big data. In: 2012 IEEE 8th International Conference on 8 October 2012 (2012). https://doi.org/10.1109/eScience.2012.6404462
39. Vance, A.: Hadoop, a free software program, finds uses beyond search (2009). https://www.nytimes.com/2009/03/17/technology/business-computing/17cloud.html
40. Vijay: Fire and forget interaction pattern. For deeper engagements with conversational user interfaces (2020). https://medium.com/@teriyatha.napar/fire-andforget-interaction-pattern-4b4690de364d
Essential R Peak Detector Based on the Polynomial Fitting

Olga Velychko1, Oleh Datsok2, and Iryna Perova2(B)

1 Yaroslav Mudryi National Law University, Kharkiv, Ukraine
2 National University of Radio Electronics, Kharkiv, Ukraine
[email protected]
Abstract. R peak detection is a relatively uncomplicated task, successfully solved for stationary ECG systems. The finite computational capacities of recent mobile devices restrict the application of powerful methods for ECG wave recognition. The method of R peak detection proposed in this paper is based on the evaluation of the shape of an m-sample segment and is oriented towards the development of specific mobile applications conjugated with a portable ECG device. R peak recognition is performed with parameters obtained as the result of polynomial fitting of an m-sample segment, characterizing the segment shape and the position of the segment's central point relative to the focus point. The complex of criteria for R peak distinguishing was defined as the result of the analysis of testing signals from the MIT-BIH Arrhythmia Database. The algorithm for R peak searching in an optionally defined time interval was developed and can be used for the formation of an RR interval array for heart rate monitoring. The sensitivity of the method for raw signals with positive R peaks is 99.6%.
1
· Mobile
Introduction
According to the American Heart Association, nearly 18.6 million people died of cardiovascular disease globally in 2019. There were more than 523.2 million cases of cardiovascular disease in 2019, an increase of 26.6% compared with 2010. Experts predict the global burden of cardiovascular disease will grow exponentially over the next few years as the long-term effects of the current COVID-19 pandemic evolve [1]. One-third of deaths due to heart attacks and strokes occur prematurely in people under 70 years of age, as revealed by the World Health Organization [6]. This explains the wide interest in the development of monitoring systems for various purposes. According to the classification presented by the authors
[23], the most advanced and technically equipped are hospital and ambulatory systems with powerful computing capabilities, implementing the most complex types of ECG analysis. In contrast, very little is known about the development of mobile-based ECG systems able to warn the person about a harmful state, transmit an alert signal or perform other rapid-response functions [12,13]. Some of these functions are based on an essential heart rate analysis, which should be performed in real time with the lowest possible delay and can be implemented with mobile applications. The general units of the mobile-based ECG system are illustrated in Fig. 1. Consider some examples of mobile-compatible devices for ECG registration. ECG registration has been realized in the Apple Watch. The back panel and the Digital Crown contain electrodes, activated after running the mobile application and touching (not pressing) the wheel with a finger. The result of the rhythm analysis is presented after 30 s of registration. The report containing the ECG and the result of the analysis is saved on the iPhone. The gadget generates an alarm message to notify the user about a found anomaly and can send this information as a pdf report to a doctor [2].
Fig. 1. Block diagram of the mobile-based ECG system
The KardiaMobile ECG contains a sensor with three superficial stainless steel electrodes; two of them are located on the face side and one on the back side of the device to provide contact with the person's left leg (Fig. 2). Recorded ECGs are saved in the Kardia Basic application, which displays a graph and heart rate and can define the basic types of arrhythmia such as tachycardia, bradycardia, atrial flutter, heart blocks and ventricular extrasystole. The user can view saved ECGs, print and send the pdf report by email [14]. The mini cardiograph HiPee WeCardio allows controlling the heart rate and registering an ECG. The sensor is set between the index and middle fingers of both
Fig. 2. KardiaMobile ECG
wrists (Fig. 3). The device is automatically turned on and turned off. As in most similar devices, information is transmitted via Bluetooth to a smartphone, on the screen of which the measurement results are displayed. The device can detect cardiac arrhythmias, myocardial ischaemia, atrial fibrillation and other disorders of the cardiovascular system [4].
Fig. 3. HiPee WeCardio
The QardioCore portable heart rate monitor is an X-shaped sensor for iOS (Fig. 4) that is attached with an electrode strap to the chest and provides continuous monitoring of a single-channel ECG, skin temperature, respiratory rate and movement activity. The device is recommended for diurnal physical activity control. The sensor connects to a smartphone via a Bluetooth interface. The Qardio application controls the sensor and displays measurements and results of analysis on the screen [20]. As noted in [8,16], the use of wearable medical gadgets helps to reveal hidden pathologies of the cardiovascular system that were not detected during clinical research. The patient records an ECG on demand when discomfort is felt or during physical activity. Devices similar to those described above can be used to record an ECG with the subsequent extraction of an N Sample Dataset, containing the array of amplitude values and determining the periodicity of analysis (see Fig. 1). Further, R peaks are extracted from the current signal fragment and passed for the Heart Rate
Fig. 4. QardioCore
Analysis. The periodicity of analysis is stipulated by the applied methods of data treatment, one of which was proposed in [26]. The Alert Messaging unit is intended for evaluating the computed parameters according to predefined criteria. This study aimed to address the task of R peak detection with minimum signal preprocessing and computing resources.
2 Problem Statement
The standard methods of R peak detection can be divided into hardware and software ones. Hardware methods primarily use the principle of comparing the amplitude value of the signal with a threshold level, which can lead to false detections and/or omission of peaks for signals with a floating baseline. Software methods comprise preprocessing for signal denoising and elimination of baseline drift, and successive signal analysis, requiring significant computing resources. The overwhelming majority of investigations have addressed methods of peak recognition oriented to stationary and remote ECG systems, using cloud technology for data analysis and demanding a stable wireless connection. In this paper, a solution to the task of R peak detection with minimum computing resources that can be applied for developing a mobile-based ECG monitoring system is proposed.
3 Literature Review
The authors of [15] proposed an algorithm based on primitives that improves the accuracy of R peak detection. Initially, the signal is filtered, including methods of decomposition, which allows eliminating fluctuations and noise caused by random movement of the human body. Then the isoline is removed and a morphological operation is performed to delineate the R peak. QRS complexes are described with primitives depending on the QRS shape and duration. The description of a QRS contains left and right components. The authors introduce two scaling coefficients (vertical and horizontal) for error minimization. The method is sensitive to the QRS shape and provides high detection accuracy in signals with expressed Q and T peaks, and delineates waves with errors in other cases. The method has been realized in the microcontroller block of a one-channel electrocardiograph and recognizes QRS complexes in real time.
The method of QRS element delineation proposed in [21] is based on the principle of amplitude sample comparison. Signal preprocessing is a required condition, including a low-frequency filter, a high-frequency filter and a rejection filter. All maxima found in the signal are ranked in descending order, and then the maximum among them is defined. All peaks with amplitude greater than +k% (k is the average of 60% of the maximum ECG amplitude and the mode) of the maximum are selected. Further, all samples not exceeding 0.2 mV within 0.2 s of the peaks are removed. Finally, false peaks in the rising and falling curve of the R peak caused by noise are eliminated. The S peak is delineated to the right of the R peak, in the negative values area, in an interval of 10% of the current cardio cycle. The Q peak detection is performed to the left of the R peak, and the peak is defined as the minimum negative value. The T peak is the positive values area after the QRS complex, the search for which is performed in an interval M (M = 12% of the current cardio cycle duration). The P peak is also detected in a region of positive values, in the interval of 2M samples after the T peak of the previous cycle and M/4 samples before the R peak of the current cardio cycle. The authors did not mention the accuracy of the proposed method or describe how peak durations are calculated. The method described in [3] combines two stages: pre-processing of the ECG and QRS complex detection. At the first stage, the authors apply a filter system to eliminate baseline floating and high-frequency noise. Next, the ECG sections are marked to delineate peaks and valleys, which are combined if the interval where they were found is less than 0.03 s (the shortest peak in the QRS is taken equal to 0.03 s). In the compressed signal, the QRS candidates are distinguished by comparing the average and the standard deviation with threshold values. The final step is the correction of the results by eliminating false peaks close to each other, or adding QRS complexes in the intervals where peaks are far from each other. The sensitivity of the method is 99.78% for lead I and II signals. The authors of [10] analysed the standard methods of ECG wave recognition and proposed their different combinations for different test databases and leads, such as the methods of Pan-Tompkins [22], Elgendi et al. [9], Martinez et al. [18], and Sun et al. [25]. These methods demonstrate high sensitivity (over 99%); however, they are intended exclusively for processing signals in leads I and II. In [27], a method for QRS complex delineation and subsequent signal segmentation based on a two-level Convolutional Neural Network (CNN) was proposed. The method performs signal processing and extraction of specific areas. Each layer has two levels: 1-D convolution and subsampling. At the convolution level, rough areas are extracted from the signal; at the subsampling level, precise details are found. Further, all extracted features are used in a Multi-Layer Perceptron for QRS complex recognition. The method is applied for QRS detection in the V2 and V5 leads and has high accuracy (sensitivity of 99.77%, error of 0.32%). The authors of [5] proposed a method of QRS complex recognition for a multichannel electrocardiograph in real time. The general steps of the method are ECG
preprocessing, channel grouping and the formation of averaged signals, peak detection, determination of the QRS complex window, identification of complexes that were not detected at the previous stages, and classification of complexes according to their morphological characteristics. To search for peaks and complexes, the amplitude values are compared with threshold values within a specified window width, which is taken to be less than the T peak width. A training group was used for the morphological analysis of the complexes. The sensitivity of the method exceeds 96%, and the error of QRS determination is 12%. The advantage of the method is the ability to recognize complexes with abnormal conduction and cardiac stimulation in patients with heart failure and cardiac resynchronization therapy. A method of ECG segmentation based on a neuro-fuzzy model and wavelet analysis was proposed in [17]. The method includes signal preprocessing and extraction of informative features with their subsequent classification. A discrete wavelet transform is used to compress the signal. The detailing coefficients of the 4th order are the inputs of the neuro-fuzzy system detecting the QRS complexes. The authors of [7] developed a method for QRS complex searching in real time. The basic stages are preprocessing, feature extraction, probability density estimation, application of Bayesian decision rules (calculation of posterior probabilities) and their subsequent merging. A dataset of 40 "validated candidates" for training the system is formed online. The QRS complexes are recognized in the II and V5 leads with 87.48 ± 14.21% sensitivity.
4 Method of R Peaks Detecting
The present investigation is based on the analysis of signals from the MIT-BIH Arrhythmia Database, containing 48 half-hour annotated excerpts of two-channel ambulatory ECG recordings [11,19]. The sampling frequency is Fs = 360 Hz, and the quantization level is 2^11. Reading the signals, annotations and the automatically detected R peaks was accomplished with the WFDB Toolbox for MATLAB and Octave [24].
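For readers working in Python, the same records can also be loaded with the wfdb package; this is only an illustration of an alternative toolchain on our part, not the MATLAB/Octave toolbox actually used by the authors:

import wfdb  # pip install wfdb

# Read record 102 of the MIT-BIH Arrhythmia Database directly from PhysioNet.
record = wfdb.rdrecord("102", pn_dir="mitdb")
annotation = wfdb.rdann("102", "atr", pn_dir="mitdb")

fs = record.fs                      # sampling frequency, 360 Hz for this database
signal = record.p_signal[:, 0]      # first channel as a one-dimensional array
beat_positions = annotation.sample  # annotated beat locations (sample indices)

print(fs, signal.shape, len(beat_positions))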
4.1 R Peak Fitting with the Quadratic Polynomial Function
The R peak is the central element of the QRS complex, characterizing the ventricular contraction. The amplitude, shape and duration of R peaks depend on the used lead and on the bioelectrical processes in the heart. Due to their narrow spike shape, R peaks are visually recognizable in the ECG in most cases (Fig. 5). A certain disease specifies the parameters of the ECG signal, such as the polarity, amplitude and shape of R. We developed a method that processes the signals with no auxiliary preprocessing and baseline compensation. We presumed that the apex of an R peak, having a shape closest to parabolic, could be satisfactorily described with a 2nd-degree polynomial function:
Fig. 5. 30000 samples fragment of signal ‘102.dat’
P(n) = k0 + k1 n + k2 n^2    (1)
Each sample n = 3, ..., N − 2 is approximated with (1), where N is the number of samples in a signal fragment. We used a short fragment length N = 2Fs, where Fs is the sampling frequency. The dataset for fitting sample n contains m = 5 samples: S(n − 2), S(n − 1), S(n), S(n + 1), S(n + 2), and fixed x values: x ∈ [1; 5]. The degree of the polynomial and the segment width were chosen to minimize the computation effort. Figure 6 demonstrates the 2nd-degree polynomial fitting P(n), describing the signal segment accurately enough. The central point (n = 3) corresponds to the R candidate. Point F is the focus of the polynomial with coordinates:

x_F = −k1 / (2 k2),    (2)

y_F = k2 x_F^2 + k2 x_F + k0.    (3)
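A minimal sketch of this fitting step for one m = 5 sample segment is given below (our illustration, not the authors' code; note that the y_F expression simply follows Eq. (3) as written):

import numpy as np

def fit_segment(segment):
    """Fit a 2nd-degree polynomial to an m = 5 sample segment with x in [1..5].

    Returns the coefficients (k0, k1, k2) and the focus-point coordinates
    (x_F, y_F) computed as in Eqs. (2)-(3).
    """
    x = np.arange(1, 6, dtype=float)
    y = np.asarray(segment, dtype=float)
    k2, k1, k0 = np.polyfit(x, y, deg=2)   # polyfit returns the highest degree first
    x_f = -k1 / (2.0 * k2)                 # only segments with k2 != 0 are meaningful
    y_f = k2 * x_f**2 + k2 * x_f + k0      # Eq. (3) as written
    return (k0, k1, k2), (x_f, y_f)

# Example: a spiky, roughly parabolic 5-sample window around a candidate R peak.
coeffs, focus = fit_segment([0.1, 0.6, 1.2, 0.7, 0.2])
print(coeffs, focus)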
The shape of the apex Rc at point n may be characterized with several parameters (see Fig. 5):

– location of the parabola focus, F(x_F, y_F);
– location of the parabola apex, A(x_A, y_A);
– inclination angle αn of the line RcF;
– distances d_nF from the points n − 2, n − 1, n, n + 1, n + 2 to the focus F;
– sign(k_n2) < 0 when the parabola opens down.
Fig. 6. Fitting of m sample fragment

The negative sign(k_n2) reduces the number of samples S(n) selected for analysis. Owing to the fact that y_F correlates with the amplitude of the signal and requires the definition of a threshold amplitude value, which can result in the loss of R peaks with amplitude smaller than a predefined threshold, we have not included this parameter in the analysis. αn defines the inclination angle of the line RcF (the solid line labeled d_nF in Fig. 7). The orientation of the RcF line closest to vertical is observed for the most probable R candidate, illustrated in the central graph of Fig. 7. The right and left graphs demonstrate significant deflection from the vertical line, underlining the non-symmetrical shape of the peak. The Rc position relative to the parabola focus is evaluated with two parameters: the inclination angle αn of the line RcF and the distance d_nF.
Fig. 7. Curve fitting in three adjacent points
4.2 Criteria of R Peaks Recognition
To evaluate the shape of the polynomial fitting, we computed M = 14 parameters X_n1, ..., X_n14 for each point n within the analyzed signal of L-sample
length. All values were embedded into the matrix X((L − 4) × M). Fitting at the points n = 1, n = 2, n = L − 1, n = L − 2 was not performed owing to the restriction m = 5. The description of the matrix structure is given below:

– X_n1 is the coefficient k_n2;
– X_n2 is the coefficient k_n1;
– X_n3 is the coefficient k_n0;
– X_n4 is the distance d_(n−2)F;
– X_n5 is the distance d_(n−1)F;
– X_n6 is the distance d_nF;
– X_n7 is the distance d_(n+1)F;
– X_n8 is the distance d_(n+2)F;
– X_n9 is the inclination angle αn;
– X_n10 is x_nF;
– X_n11 is y_nF;
– X_n12 is y_nA;
– X_n13 is the ratio d_nF / d_(n−2)F;
– X_n14 is the ratio d_nF / d_(n+2)F.
The two last variables were introduced as additional parameters to evaluate the parabola shape. Figure 7 demonstrates the differences in the focus distances for the n and n − 2 samples, imparting the asymmetrical form of the curve. The assumption about the significance of X_n13 and X_n14 is confirmed by the data illustrated in Fig. 8. As seen from the figure, the points corresponding to R peaks are well localized in the space. The area close to the found R peaks contains the unconfirmed R candidates with one or more false conditions for R peak recognition. It was not possible to investigate the significance of all parameters X_nj of the testing signals because the total number of observations exceeds 3 million. The primary argument of the parameter selection is to avoid using absolute values and comparing them with predefined thresholds. Thereby, the evaluation of the parabola apex, X_n1, X_n6, and of the location of the R candidate relative to its focus, X_n9, X_n10, X_n13, X_n14, has been proposed for R peak recognition. The numerical margins of the parameters were determined as the result of the testing signal analysis. The conditions of R peak recognition are depicted in Table 1.
4.3 Algorithm of R Peak Detection
The algorithm given in Fig. 9 is based on the proposed method of R peak detection. The calculation is performed for a sequence of intervals with a length of N samples and a duration of 2 s. We assumed that during this interval an R peak will be observed in the ECG in any event, regardless of the signal features. The signal samples are saved into the array S(N). The calculating process is initiated at n1 = 3 and completed at n2 = N − 2 as the result of using the two previous and two last samples of the S(N) array for the polynomial approximation.
Fig. 8. The fragment of signal depiction in (X13, X14) space

Table 1. The complex of criteria for R peak distinguishing

Parameter          Variable  Condition                         Numerical margin
k_n2               X_n1      X1 < T_X1                         T_X1 = 0
d_F                X_n6      X6 ∈ (T_X6^left; T_X6^right)      T_X6^left = 0.5, T_X6^right = 3
αn                 X_n9      X9 ≥ T_X9                         T_X9 = 45
x_nF               X_n10     X10 ∈ (T_X10^left; T_X10^right)   T_X10^left = 1.5, T_X10^right = 3
d_F(n)/d_F(n−2)    X_n13     X13 ∈ (T_X13^left; T_X13^right)   T_X13^left = 0.2, T_X13^right = 3
d_F(n)/d_F(n+2)    X_n14     X14 ∈ (T_X14^left; T_X14^right)   T_X14^left = 0.2, T_X14^right = 3
Fig. 9. Block diagram of R peaks detection

Each subsequent interval comprises the two last samples from the previous interval, preventing gaps in the data processing. The array R(N) is intended to store 1 for R candidates and 0 otherwise. The polynomial coefficients k0(n), k1(n), k2(n), the coordinates of the focus F(x_F(n), y_F(n)) and the apex of the parabola y_A(n) are computed in the cycle for the n-th sample and then used for determining the inclination angle α(n), the focus distances d_F(n − 2), d_F(n), d_F(n + 2) and their ratios. The second cycle checks the conditions from Table 1 and annotates R candidate samples with ones. The third cycle examines the R candidates. It has been proposed that if two adjacent R candidates are revealed, the sample having the higher value of the inclination angle is preferred. This condition is based on the results of data analysis. The number of adjacent R candidates for confirmation of their validity may be augmented by altering the boundary conditions of R peak recognition. In addition, to exclude non-valid R candidates within the cardio cycle, time intervals forbidden for searching, corresponding to the duration of the PR and QT intervals, can be additionally introduced.
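An illustrative reimplementation of the described scanning procedure is sketched below (ours, not the authors' MATLAB program; the thresholds follow Table 1, while the exact geometric definition of the inclination angle is an assumption on our side):

import numpy as np

def detect_r_candidates(s):
    """Scan a signal fragment and mark R candidates using the Table 1 criteria."""
    x = np.arange(1.0, 6.0)
    r = np.zeros(len(s), dtype=int)
    angles = np.zeros(len(s))
    for n in range(2, len(s) - 2):
        seg = np.asarray(s[n - 2:n + 3], dtype=float)
        k2, k1, k0 = np.polyfit(x, seg, 2)
        if k2 >= 0:                        # X1: the parabola must open downwards
            continue
        x_f = -k1 / (2.0 * k2)
        y_f = k2 * x_f**2 + k2 * x_f + k0  # Eq. (3) as written
        d = np.hypot(x - x_f, seg - y_f)   # distances of the five samples to the focus
        d_f = d[2]                         # X6: distance of the central sample
        # X9: inclination of the line from the central sample to the focus (assumed definition)
        alpha = np.degrees(np.arctan2(abs(seg[2] - y_f), abs(3.0 - x_f) + 1e-12))
        x13, x14 = d_f / d[0], d_f / d[4]  # X13, X14: focus-distance ratios
        if (0.5 < d_f < 3 and alpha >= 45 and 1.5 < x_f < 3
                and 0.2 < x13 < 3 and 0.2 < x14 < 3):
            r[n], angles[n] = 1, alpha
    # Of two adjacent candidates keep the one with the larger inclination angle.
    for n in range(1, len(s)):
        if r[n] and r[n - 1]:
            r[n if angles[n] < angles[n - 1] else n - 1] = 0
    return np.flatnonzero(r)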
5 Experiment, Results and Discussions
The proposed algorithm was realized in Matlab, and 48 half-hour signals from the MIT-BIH Arrhythmia Database were analyzed. The R peaks detected with the ecgpuwave function from the WFDB library [18] were used for comparing and evaluating the developed method. The variables X1, X6, X9, X10, X13, X14 were computed for each signal. The data, selectively presented in Table 2, demonstrate a restricted range of the average values of the parameters and allow defining appropriate margins for R peak detection (see Table 1).
Table 2. Average value of parameters (Mean ± σ)

Signal  X1              X6           X9              X10          X13           X14
111     −0.03 ± 0.005   0.70 ± 0.16  64.99 ± 12.47   3.04 ± 0.34  0.34 ± 0.06   0.36 ± 0.13
113     −0.12 ± 0.025   1.82 ± 0.42  70.56 ± 12.54   2.51 ± 0.33  0.86 ± 0.13   0.66 ± 0.13
115     −0.12 ± 0.012   1.75 ± 0.27  69.84 ± 11.67   2.45 ± 0.25  0.86 ± 0.06   0.64 ± 0.11
116     −0.10 ± 0.023   1.59 ± 0.34  68.78 ± 13.20   2.54 ± 0.37  0.79 ± 0.12   0.60 ± 0.14
123     −0.09 ± 0.007   1.45 ± 0.16  69.12 ± 12.89   2.52 ± 0.25  0.77 ± 0.07   0.55 ± 0.08
124     −0.03 ± 0.005   0.99 ± 0.25  57.72 ± 11      3.53 ± 0.29  0.37 ± 0.06   0.61 ± 0.20
201     −0.03 ± 0.007   0.68 ± 0.11  66.093 ± 13.72  2.85 ± 0.28  0.36 ± 0.08   0.314 ± 0.07
202     −0.04 ± 0.005   0.83 ± 0.12  69.57 ± 14.67   2.83 ± 0.28  0.44 ± 0.07   0.37 ± 0.08
205     −0.06 ± 0.005   1.19 ± 0.16  75.073 ± 13.64  2.78 ± 0.27  0.599 ± 0.06  0.51 ± 0.09
215     −0.03 ± 0.009   0.78 ± 0.21  69.71 ± 14.82   2.94 ± 0.31  0.39 ± 0.09   0.37 ± 0.13
222     −0.05 ± 0.016   0.97 ± 0.24  69.17 ± 15.42   2.73 ± 0.29  0.52 ± 0.14   0.41 ± 0.09
234     −0.06 ± 0.004   1.22 ± 0.17  72.16 ± 12.99   2.73 ± 0.30  0.62 ± 0.06   0.51 ± 0.10
The pair X13, X14 diminishes the number of R candidates within the QRS complex, reducing their number to two samples, as demonstrated in Fig. 9. The irrelevant candidate is then excluded according to the minimum of α. To demonstrate the difference in the parameters of R candidates clearly, the method of principal component analysis was applied to the variables X1, X6, X9, X10, X13, X14. As seen from Fig. 10, the R peaks are localized and distinguished in the space. This trend is observed for all analyzed signals with positive R peaks, confirming the choice of parameters and the defined numerical margins for R peak searching. The accuracy of R peak detection with the proposed method is illustrated in Fig. 11. The difference between the automatic QRS detector and the one developed by us does not exceed one sample in the majority of cases. For signals having QRS complexes with abnormal characteristics, the difference in R peaks equals several samples. One of the mentioned instances, when the automated QRS detector incorrectly recognized an S peak as an R peak, is highlighted in the subplot of Fig. 12. R peak detection with the developed method revealed a high accuracy (99.5%) and sensitivity (99.6%) for signals with positive QRS amplitude.
Fig. 10. Two adjacent R candidates found in one QRS complex
Fig. 11. R candidates depiction in two-component space (‘116.dat’)
Fig. 12. Result of R peaks detection with the proposed method
Concerning the negative R peaks, the accuracy for such signals fluctuates, on average, from 57% to 83%. Nonetheless, we performed a preliminary analysis based on the proposed conditions with the transformed signal S(n)^2, giving a satisfactory result of R peak detection in signals with both positive and negative R waves.
6 Conclusions
Despite the high accuracy of positive R peak detection, the developed method requires future investigation to achieve valid results regardless of R wave polarity. While performing computational experiments with different critical values of the proposed parameters for R peak distinguishing, it was revealed that the method is sensitive to the P and T peaks, confirming the advisability of further method improvement. Due to the proposed criteria system, the number of probable R candidates in a potential QRS complex is usually limited to two adjacent samples. The advantage of this method is the fact that it does not require preprocessing of the signal to eliminate interference and baseline drift and can be implemented in mobile applications with limited computing resources.
References

1. Heart disease #1 cause of death rank likely to be impacted by COVID-19 for years to come. https://newsroom.heart.org/news/heart-disease. Accessed 16 Mar 2021
2. Taking an ECG with the ECG app on Apple Watch Series 4, Series 5, or Series 6 (2021). https://support.apple.com/en-us/HT208955. Accessed 16 Mar 2021
162
O. Velychko et al.
3. Burguera, A.: Fast QRS detection and ECG compression based on signal structural analysis. IEEE J. Biomed. Health Inform. 23(1), 123–131 (2019). https://doi.org/ 10.1109/JBHI.2018.2792404 4. Community, X., Devices, S., Care, B., Wizard, H.: HiPee smart ECG Wizard: full specifications, photo – XIAOMI-MI.com (2021). https://xiaomi-mi.com/beautyand-personal-care/hipee-smart-ecg-wizard/. Accessed 16 Mar 2021 5. Curtin, A., Burns, K., Bankm, A., Netoff, T.: QRS complex detection and measurement algorithms for multichannel ECGs in cardiac resynchronization therapy patients. IEEE J. Transl. Eng. Health Med. 6 (2018). https://doi.org/10.1109/ JTEHM.2018.2844195 6. Cardiovascular diseases: https://www.who.int/health-topics/cardiovasculardiseases. Accessed 16 Mar 2021 7. Doyen, M., Ge, D., Beuch´ee, A., Carrault, G., Hern´ andez, A.: Robust, real-time generic detector based on a multi-feature probabilistic method. PLoS ONE 14(10), 333–345 (2019). https://doi.org/10.1371/journal.pone.0223785 8. Drexler, M., Elsner, C., Gabelmann, V., Gori, T., M¨ unzel, T.: Apple Watch detecting coronary ischaemia during chest pain episodes or an apple a day may keep myocardial infarction away. Eur. Heart J. 41(23), 2224 (2020). https://doi.org/10. 1093/eurheartj/ehaa290 9. Elgendi, M., Meo, M., Abbott, D.: A proof-of-concept study: simple and effective detection of P and T waves in arrhythmic ECG signals, October 2016. https://open.library.ubc.ca/collections/facultyresearchandpublications/ 52383/items/1.0379040 10. Friganovic, R., Kukolja, D., Jovic, A., Cifrek, M., Krstacic, G.: Optimizing the detection of characteristic waves in ECG based on processing methods combinations. IEEE Access 6, 50609–50626 (2018). https://doi.org/10.1109/ACCESS. 2018.2869943 11. Goldberger, A., et al.: PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals. Circulation 101(23) (2000). https://doi.org/10.1161/01.CIR.101.23.e215 12. Iwamoto, J., et al.: A new mobile phone-based ECG monitoring system. Biomed. Sci. Instrum. 43(10), 318–323 (2007) 13. Jadhav, K.B., Chaskar, U.M.: Design and development of smart phone based ECG monitoring system. In: 2017 2nd IEEE International Conference on Recent Trends in Electronics, Information & Communication Technology (RTEICT), pp. 1568– 1572 (2017). https://doi.org/10.1109/RTEICT.2017.8256862 14. Kardiamobile 6l (2021). https://store.alivecor.com/products/kardiamobile6l. Accessed 16 Mar 2021 15. Lee, S., Park, D., Park, K.: QRS complex detection based on primitive. J. Commun. Netw. 19(5), 442–450 (2017). https://doi.org/10.1109/JCN.2017.000076 16. Lovejoy, B.: Critical heart disease detected by his own apple watch - doctor - 9 to 5 mac (2021). https://9to5mac.com/2020/07/01/critical-heart-disease/. Accessed 16 Mar 2021 17. Mahapatra, S., Mohanta, D., Mohanty, P., Nayak, S., Behari, P.: A neuro-fuzzy based model for analysis of an ECG signal using Wavelet Packet Tree. In: 2nd International Conference on Intelligent Computing, Communication and Convergence, ICCC 2016, 24–25 January 2016, Bhubaneswar, Odisha, India, vol. 92, pp. 175–180 (2016). https://doi.org/10.1016/j.procs.2016.07.343
Essential R Peak Detector Based on the Polynomial Fitting
163
18. Mart´ınez, A., Alcaraz, R., Rieta, J.: A new method for automatic delineation of ECG fiducial points based on the Phasor Transform. In: 2010 Annual International Conference of the IEEE Engineering in Medicine and Biology, pp. 4586–4589 (2010). https://doi.org/10.1109/IEMBS.2010.5626498 19. Moody, G., Mark, R.: The impact of the MIT-BIH arrhythmia database. IEEE Eng. Med. Biol. Mag. 20(3), 45–50 (2001). https://doi.org/10.1109/51.932724 20. Smart wearable ECG/EKG monitor - Qardiocore 2021 (2021). https://www. getqardio.com/qardiocore-wearable-ecg-ekg-monitor-iphone. Accessed 16 Mar 2021 21. Rajani, A., Hamdi, M.: Automation algorithm to detect and quantify electrocardiogram waves and intervals. In: The 10th International Conference on Ambient Systems, Networks and Technologies (ANT), pp. 941–946 (2019). https://doi.org/ 10.1016/j.procs.2019.04.131 22. Rangayyan, R.: Detection of Events, chap. 4, pp. 233–293. John Wiley & Sons, Ltd. (2015). https://doi.org/10.1002/9781119068129.ch4 23. Serhani, M., El Kassabi, H.T., Ismail, H., Navaz, A.N.: ECG monitoring systems: review, architecture, processes, and key challenges. Sensors 20(6), 1424–8220 (2020). https://doi.org/10.3390/s20061796 24. Silva, I., Moody, G.: An open-source toolbox for analysing and processing PhysioNet databases in MATLAB and Octave. J. Open Res. Softw. 2(1), e27(5) (2014). https://doi.org/10.5334/jors.bi 25. Sun, Y., Chan, K., Krishnan, S.: Characteristic wave detection in ECG signal using morphological transform. BMC Cardiovasc. Disord. 5(28) (2005). https://doi.org/ 10.1186/1471-2261-5-28 26. Velichko, O., Datsok, O.: Analysis of the heart rate variability dynamics during long-term monitoring. Telecommun. Radio Eng. 77(7), 645–654 (2017). https:// doi.org/10.1615/telecomradeng.v77.i7.70 27. Xiang, Y., Lin, Z., Meng, J.: Automatic QRS complex detection using two-level convolutional neural network. BioMed Eng. OnLine 17(13) (2018). https://doi. org/10.1186/s12938-018-0441-4
An Approach to Identifying and Filling Data Gaps in Machine Learning Procedures
Peter Bidyuk¹, Irina Kalinina², and Aleksandr Gozhyj²
¹ National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”, Kyiv, Ukraine
² Petro Mohyla Black Sea National University, Nikolaev, Ukraine
Abstract. The article considers the methods of detecting and filling gaps in data sets at the stage of preliminary data processing in machine learning procedures. A multi-stage approach to identifying and filling gaps in data sets, together with combined statistical quality criteria, is proposed. The method consists of three stages. At the first stage, the presence of gaps in the data is determined. In the second stage, the patterns of occurrence of gaps are investigated. Three approaches are used: matrix analysis, graphical analysis and correlation analysis. One of the mechanisms of formation of gaps in the data is identified: MCAR, MAR, MNAR. In the third stage, various methods of generating data without gaps are used. The methods used include deleting part of the data with gaps, various replacement methods, and methods for predicting missing values. The methods of overcoming gaps in time series are considered separately. The effectiveness of the proposed approach is investigated numerically. Examples of application of the methods to various data sets are given.

Keywords: Datasets · Gaps · Combined statistical quality criteria · Methods of overcoming gaps

1 Introduction
In machine learning tasks, the quality of the models depends very much on the data. But the data themselves in real problems are rarely perfect. As a rule, the volume of data is small, the number of parameters available for analysis is limited, and the data include noise and exhibit gaps. Data processing and purification are important tasks that must be performed before a data set can be used for training a model. Raw data are often distorted and unreliable, and some values may be omitted. The use of such data in modeling can lead to incorrect results. These tasks are part of the data processing and analysis process and usually involve the initial study of a data set used to define and plan the data processing process. The problems touching upon missing data analysis often occur in machine learning tasks when processing data. Most machine learning methods and
algorithms assume that filled-in matrices, vectors, and other information structures are used as input to perform a computational experiment. However, in practice, gaps are common in actual data sets, and before starting preliminary analysis and modeling procedures, it is necessary to bring the processed tables to the “canonical” form, i.e. either delete fragments of objects with missing elements or replace the existing gaps with some values. To date, there exist many methods and approaches to processing the missing values in data, as well as their detailed research [8,11–13,15–18]. An unsuccessful choice of the gap-filling method not only fails to improve the results of solving the problem but may substantially worsen them. Consider the methods of processing gaps that are widely used in practice, together with their advantages and disadvantages. Very often standard approaches to solving this problem do not exist, because the approaches largely depend on the context and nature of the data. For example, sometimes it is difficult to determine whether the data contain random gaps or there is a hidden link between the gaps and some other records in the training set. One easy way to solve this problem is simply to ignore or delete rows that lack measurements by removing them from the analysis. However, this method may not be effective due to information loss. An alternative method is to fill in the blanks, where each missing value is replaced. Basic implementations simply replace all missing values with the mean, the median, or a constant, but this is not always effective. The problem of identifying and filling in the gaps available in the data is complex and relevant, and its correct solution significantly improves the results of machine learning procedures. Problem Statement. The article investigates and considers methods and algorithms for identification and elimination of gaps in the process of preliminary data processing in machine learning problems. A multi-stage method of identifying and filling in gaps in data sets is proposed. The efficiency of the method is investigated numerically.
2 Approach to Solving the Problems of Identification and Filling in Missing Data

2.1 Procedure for Handling Data Gaps
To date, there exist many methods and approaches for identifying and processing the sets with missing data. Unsuccessful choice of the method of filling in the gaps not only cannot improve, but also may greatly worsen the results of solving the problem. Consider the methods of processing data gaps, which are widely used in practice, their advantages and disadvantages. Data processing with gaps consists of the following steps: Definition and identification of data, detection of patterns of occurrence of missing values, formation of data sets that do not contain gaps.
The general scheme of the procedure for processing data gaps is presented in Fig. 1. The first stage of the procedure is the identification of missing data. If the original data set is fully completed, then, bypassing the procedure for processing gaps, proceed to the next procedure for data analysis.
Fig. 1. General scheme of the procedure for processing gaps in the data
Analysis of reasons why the data are missing depends on understanding the processes that reproduce the experimental information. However, if the gaps are available, the next step in the procedure is to study the patterns of occurrence
of missing values. At this stage, three approaches to the assessment of possible patterns of omissions are considered: the matrix approach, the graphical approach, and the elements of correlation analysis. Additional statistical information about the gaps is provided by the matrix approach, and the graphical approach allows displaying on the graphs the regularity of the appearance of missing observations. When using the matrix approach, the initial data set is converted into the binary matrix. The values of zero correspond to the missing values of the variables, and in the first line there are no spaces, and the next lines are sorted by the number of their occurrence. The actual values of the variables, without spaces, correspond to the units in the matrix representation. The first column of the transformation matrix indicates the number of cases in each data sample, and the last row indicates the number of variables with missing values in each row. When using the graphical approach, numerical data are scaled to the interval [0; 1] and are represented by levels of the brightness scale, with darker colors highlighting large values, and light colors – smaller. By default, missing values are represented in red. This visualization allows one to realize whether the omissions of one or more variables are related to the actual values of other variables. Quantitatively, such dependences can be established by calculating the correlation coefficients of the original variables with the frequencies of the missing values. One of the popular methods of filling in the gaps using regression models is based on the use of correlation dependencies. Example 1 . Investigate the patterns of occurrence of missing values in the sleep data set from the VIM library [11]. The data set was compiled based on the results of observations of hibernation of 62 animals of different species. List of variables in the data set: duration of sleep with dreams (Dream), sleep without dreams (NonD), their sum (sleep); body weight (BodyWgt), brain weight (BrainWgt), life expectancy (Span) and pregnancy time (Gest); estimates of the degree of animals predation (Pred), measures to protect their place to sleep (Exp), and risk indicator (Danger), which was based on a logical combination of two variables (Pred and Exp). Thus, the initial data set contains 62 rows and 10 variable columns. The task is implemented using the statistical package R [1–3,7,9,10,14,19]. The first look at the set illustrates the presence of gaps in it (Fig. 2).
Fig. 2. The first rows of the sleep data set from the VIM library
At the stage of identifying missing data, it was found that out of 62 rows in the initial data set, 20 rows have at least one missing value. At the second stage of the procedure of processing the missed values, all three approaches to the study of the patterns of omissions are consistently implemented. The matrix approach found the total number of gaps in the set and their location: (42×0)+(2×1)+. . . +(1×3) = 38. The last line of the matrix shows the frequency of skips for each variable (Fig. 3).
Fig. 3. The result of using a matrix approach to the study of patterns of gaps in the data set
The graphical approach gives clarity to the figures of the general statistics of the matrix approach. Histograms from Fig. 4 show the presence and frequency of skips for data set variables, as well as their combinations.
Fig. 4. Availability and frequency of skips for data set variables as well as their combinations
In the diagram from Fig. 5 numerical data are scaled to the interval [0;1] and are presented by the levels of quality scale. Missing values are in red, larger numeric values are in black, and smaller values are in gray and white. This
visualization allows us to see whether the omissions of one or more variables are related to the actual values of other variables. It was found that at low values of body weight (BodyWgt) or brain (BrainWgt) skips of sleep variables (Dream, NonD) are absent.
Fig. 5. Diagram of real and missing values in the data set
Quantitatively, the relationships between the missing values of the variables were established by correlation analysis. The calculated correlation coefficients of the output variables with the frequencies of the omitted values are shown in Figs. 6 and 7.
Fig. 6. Correlation matrix between pass frequencies
The correlation matrix between the skip frequencies shows that the appearance of skips for the variables NonD and Dream is interrelated (r = 0.9071). This conclusion is logical because NonD + Dream = Sleep. In the correlation matrix (Fig. 7), the frequency of skipping the values of the variable responsible for recording sleep without dreams (NonD) is more likely associated with the protection of the roost Exp (r = 0.245), high body weight BodyWgt (r = 0.227) and duration Pregnancy Gest (r = 0.202).
Fig. 7. A pairwise complete correlation matrix of gaps
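A rough Python analogue of these three pattern-analysis approaches (the paper itself uses R with the VIM and mice packages; pandas is assumed here):

```python
# Sketch: matrix approach (row-wise gap patterns), per-variable gap counts, and
# correlations between the binary gap indicators, as in the discussion above.
import pandas as pd

def missingness_report(df: pd.DataFrame):
    ind = df.isna()                              # binary "gap" matrix
    pattern_counts = ind.value_counts()          # matrix approach: pattern frequencies
    per_variable = ind.sum()                     # number of gaps per variable
    corr_between_gaps = ind.astype(int).corr()   # correlation of gap indicators
    return pattern_counts, per_variable, corr_between_gaps

# The indicator matrix ind can also be drawn as a heatmap (graphical approach).
```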
2.2 Methods for Studying the Patterns of Occurrence of Gaps
For correct processing of gaps it is necessary to define the mechanisms of their formation. The following mechanisms of gap formation are distinguished: MCAR, MAR and MNAR [11,12,15]. MCAR (Missing Completely At Random) is a gap generation mechanism in which the probability of a gap is the same for each record in the data set. When the MCAR mechanism is identified, ignoring or excluding the records that contain gaps does not distort the results. MAR (Missing At Random) is a mechanism of gap formation in which the missing data are not completely random but follow a certain pattern. Missing data are classified as MAR if the probability of a gap can be determined from other information in the data set that does not contain gaps. In this case, removing or replacing the gaps will not significantly distort the results. MNAR (Missing Not At Random) is a gap-forming mechanism in which data are missing depending on unknown factors. As a result, the probability of a gap cannot be expressed based on the information contained in the data set.
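A toy illustration of the difference between MCAR and MAR (my own construction, not from the article):

```python
# Under MCAR the gap probability is constant; under MAR it depends on another
# fully observed variable.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 1000
x = rng.normal(size=n)                 # fully observed variable
y = 2 * x + rng.normal(size=n)         # variable that will receive gaps

y_mcar = y.copy()
y_mcar[rng.random(n) < 0.2] = np.nan   # MCAR: 20% gaps, independent of everything

y_mar = y.copy()
p = 1 / (1 + np.exp(-2 * x))           # MAR: gap probability driven by observed x
y_mar[rng.random(n) < 0.4 * p] = np.nan

df = pd.DataFrame({"x": x, "y_mcar": y_mcar, "y_mar": y_mar})
print(df.isna().mean())                # compare the realized gap rates
```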
2.3 Formation of the Final Data Set
The third and last stage of the gap-processing procedure is the direct formation of a data set without gaps through the use of one of the known approaches. In most cases, the default solution is to remove or ignore rows and/or columns with missing values, but these solutions are not always correct. With a large amount of data in the set (n ≫ p), application of these approaches will not significantly distort the parameters of the model. But the removal of rows leads to the fact that not all available information is used in subsequent calculations, the standard deviations increase, and the results become less representative. In cases where there are many gaps in the data, this becomes a significant problem. Thus, despite its widespread use, the application of this approach to solving practical problems is limited. The advantage of the approach that works directly with the gapped data is that all available information is used to build the model. The main disadvantage is that this approach cannot be used to calculate all characteristics of the data set. The calculations of
many of them are associated with algorithmic and computational difficulties that lead to incorrect results. For example, the calculated values of the correlation coefficients may be outside the range [−1; 1]. Easy to implement and use approaches to processing gaps, called ad-hoc methods, include: filling in gaps with the arithmetic mean, median, mode, as well as filling in zeros and entering indicator variables. All variants of this approach have the same disadvantages. Figures 8 and 9 show these shortcomings as an example of one of the simplest ways to fill in the gaps of a continuous variable: to fill in the gaps with the arithmetic mean and mode. Figure 8 shows the distribution of values of the continuous characteristic before filling the gaps with the average value and after it.
Fig. 8. Distribution of values of a continuous variable before filling in gaps and after filling
The figure shows that the distribution after filling in the blanks looks unnatural. This is ultimately manifested in the distortion of all indicators that characterize the properties of the distribution (except for the average value), underestimation of the correlation and overestimation of the standard deviations. Thus, this method leads to significant distortion of the distribution of the variable.
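A minimal sketch of such mean filling with scikit-learn's SimpleImputer (an analogue of the ad-hoc methods above, not the authors' code), together with the standard-deviation shrinkage just described:

```python
import numpy as np
from sklearn.impute import SimpleImputer

data = np.array([[1.0], [2.0], [np.nan], [4.0], [np.nan], [6.0]])
filled = SimpleImputer(strategy="mean").fit_transform(data)

print(np.nanstd(data), filled.std())   # the filled column has a smaller spread
# strategy can also be "median", "most_frequent" (mode) or "constant".
```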
Fig. 9. Distribution of a discrete variable before and after filling in the gaps
In the case of categorical discrete variables, gaps are most often filled with the data mode. Figure 9 shows the distribution of a categorical variable before and after filling in the blanks. It indicates that filling the gaps of a categorical variable with the mode shows the same shortcomings as filling the gaps of a continuous variable with the arithmetic mean. Predicting missing values is the most difficult method of replacing gaps. It includes the following approaches: kNN estimation, rpart and the mice method. The “kNN estimation” approach uses the method of k nearest neighbors to replace the missing values. That is, for each gap, the k nearest points are selected on the basis of the Euclidean distance, and their weighted average is calculated. The advantage is that all the missing values in all variables can be replaced at once. The disadvantage of kNN estimation is the inability to use this approach if the gaps occur in a factor (categorical) variable. Rpart and mice are suitable for such cases. The advantage of rpart is that at least one variable that does not contain gaps is enough for this approach. The mice approach implements the Gibbs sampler, which is a method of forming samples (x1, ..., xn) from a given distribution p(x) of m-dimensional variables by repeatedly drawing simulated values. Thus, the basis of the mice approach is data modeling using Monte Carlo methods, repeated many times. By default, each variable that contains gaps is modeled against all other variables in the data set. Prediction equations are used to assign the most probable values to the data with gaps. The process is repeated until a certain convergence criterion is reached. For each variable, one can choose the form of the prediction model. Example 2. Based on the mice method (Gibbs method), fill in the gaps in the sleep data sample from the VIM library. In the first step, several complete copies of the initial data set (default value is m = 5) are sampled with their estimate of the missing values (Fig. 10).
Fig. 10. The result of the first step of the mice method
According to the results of the first step, it was determined that the gaps were determined for five variables: NonD, Dream, Sleep, Span, Gest. To do this, we used the method “pmm” (average relative to the regressor). In the second step, a linear model was used for each data set to predict the missing values. The results were combined and combined into a final model for which the standard errors and p-values were balanced (Fig. 11). According to the results of the analysis of the final model, it was found that the variable Dream is statistically significantly dependent on the variable Gest, while its dependence on Span is insignificant. In the last step of the method, one of the sets with filled spaces is saved (Fig. 12).
Fig. 11. The result of the second step of the mice method
Fig. 12. The result of the last step of the mice method
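The prediction-based replacements can be sketched with scikit-learn analogues (an assumption: KNNImputer for the kNN estimation and IterativeImputer as a rough counterpart of the chained-equations idea behind mice):

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import KNNImputer, IterativeImputer

X = np.array([[1.0, 2.0], [2.0, np.nan], [np.nan, 6.0], [4.0, 8.0]])

X_knn = KNNImputer(n_neighbors=2).fit_transform(X)            # weighted k nearest rows
X_mice_like = IterativeImputer(max_iter=10, random_state=0).fit_transform(X)

print(X_knn)
print(X_mice_like)
```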
The following approaches to data recovery are used in most cases when a time series serves as the set of initial data. This group of approaches includes data recovery based on regression models (linear, stochastic linear, polynomial) and the LOCF method (Last Observation Carried Forward), i.e. repetition of the result of the last observation [11–13,15]. In the regression-based approach, the missing values of variables are filled using a regression model built on the known values of the data set. The method of linear regression allows one to obtain plausibly filled data. But real data have some scatter of values, which is absent when the gaps are filled based on linear regression. As a consequence of this shortcoming, the variation of the characteristic values becomes smaller, and the correlation between variable #2 and variable #1 is artificially amplified. As a result, the higher the variation of the variable whose gaps are being filled, the worse this method fills them. In most cases, the method of stochastic linear regression is used to overcome this shortcoming. The stochastic linear regression model reflects not only the linear relationship between variables but also the deviation from this linear relationship. This method has an advantage over the linear regression method when filling in the gaps and, moreover, does not distort the values of the correlation coefficients as much. The LOCF method is also used, as a rule, when filling gaps in time series, when the following values are a priori strongly interrelated with the previous ones. The LOCF (last observation carried forward) strategy is to replace each missing value with the previous non-missing observation (or with the next non-missing observation if the variable column starts with a gap). When using this approach, it is important to remember that LOCF may lead to duplication of outliers (filling gaps with anomalous values). In addition, if the data have many consecutively omitted values, the hypothesis of small changes is no longer fulfilled and, as a consequence, the use of LOCF leads to incorrect results.
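A pandas-based sketch of the time-series options just described, LOCF as a forward fill plus interpolation as a simple regression-flavoured alternative (an illustration, not the authors' code):

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, np.nan, 4.0, np.nan, 6.0],
              index=pd.date_range("2021-01-01", periods=6, freq="D"))

locf = s.ffill()                 # last observation carried forward
locf = locf.bfill()              # handle a gap at the very start of the series
linear = s.interpolate("time")   # linear-in-time fill, closer to regression filling

print(pd.DataFrame({"raw": s, "locf": locf, "linear": linear}))
```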
Of all the considered methods of forming a data set without gaps, filling in the gaps with stochastic linear regression and the LOCF method in the general case leads to the least distortion of the statistical properties of the sample. And when a clearly expressed linear dependence is traced between the characteristics, the methods of stochastic linear regression are often superior even to more complex methods. If the filling of gaps is performed using a regression model, for example AR(p), it makes sense to analyze the quality of the model that will be built on the sample with filled gaps [4–6]. To do this, it is possible to use the following criterion:

J = |1 - R^2| + |2 - DW| + \beta \ln(MAPE) \rightarrow \min_{\hat{\theta}_i}

Or, in a slightly more complicated form:

J = |1 - R^2| + \alpha \ln\left( \sum_{k=1}^{N} e^2(k) \right) + |2 - DW| + \beta \ln(MAPE) + U \rightarrow \min_{\hat{\theta}_i},

where R^2 is the coefficient of determination; \sum_{k=1}^{N} e^2(k) = \sum_{k=1}^{N} [y(k) - \hat{y}(k)]^2 is the sum of squared errors at the output of the model; DW is the Durbin–Watson statistic; MAPE is the mean absolute percentage error of one-step forecasts; U is the Theil coefficient, which characterizes the predictive ability of the model; \alpha, \beta are weights that are easy to pick; \hat{\theta}_i is the vector of parameters of the candidate model. The combined criterion of this type is convenient for the automated selection of the best model. It simultaneously analyzes the adequacy of the model and its predictive ability. Measurement gaps can be filled in with data (by a series of successively different methods) built, for example, from several types of distributions, and the comprehensive criterion will tell which filling option is the best one.
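A sketch of how the combined criterion J could be computed for candidate models (the weights and the way R², DW and MAPE are obtained are left to the caller; names are assumptions, not the authors' code):

```python
import numpy as np

def combined_criterion(r2, dw, mape, errors=None, theil_u=0.0, alpha=0.0, beta=1.0):
    """Smaller J means a better candidate model for filling the gaps."""
    j = abs(1.0 - r2) + abs(2.0 - dw) + beta * np.log(mape)
    if errors is not None:                       # the "more complicated" form
        j += alpha * np.log(np.sum(np.square(errors))) + theil_u
    return j

# Example: compare two imputation-plus-AR(p) candidates by their J values.
print(combined_criterion(r2=0.91, dw=1.95, mape=4.2))
```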
3 Conclusions
The approach to identifying and filling gaps in data sets provides a systematic use of a three-step procedure for data analysis. The procedure includes identification of gaps in the data, identification of the patterns of gaps based on three methods (matrix, graphical and correlation analysis), and the stage of forming a data set without gaps. The first stage of the procedure is the identification of missing data. In the second stage, the patterns of occurrence of missing values are studied and the type of missingness (MCAR, MAR, MNAR) is determined. At the last stage, various methods of generating data without gaps are used. The methods used include deleting part of the data with gaps, various replacement methods, and methods for predicting missing values. Methods of eliminating gaps in time series are considered separately. High quality of the procedure is achieved by applying statistical criteria for verifying the quality of the data sets. The quality of the gap-filling procedure is evaluated by defining a target function for optimization. The function is based on combined statistical criteria. Combined criteria of this type allow automated selection of the best model. Such criteria simultaneously analyze the adequacy
of the model and assess its predictive power. The presented approach can be implemented in machine learning projects that work on the basis of large data sets with gaps. This approach systematically uses various methods to identify and bridge data gaps. The use of ideologically different approaches provides a good basis for combining different types of methods to achieve the best results.
References 1. Altham, P.: Introduction to Statistical Modelling in R. University of Cambridge, UK (2012) 2. Babichev, S., Durnyak, B., Zhydetskyy, V., Pikh, I., Senkivskyy, V.: Application of optics density-based clustering algorithm using inductive methods of complex system analysis. In: IEEE 2019 14th International Scientific and Technical Conference on Computer Sciences and Information Technologies, CSIT 2019 - Proceedings, pp. 169–172 (2019). https://doi.org/10.1109/STC-CSIT.2019.8929869 ˇ 3. Babichev, S., Skvor, J.: Technique of gene expression profiles extraction based on the complex use of clustering and classification methods. Diagnostics 10(8), 584 (2020). https://doi.org/10.3390/diagnostics10080584 4. Bidyuk, P., Gozhyj, A., Kalinina, I., Gozhyj, V.: Analysis of uncertainty types for model building and forecasting dynamic processes. In: Conference on Computer Science and Information Technologies. Advances in Intelligent Systems and Computing II, vol. 689, pp. 66–78. Springer-Verlag (2017). https://doi.org/10.1007/ 978-3-319-70581-1 5. Bidyuk, P., Gozhyj, A., Kalinina, I., Vysotska, V.: Methods for forecasting nonlinear non-stationary processes in machine learning. In: Data Stream Mining and Processing. DSMP 2020. Communications in Computer and Information Science, vol. 1158, pp. 470–485. Springer, Cham (2020). https://doi.org/10.1007/978-3-03061656-4 32 6. Bidyuk, P., Gozhyj, A., Matsuki, Y., Kuznetsova, N., Kalinina, I.: Modeling and forecasting economic and financial processes using combined adaptive models. In: Lecture Notes in Computational Intelligence and Decision Making. ISDMCI 2020, vol. 1246, pp. 395–408. Springer, Cham (2021). https://doi.org/10.1007/978-3-03054215-3 25 7. Chernick, M., LaBudde, R.: An Introduction to Bootstrap Methods with Applications to R. Wiley (2011) 8. Cryer, J., Chan, K.S.: Time Series Analysis With Applications in R. Springer, Berlin, Germany (2008) 9. Everitt, B., Hothorn, T.: A Handbook of Statistical Analyses Using R. Chapman, Hall/CRC, Boca Raton (2010) 10. Fox, J., Weisberg, S.: An R Companion to Applied Regression. Sage Publications, Thousand Oaks (2011) 11. Kabacoff, R.: R in Action: Data Analysis and Graphics With R. Manning Publications (2011) 12. Karahalios, A., Baglietto, L., Carlin, J., English, D., J.A., S.: A review of the reporting and handling of missing data in cohort studies with repeated assessment of exposure measures (2012) 13. Knol, M.J., et al.: Unpredictable bias when using the missing indicator method or complete case analysis for missing confounder values: an empirical example. J. Clin. Epidemiol. 63, 728–736 (2010). https://doi.org/10.1016/j.jclinepi.2009.08. 028
14. Lam, L.: An Introduction to R. Vrije Universiteit Amsterdam (2010) 15. Little, R., Rubin, D.: Statistical analysis with missing data. Wiley, Online Library (2014) 16. Molenberghs, G., Kenward, M.G.: Missing Data in Clinical Studies. John Wiley and Sons, Chichester, UK (2007) 17. Shumway, R.H., Stoffer, D.: Time Series Analysis and its Applications with R Examples. Hardcover (2006) 18. VanBuuren, S.: Flexible Imputation of Missing Data. Chapman and Hall/CRC, Boca Raton (2012) 19. Venables, W., Smith, D.: An Introduction to R. R Development Core Team (2014)
Information Technology for Assessing the Situation in Energy-Active Facilities by the Operator of an Automated Control System During Data Sampling
Lubomyr Sikora, Natalya Lysa, Roman Martsyshyn, and Yuliya Miyushkovych
Lviv Polytechnic National University, Lviv, Ukraine
{liubomyr.s.sikora,nataliia.k.lysa,roman.s.martsyshyn,yuliia.h.miiushkovych}@lpnu.ua
Abstract. The use of informative data in integrated intelligent automated systems of administrative-organisational management is associated with the analysis of problem situations and the selection of options for targeted solutions in normal and crisis situations (which arise in both internal and external structures in the process of their interaction). The proposed article considers a method of building an information technology for assessing the level of perception, by operational personnel, of images of dynamic situations received from the measuring systems included in the ACS complex. It is justified by a cognitive model of information perception (images of the situation) in the field of attention of the ACS operator under threatening and limiting modes of operation. This makes it possible to increase the efficiency of the system of measuring devices for data selection from objects. In order to solve the problem, the quality of the measurement transformation of the control object state parameters was evaluated. An information-functional scheme for organising data sampling from aggregates of technogenic systems was justified and developed.

Keywords: Data · System · Measurements · Sensors · Intelligence · Risk · Cognitive model · Situation

1 Introduction
At the present stage of production development, the load on technological objects and units that operate in the maximum mode increases sharply. This, accordingly, complicates the management of such energy-active resource-intensive facilities. The reason is that when disturbances act on a potentially dangerous object and its automated control system (ACS), the object can easily go beyond the permissible values of the state parameters and enter the emergency zone. This can lead to a loss of controllability and
requires new methods of control strategies synthesis [9] (both at the technical and administrative levels). An effective approach to solving such a problem is based on the concept of intelligent management. The approach includes the following components of information technology: – reception and processing of data from objects by the information and measurement system (with subsequent intelligent processing); – forming images of the situation in the object state space; – discrimination and classification of these situation images based on decisionmaking theory; – tracing the system trajectory and predicting possible situations. Based on this approach to the synthesis of control systems, the concept of an active goal-oriented system is created that is capable of realising a given goal according to the situation.
2 Problem Statement
For management decision-making in complex technogenic systems (with a hierarchical structure), an important component of its system and information preparation is the formation of procedures for the selection and processing of analogue and digital data. This data is obtained from the units and blocks in the process of operation. The management process for each situation depends on the actions performed by the operator (when exposed to distorted data and cognitive misinformation) in assessing the state of the object. Coherence between informational, managerial and cognitive components in the management process is an urgent problem. The purpose of the research is to develop an information system to eliminate the risks of accidents in the event of erroneous estimates of the parameters of situations in control facilities under the influence of threats and information attacks. The objectives of the research (based on information technology and systems analysis) are: – to analyse the literature on the problem task; – to justify the use of intelligent data processing to assess situations in complex energy-active objects; – to justify and develop a method for building targeted crisis management strategies with limited resources.
3 Literature Review
As mentioned in the Introduction section, the problem of incorrect situation assessment by the operator due to cognitive misinformation is a significant problem. Many works have been devoted to the development of various fuzzy information processing methods.
In [3,8,21], the methods of fuzzy information processing in decision-making systems using artificial intelligence and system analysis are substantiated. The paper [14,15] substantiates the concept of energy-activity and the problem of control with limited resources in automated systems. In [1,4,18], the decision-making methods were analysed and the concept of purposefulness was formed. The works [5,7,19] substantiate modern management methods in human-machine systems (man-made, organisational, administrative). In [6,17], an analysis of data processing for control tasks was carried out. The work of [10,20] focuses on the problems of the use of systems analysis and information technology to build complex systems with a hierarchical structure. When designing man-machine systems it is important to take into account the psychological characteristics of the person (operator), which is considered in [2,16]. Earlier work by the authors has substantiated and developed a number of remote object monitoring systems (based on laser systems). In [11], the methods of construction of environmental monitoring systems based on laser sensing and cognitive models of decision-making under conditions of risk and conflict are considered. The paper [13] considers the problem of remote diagnostics of vibrating systems. Work [12] is devoted to the problems of creating a system of laser monitoring of technological process in polygraphy. Such systems significantly improve the information of the operators of these systems about the condition of the monitored object. This increases the efficiency of their operation and reduces the likelihood of emergencies.
4 Materials and Methods
In the hierarchy of the technological system, the human operator has the following tasks: – monitoring the dynamic state; – formation of coordinating actions to maintain the target functioning of the system; – control and regulation of technological processes in normal modes and emergency situations. The operator in such systems becomes the integral intelligence unit of the control processor and the reliability of system operation depends on it. A characteristic feature of such systems is the distribution of information load according to target objectives. This requires the development of data streams of different information relevance: identification of characteristic features of system behaviour relative to the target, formation of decisions to coordinate system movement towards the target area. These decision-making processes and procedures increase the mental tension of the operator, which can cause inappropriate risky decisions.
The operator’s perception of analogue and digital data from the control object state monitoring devices has its own peculiarities in assessing their content in the field of human attention. These peculiarities consist in the fact that when analyzing the situation in the object: – numerical data is recorded in memory, but no previous trajectory history is visible (from the readout) (Fn trac); – unclear orientation (as measured by distance to mode boundary lines) (F Δαr); – no trajectory trends can be traced when performing control actions in a short ∗ tracX(t)); terminal time interval (FΔ – when the system enters the limit loads of the technological structure (at maximum capacity), it is not possible to clearly (in a short time interval) determine the permissible distance to the limit state (and the time of transition to the emergency state (t02 − t04 )) of the energy active object; – the indication of maximum values changes the operator’s perception of the data content and puts him under stress because of operator anxiety about the system going into an uncontrollable emergency state (t04 , t05 ) F (HL(LA )). There are some disadvantages to the perception of analogue signals in graphical form. The operator’s ability to predict a trajectory in terminal time is complexed by the associative imagery of the data in the attention field and the scenariobased interpretation of events by the situation classifier (Fig. 1). The trajectory interpretation leads to a distortion of the reading scale values in different intervals of the numerical values of the measurement. When the trajectory enters the marginal areas of the mode - it causes strain on the cognitive system of the ACS operator (when making control decisions (t04 , t05 )).
Fig. 1. Structural diagram of the situation classifier
Figure 1 uses the following notations: {Si } are the system state, DSit(Θ(ti )) is the dynamic situation at time ti by parameter Q, (ALARM ), (EM ER) are the
alarm and emergency state, IM S is the information and measurement system, ACS is the automated control system, T S(EICO) is the technological system with an energy-intensive control object, DR – resource source. Determination of critical system parameters and formation of a mode map of permissible states of the energy-active control object is based on the analysis of real load modes and relevant technological norms. This is the basis for the development of a system situation classifier based on the choice of power scale class Sh(P ).
Fig. 2. Diagram of the relationship between the level of team performance and the level of knowledge and cognitive factors
The analysis has shown that the level of risk in the process system depends on the proper operation (reliability) of the units and assemblies. Also important is the quality of monitoring and control systems, data visualisation systems to assess a set of parameters (characterising the mode of operation of the systems). The effectiveness of the team (operators) can only be addressed by considering their level of training and their perception of the data and its interpretation. This is determined by the cognitive, knowledge characteristics of the operator (on which the level of risk depends) (Fig. 2). According to normative tables and vocationally-oriented skills (which are defined by tests), a procedure for forming operational teams with an assessment of a set of factors is constructed (Fig. 3). In order to assess skills, tests are generated from which a composite index of the cognitive factors of a person's knowledge and skills is determined:

I(KF_{ZV}) = \sum_{i=1}^{u} KF_{Zri}, \quad i \in [1, n]   (1)
A membership function is generated for the team from the fuzzy scores (obtained in the intelligence testing process), with the limits of professional aptitude marked. According to the diagram (Fig. 3), a risk assessment diagram is constructed as follows (Fig. 4).
Fig. 3. Diagram of team factor formation
Fig. 4. Risk assessment diagram as a function of team performance
5 Experiment, Results and Discussion
The research has taken into account the results highlighted in the previous works of the authors. Let us carry out risk assessment for energy-active objects by quality parameter (dust concentration, which was measured by using the laser remote sensing system developed by the authors). This parameter is one of the factors of occurrence of an accident at power units (which is the main energy-explosive element). The condition of functioning of the energy-active object during the combustion of coal dust in boilers in hot air flow (combustion thermodynamics) is represented by the expression: if CKV → CKmax then PA (t, Tn ) → Pmax → PEM ER
(2)
Accordingly, hypothesis testing for accidents is represented by the expression:

H1 : (CKV < CKmax) ⇒ NORMA   (3)

If a(x) > 0, the classifier assigns the object x to class +1, and for a(x) ≤ 0 to class −1; the larger |a(x)|, the more confident the classifier is in its choice. The loss function in this case is written as follows:

L(y, z) = \log(1 + \exp(-yz)), \qquad L'_z(y, z) = -\frac{y}{1 + \exp(yz)}   (11)

The shift vector s in this case has the form:

s = \left( -\frac{y_1}{1 + \exp(y_1 a_{n-1}(x_1))}, \; \cdots, \; -\frac{y_l}{1 + \exp(y_l a_{n-1}(x_l))} \right)   (12)

After calculating the algorithm a_n(x), one can estimate the probabilities of an object x belonging to each of the classes:

P(y = +1 \mid x) = \frac{1}{1 + \exp(-a_n(x))}, \qquad P(y = -1 \mid x) = \frac{1}{1 + \exp(a_n(x))}   (13)

3.5 Description of the Gradient Boosting Algorithm
At the first stage, the composition is initialized: a_0(x) = b_0(x). Then:
1. The shift vector is calculated (formula 9).
2. The algorithm is constructed using formula 10.
3. The new algorithm is added to the composition.
In the second step, the stop criterion is checked. If the stop criterion is not met, another iteration step is performed; if it is met, the iterative process stops.
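The loop above can be sketched in Python as follows. This is an illustration under my own assumptions (decision-tree base learners from scikit-learn, a fixed learning rate, and the shift taken as the antigradient of the logistic loss), not the paper's implementation:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost(X, y, n_rounds=100, lr=0.1, max_depth=2):
    """y must contain labels -1/+1; returns the list of fitted base learners."""
    a = np.zeros(len(y))                 # current composition a_{n-1}(x)
    ensemble = []
    for _ in range(n_rounds):
        s = y / (1.0 + np.exp(y * a))    # antigradient of the logistic loss
        b = DecisionTreeRegressor(max_depth=max_depth).fit(X, s)
        a += lr * b.predict(X)           # add the new algorithm to the composition
        ensemble.append(b)
    return ensemble

# P(y = +1 | x) can then be estimated as 1 / (1 + exp(-a_n(x))), cf. formula (13).
```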
3.6 AdaBoost
One of the popular boosting algorithms is AdaBoost (short for Adaptive Boosting). It was proposed by Yoav Freund and Robert Schapire in 1996. AdaBoost was the basis for all subsequent research in this area. Its main advantages include high speed, since the construction time of the composition is almost entirely determined by the training time of the basic algorithms; ease of implementation; good generalization ability, which can be further improved by increasing the number of basic algorithms; and the ability to identify outliers xi, for which the weights wi take the largest values as the composition grows. Its disadvantages include the fact that AdaBoost is an algorithm with a convex loss function, so it is sensitive to noise in the data and is prone to overfitting compared to other algorithms. An example can be seen in Table 1: the test error first decreases and then begins to grow, even though the training error constantly decreases.

Table 1. AdaBoost error rate on training and test data

Number of classifiers   Training error   Test error
1                       0.28             0.27
10                      0.23             0.24
50                      0.19             0.21
100                     0.19             0.22
500                     0.16             0.25
1000                    0.14             0.31
10000                   0.11             0.33
Also, the AdaBoost algorithm requires sufficiently large training samples. Other methods of building linear compositions, in particular bagging, are able to build algorithms of comparable quality on smaller data samples.
3.7 AdaBoost Technique Procedure
This method of ensembles reinforces “weak” classifiers by uniting them in a committee. It has gained its “adaptability” because each subsequent classifier committee is built on objects that have been misclassified by previous committees. This is because correctly classified objects lose weight, and incorrectly classified objects gain more weight. An example of the algorithm can be explained using Fig. 1. In Box 1, we assign weight levels to all points and use a decision stump to classify them as “pros” or “cons”. This “weak” classifier generated a vertical line on the left (D1) to classify the points. We see that this vertical line incorrectly divided the
Fig. 1. AdaBoost algorithm operation diagram
three “pluses”, classifying them as “minuses”. In this case, we assign these three “pluses” more weight and apply this classifier again. In Box 2 we see that the size of the three incorrectly classified “pluses” is larger than the other points. In this case, the threshold classifier of solutions (D2) will try to predict them correctly. And really: now the vertical line (D2) correctly classified the three incorrectly classified “pluses”. However, this has led to other classification problems - three “minuses” are incorrectly classified. Let’s perform the same operation as in the previous time - assign more weight to incorrectly classified points and apply the classifier again. In Box 3, the three “minuses” have more weight. To correctly distribute these points, we again use the threshold classifier (D3). This time, a horizontal line is formed to classify the “pluses” and “minuses” based on the greater weights of the misclassified observation. In Box 4, we combine the results of classifiers D1, D2, and D3 to form a strong prediction that has a more complex rule than the individual rules of the “weak” classifier. And as a result, in Box 4, we see that the AdaBoost algorithm classified observations much better than any single “weak” classifier.
4 Analytical Consideration of the AdaBoost Algorithm
Consider the problem of binary classification with labels y ∈ {−1, +1}. The general form of the basic algorithm is h(x) = h(x|γ), h(x) ∈ {−1, +1}; the classifier is \hat{y}(x) = sign\{h_0(x) + \sum_{i=1}^{N} c_i h_i(x)\}; the loss function is L(h(x), y) = e^{-y \cdot h(x)}.

Algorithm:
– Input: training sample (x_i, y_i), i = 1, …, N; basic algorithm h(x) ∈ {−1, +1} trained on a weighted sample; M — the number of iterations.
– Initialization of the weights: w_i = 1/N, i = 1, …, N.
– For m = 1, 2, …, M:
  • Train h_m(x) on the training sample using the weights w_i, i = 1, …, N.
  • Calculate the weighted classification error E_m = \frac{\sum_{i=1}^{N} w_i I[h_m(x_i) \neq y_i]}{\sum_{i=1}^{N} w_i}.
  • If E_m > 0.5 or E_m = 0: stop the procedure.
  • Calculate c_m = \frac{1}{2} \ln \frac{1 - E_m}{E_m}.
  • Increase the weights of the objects on which the basic algorithm was wrong: w_i := w_i \cdot e^{2 c_m}, i ∈ \{i : h_m(x_i) \neq y_i\}.
– Output: the resulting ensemble sign\{\sum_{m=1}^{M} c_m \cdot h_m(x)\}.
It will be interesting to consider the weight function of the basic classifier, i.e. its actual impact on the final classification of points:

a_t = \frac{1}{2} \ln \frac{1 - TotalError}{TotalError}   (14)

where TotalError is the total number of incorrectly classified points for this training set divided by the size of our dataset. Let's plot a by changing TotalError from 0 to 1. Note that when the basic algorithm works well and makes no classification mistakes, the error is 0 and the value of a is relatively high. When the basic classifier classifies half of the points correctly and half incorrectly, the value of the alpha function a will be 0, because this classifier is no better than random guessing with a probability of 50%. And in case the basic classifier constantly gives incorrect results, the value of the alpha function a becomes quite negative (Fig. 2).
Fig. 2. Graph of change a from the error value
There are two cases of alpha function: 1. The value of a is positive when the predicted and the actual result coincide, i.e. the point was classified correctly. In this case, we reduce the weight of the point because the work is going in the right direction. 2. A value of a is negative when the predicted result does not match the actual result, i.e. the point was misclassified. In this case, it is necessary to increase the weight of the point so that the same erroneous classification is not repeated in the next classification. In both cases, the basic classifiers are dependent on the result of the previous one.
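For illustration, a compact Python sketch of the AdaBoost loop described in this section, using decision stumps as the weak classifiers; it is an approximation built from the formulas above rather than the authors' code:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost(X, y, M=50):
    N = len(y)
    w = np.full(N, 1.0 / N)              # initialize weights w_i = 1/N
    ensemble = []
    for _ in range(M):
        h = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = h.predict(X)
        E = w[pred != y].sum() / w.sum() # weighted classification error E_m
        if E == 0 or E > 0.5:            # stopping rule from the algorithm
            break
        c = 0.5 * np.log((1 - E) / E)    # classifier weight c_m (the alpha above)
        w[pred != y] *= np.exp(2 * c)    # boost weights of misclassified objects
        ensemble.append((c, h))
    return ensemble

def predict(ensemble, X):
    score = sum(c * h.predict(X) for c, h in ensemble)
    return np.sign(score)                # resulting ensemble sign{sum c_m h_m(x)}
```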
5 Application of AdaBoost Boosting Algorithm
In this paper, we will use the AdaBoost algorithm to predict the type of irises based on the Iris Dataset, which is available in the sklearn datasets module. It contains data on the length of the sepals, the width of the sepals, the length of the petals and the width of the petals. In total, the dataset contains 150 records. On such a fairly simple dataset, the essence of the algorithm and its principle can be seen. AdaBoost will classify the data by these 4 parameters and will predict the type of iris (Fig. 3).
Fig. 3. The first 5 records from the dataset
For training and testing, we will divide our dataset in the ratio 70/30, where 70% will be used for training and the remaining 30% for evaluation. Then we create an AdaBoost classifier with the following structure:
– base_estimator — the basic machine learning classifier;
– n_estimators — the number of "weak" classifiers in each iteration;
– learning_rate — affects the weight of the classifiers.
In this paper we use a Decision Tree as the "weak" classifier, set the number of classifiers n_estimators to 50, and set the influence on the weights learning_rate to 1, which is equal to the default value. Then we fit the classifier to the training data set; this will be our model. We test our model on the test data and observe that the accuracy is 93%, which is a good result. We can also examine the confusion matrix, which shows in matrix form on which classifications the algorithm was wrong, and open a PCA ratio chart to find out why the algorithm could have made a mistake. From Fig. 5 we can see that there are no problems with the first class, because it has a clear feature, while the second and third classes partially intersect, so it is not surprising that the algorithm made a mistake (Fig. 4).
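The experiment described above can be reproduced approximately with scikit-learn as follows (a sketch: the paper's exact code and random split are not shown, so the numbers may differ slightly; note that newer scikit-learn versions name the base-learner argument `estimator` instead of `base_estimator`):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

model = AdaBoostClassifier(
    base_estimator=DecisionTreeClassifier(max_depth=1),  # the "weak" classifier
    n_estimators=50,
    learning_rate=1.0,
).fit(X_train, y_train)

pred = model.predict(X_test)
print(accuracy_score(y_test, pred))      # about 0.93 is reported in the paper
print(confusion_matrix(y_test, pred))
```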
Fig. 4. Matrix of inaccuracies
Fig. 5. PCA ratio chart for dataset
6 Conclusions
The paper investigated the AdaBoost algorithm, which is the most popular ensemble boosting method. The first section discussed the main ensemble methods with their advantages and disadvantages and focused on the AdaBoost algorithm: its features, principles of operation and mathematical formulas. In the second section, the AdaBoost algorithm was experimentally applied to a data set and its effectiveness was tested. In conclusion, we can say that ensemble boosting methods compete in effectiveness with neural networks, but they are somewhat more stable and interpretable, which is why they are often preferred. Also, the AdaBoost algorithm can slightly improve the result compared to "strong" classifiers, and there are cases when even a difference of 3–5% is significant. Google, for example, uses AdaBoost in ranking results for legal-information search, while Apple, Amazon, Facebook and Microsoft also use ensemble methods but simply do not publish their algorithms. They do so because it is financially profitable, and even a few percent of improvement is valuable. That is why this algorithm will be further developed and used.
Synthesis of Barker-Like Codes with Adaptation to Interference
Oleg Riznyk, Ivan Tsmots, Roman Martsyshyn, Yuliya Miyushkovych(B), and Yurii Kynash
Lviv Polytechnic National University, Lviv, Ukraine
{oleh.y.riznyk,ivan.h.tsmots,roman.s.martsyshyn,yuliia.h.miiushkovych,yurii.y.kynash}@lpnu.ua
Abstract. Interference immunity is one of the important characteristics of data reception/transmission systems. Increasing immunity to interference at fixed transmit/receive rates is a current issue, e.g. for drone control. The investigated Barker-like code (BLC) sequences allow increasing the power of the received sequences due to the use of mirror interference-resistant code sequences. The increase in data transmission interference immunity is achieved by increasing the length and power of the interference-resistant code sequence used to transmit a single message. The advantages of these sequences (e.g. high immunity to high-power narrow-band interference, code-based subscriber separation, transmission stealth, high resistance to multipath, high resolution in navigation measurements) will have wide practical application in communication and geolocation systems. The paper improves the method of synthesis of interference-resistant BLC sequences using ideal ring bundles. An improved method for quickly finding such interference-resistant code sequences, capable of detecting and correcting errors in proportion to the length of the resulting code sequence, is considered, and an algorithm for quickly finding such sequences in large volumes has been implemented. A simulation model of interference-resistant Barker-like coding using ideal ring bundles is developed, and its software implementation (for finding and correcting errors in the obtained noise-resistant BLC sequences) has been carried out. The proposed noise-correcting BLC sequences have practical value, since with the obtained code sequence it is quite simple and fast to find (up to 50%) and correct (up to 25%) distorted characters (of the length of the noise-correcting code sequence).

Keywords: Barker-like sequence · BLC · Mirror code sequence · Ideal ring bundle · Non-equidistant code sequence · Non-equidistant combinatorial configuration
1 Introduction and Literature Review
In the military domain (where mobile intelligent robots, drones, microsatellites, various mobile transportation systems and automated weapons control systems are widely used), an important challenge is the organisation of reliable communication between these assets and a remote control centre with the appropriate level of cryptographic protection and noise immunity. Data protection and transmission systems (DPTS) using Barker-like codes are being developed to solve such a problem [2,15,17,18].

The design of on-board DPTS faces the following challenges [11,14,19]:
– providing real-time operation;
– improving interference immunity (while reducing weight, size, power consumption and cost).

Creation of on-board DPTS with high technical and economic performance requires extensive use of modern technologies using ultralarge integrated circuits (UICs) and the development of new methods, algorithms and UIC structures focused on effective implementation of data coding-decoding algorithms.

The development of methods and tools for interference-free coding using noise-like codes based on Barker-like sequences has been considered in many papers. Real-time coding and decoding algorithms using Barker sequences have been considered in [7,8,27]. Wireless real-time data protection, compression and transmission systems with defined parameters are described in [3,5]. An algorithm for finding and correcting interference based on Barker-like sequences is shown in [9,25,26]. In [10,20], real-time operation for simultaneous radar and spatially protected communications is justified. The development of effective methods and means of real-time data protection has been described in [6] (visual cryptography methods) and [4] (digital watermarks and steganography). The development of methods, algorithms and structures of interference-resistant data coding-decoding and the synthesis (based on them) of real-time data protection systems are given in [12,13]. The development of algorithms and methods for noise coding using noise-like codes based on Barker-like sequences has been considered in [16,23,24]. The issues of anomaly detection [9], diagnostic and non-destructive testing methods [1,28] and system state assessment models [21,22] are also important in DPTS.

However, these works pay little attention to the application of modern models and algorithms that are used in the development of real-time data protection and transmission systems. In addition, the issue of combining interference coding facilities within the framework of complex systems has not been sufficiently considered.
2 Materials and Methods
2.1 The Theoretical Basis
The analytical model of the system is a list of mathematical equations that describe the properties of BLC and allow us to determine the code immunity to interference. In particular, the Barker sequence is given by the following expression:

{a_i}_{i=1}^{N},  a_i ∈ {+1, −1}   (1)

The autocorrelation coefficients of the sequence are defined as:

c_j = Σ_{i=1}^{N−j} a_i · a_{i+j}   (2)
And the level of the main lobe is determined by the expression:

ML = Σ_{i=1}^{N} a_i · a_i = N   (3)
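As an illustration of Eqs. (2)–(3), the sketch below computes the aperiodic autocorrelation of a ±1 sequence and reports the main-lobe and maximum side-lobe levels; the 13-element Barker code used as input is standard reference data, while the function name is introduced here only for illustration.

```python
def autocorrelation(a):
    """Aperiodic autocorrelation c_j of a +/-1 sequence (Eq. 2)."""
    n = len(a)
    return [sum(a[i] * a[i + j] for i in range(n - j)) for j in range(n)]

barker13 = [+1, +1, +1, +1, +1, -1, -1, +1, +1, -1, +1, -1, +1]
c = autocorrelation(barker13)
main_lobe = c[0]                        # equals N (Eq. 3)
side_lobe = max(abs(v) for v in c[1:])  # equals 1 for a true Barker code
print(main_lobe, side_lobe)
```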
To simplify and improve the results of the problem, it is relevant to develop a new approach to the synthesis of codes with good autocorrelation properties (which come close to Barker codes). One of these approaches is based on the synthesis of BLC sequences using ideal ring bundles (IRB).

Construction of Barker-Like Code Sequences of Variable Length. An R-multiple ideal ring bundle is a combinatorial configuration formed from a sequence of N integers K_N = (k_1, k_2, ..., k_i, ..., k_N) in which every value 1, 2, ..., S_N − 1 occurs exactly R times among the ring sums. The relation combining the number of integers N, the multiplicity R and the sum S_N of all elements of an R-multiple ideal ring bundle is:

S_N = N(N − 1)/R + 1   (4)

To construct a Barker-like code sequence of variable length (using an IRB of order N and multiplicity R), we allocate a string of S_N elements, number the elements of this one-dimensional array in ascending order and fill with "1" those elements whose numbers match the numbers generated by the IRB. In all other elements of the array, left blank, we insert "0". The resulting sequence of "1" and "0" is an S_N-bit Barker-like code sequence of variable length. By cyclically shifting this sequence, the remaining allowed combinations of this sequence can also be obtained. Let us give an example of such a Barker-like code sequence in the table of code combinations created on the basis of an IRB of order N = 9 and multiplicity R = 4 (Table 1): 1, 1, 1, 2, 2, 5, 1, 3, 3.
Table 1. Noise-resistant code sequence based on IRB weights with N = 9, R = 4. The first row is the base 19-bit sequence 1 1 1 1 0 1 0 1 0 0 0 0 1 1 0 0 1 0 0 obtained from the IRB; the remaining rows of the table are its cyclic shifts.
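The construction rule described above can be written in a few lines; the sketch below derives the positions of the "1" symbols from the IRB weights of Table 1 (the function names are illustrative and are not part of the authors' software).

```python
def blc_from_irb(irb):
    """Build an S_N-bit Barker-like code from an ideal ring bundle (IRB)."""
    s_n = sum(irb)                 # S_N = N(N-1)/R + 1 for a valid IRB
    positions = set()
    acc = 0
    for k in irb:                  # x_l = 1 + (k_1 + ... + k_l) mod S_N
        acc += k
        positions.add(acc % s_n)   # stored as a 0-based position of a "1"
    return [1 if i in positions else 0 for i in range(s_n)]

def cyclic_shifts(code):
    """All allowed combinations are cyclic shifts of the base sequence."""
    return [code[i:] + code[:i] for i in range(len(code))]

irb = [1, 1, 1, 2, 2, 5, 1, 3, 3]  # order N = 9, multiplicity R = 4
base = blc_from_irb(irb)
print(sum(base), len(base))        # 9 ones in a 19-bit sequence
```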
Any of the S_N(S_N − 1)/2 different pairs of code combinations of this sequence contains exactly R coinciding unit values in the same-name bits (which corresponds to the properties of an IRB). The other N − R unit elements of one code combination and of the cyclically shifted other Barker-like code sequence differ from the elements located in the same-name bits. Then the minimum code distance of a given Barker-like code sequence corresponds to the formula:

d_min = 2(N − R)   (5)

The number of errors t_1 that are detected and the number of errors t_2 that are corrected using a Barker-like code sequence are determined from the minimum code distance:

t_1 ≤ d_min − 1,  t_2 ≤ (t_1 − 1)/2   (6)

Hence the formulas defining the number of errors that can be detected t_1 and corrected t_2 by a Barker-like code sequence are:

t_1 ≤ 2(N − R) − 1,  t_2 ≤ N − R − 1   (7)

The code distance can also be determined via the parameters of the IRB:

d_{1,2} = S_N − 2(N − R)   (8)
Let’s present formulas for determining the number of errors that can be detected or corrected based on a Barker-like code sequence:
Synthesis of Barker-Like Codes with Adaptation to Interference
t1 ≤ 2(N − R) − 1 t2 ≤ N − R − 1
205
, when SN ≥ 4(N − R)
t1 ≤ SN − 2(N − R) − 1 t2 ≤ (SN − 2(N − R + 1))/2
(9)
, when SN < 4(N − R)
(10)
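A short numerical check of Eqs. (9)–(10) for the example above (N = 9, R = 4, S_N = 19) can be scripted as follows; the helper name is introduced here purely for illustration.

```python
def error_capability(n, r, s_n):
    """Detectable (t1) and correctable (t2) errors per Eqs. (9)-(10)."""
    if s_n >= 4 * (n - r):
        t1 = 2 * (n - r) - 1
        t2 = n - r - 1
    else:
        t1 = s_n - 2 * (n - r) - 1
        t2 = (s_n - 2 * (n - r + 1)) // 2
    return t1, t2

print(error_capability(9, 4, 19))  # S_N = 19 < 4*(9-4) = 20, so Eq. (10) applies
```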
In the above cases, the values of the parameters N and R are not related to each other by any dependence and are chosen arbitrarily. The problem then arises of determining the best ratio between N and R, at which the considered Barker-like code sequence acquires additional noise-immunity properties. The interference immunity of a Barker-like code sequence increases as the difference P = N − R increases. The highest value of the difference P is achieved when the following condition is fulfilled:

S_N = 2N   (11)

The ratio between the selected parameters N and R at which a Barker-like code sequence is capable of detecting and correcting the greatest number of errors is:

P = N/2 for even N;  P = (N − 1)/2 for odd N.   (12)

Variable-length Barker-like code sequences constructed on the basis of an IRB make it possible to:
– detect up to N − 1 errors (corresponding to 50% of the code sequence length) or correct up to N/2 − 1 errors (corresponding to 25% of the code sequence length) for even values of N;
– detect up to N errors (equal to 50% of the code sequence length) or correct up to (N − 1)/2 errors (equal to 25% of the code sequence length) for odd values of N.

2.2 Designing a Barker-Like Code Synthesis System
A software product has been developed to automatically search for Barker-like sequences (codes) of a user-defined order. The conceptual model of this software product consists of three main components:
– user interface;
– thread manager;
– algorithmic Barker-like code search engine.
Figure 1 shows the conceptual model of the software product with its components and a description of their interaction.
Fig. 1. Conceptual model of the system
Method of Constructing Barker-Like Code Sequences Based on Ideal Ring Bundles. The methodology for constructing Barker-like code sequences of variable length on the basis of an IRB, using the criterion of the minimum of the autocorrelation function, is as follows:
– select the required IRB variant by the length L_N of the Barker-like code sequence, which corresponds to the sum S_N of the IRB elements with order N and multiplicity R;
– construct an L_N-position code μ_i, i = 1, 2, ..., L_N, with minimum autocorrelation function based on the selected IRB variant (k_1, k_2, ..., k_l, ..., k_N): in the N positions of the code sequence with element numbers x_l, l = 1, 2, ..., N, calculated according to the formula x_l = 1 + Σ_{i=1}^{l} k_i (mod L_N), place the value "1", and in the other L_N − N positions of the code sequence place the value "0".

Selection, Development of Methods and Algorithms for the Software Product Operation. The software is implemented using the C programming language and some libraries for additional functions, e.g. execution time distribution using parallel threads.

Restrictions. In general, the software product searches for Barker-like codes of length up to 64. There are two reasons for this limitation:
1. To speed up the calculations, the program's algorithm works with bit fields: instead of using arrays of data, the program stores each code in a single variable.
2. The search for codes of order 30 and above is a rather slow process, the complexity of which grows geometrically (by a factor of about 2.1). To keep the work feasible, it is necessary to limit the range of available sequence lengths.

Procedure. The program tries the codes of a given order in a certain range of numbers. To simplify the search, and using the empirical property of Barker-like codes, we check only one half of the range, because the second half contains the same codes, but with bitwise reversed elements. For example, let us carry out a check on Barker-like codes of the sixth order (Fig. 2). Among the list of all codes with the lowest level of side lobes, we mark with "+" the unique codes, and with "−" those which can be derived from already known unique codes.
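The exhaustive bit-field search described above can be sketched as follows; this is not the authors' C implementation, only a compact Python illustration in which codes are stored as integer bit masks and only half of the range is enumerated.

```python
def max_side_lobe(bits):
    """Maximum |c_j|, j > 0, for a code given as a list of +/-1 values."""
    n = len(bits)
    return max(abs(sum(bits[i] * bits[i + j] for i in range(n - j)))
               for j in range(1, n))

def search(order):
    """Enumerate codes of a given order as bit masks and keep the best ones."""
    best_level, best_codes = order, []
    for mask in range(1 << (order - 1)):       # fixing the top bit covers half the
        code = [1 if (mask >> i) & 1 else -1   # range; the excluded half consists of
                for i in range(order)]         # negated codes with identical side lobes
        level = max_side_lobe(code)
        if level < best_level:
            best_level, best_codes = level, [mask]
        elif level == best_level:
            best_codes.append(mask)
    return best_level, best_codes

print(search(6))
```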
Fig. 2. Check result on Barker codes of sixth order
Also, a number of code optimisations have been carried out to speed up the search for Barker-like codes and to move its infrastructure to a multi-threaded environment.
Fig. 3. The software product flowchart
The overall structure of the software product contains three main components, each of which also has additional related functionality. A general diagram of the file structure is shown in Fig. 3. The algorithm of the software product is based on a direct search for Barker-like codes, using functions to check and save the code value. The general algorithm of the software is shown in Fig. 4.

From the flowchart of the algorithm (Fig. 4) it can be seen that the operation of the program starts with checking that the parameters entered by the user are correct. This procedure is necessary to stop the execution of the program in case of erroneous input data (which saves the user time and error resolution during the execution of the program). The next step is to initialise a mutex. Mutexes are objects by which multiple program threads access a resource in a synchronized manner. Synchronization is an important step in program operation because, if it is missing, disordered recording of data about the Barker-like codes found can lead to multiple overwriting of results, their corruption or a program crash.

After initialization, the program creates other execution threads that will search for all the codes in the assigned interval. The interval boundaries are set in the thread initialization parameters by the main thread and then passed to each individual thread, where they are used to index over the specified code interval. Each thread created by the main process performs a sequential search for the best Barker-like codes. The interval of codes to be searched is determined by the parameters that were passed to this thread at the time of its creation. Each version of the code sequence which gives the best result is stored in a local buffer, where it is retained until the search is complete (or until the best code is found). After completing the enumeration of all the codes from a particular set, the thread must return the results to the main process.
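The thread/mutex organisation described above can be illustrated with a small sketch using the standard-library threading module and a lock-protected shared list; it mirrors the described structure rather than reproducing the authors' C code, and the max_side_lobe(...) helper from the earlier sketch is assumed.

```python
import threading

results = []                      # shared array of the best codes found
results_lock = threading.Lock()   # "mutex" guarding the shared array

def worker(order, start, stop):
    best_level, best_codes = order, []
    for mask in range(start, stop):   # each thread indexes over its own interval
        code = [1 if (mask >> i) & 1 else -1 for i in range(order)]
        level = max_side_lobe(code)   # helper from the previous sketch
        if level < best_level:
            best_level, best_codes = level, [mask]
        elif level == best_level:
            best_codes.append(mask)
    with results_lock:                # occupy the mutex before writing
        results.append((best_level, best_codes))

order, n_threads = 13, 4
space = 1 << (order - 1)
step = space // n_threads
threads = [threading.Thread(target=worker,
                            args=(order, t * step,
                                  space if t == n_threads - 1 else (t + 1) * step))
           for t in range(n_threads)]
for t in threads: t.start()
for t in threads: t.join()        # the main thread waits for all workers
```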
Fig. 4. Algorithm of the programme
For this, a global array of the best Barker-like codes is used, which is available to all threads simultaneously. Before writing their results, the threads try to occupy the mutex (to avoid problems with writing to shared memory locations). The thread that succeeds in occupying the mutex is allowed to write to the shared array of best codes. When it has finished writing its results, the thread releases the mutex to the other threads and terminates its execution. The main thread waits for all threads to complete, after which it removes the mutex and begins processing the results.

In the total result array, the program looks for unique code sequences (sifting out repetitions). It then creates three copies of the unique Barker-like codes, which are modified versions of the codes themselves: they are created using reflection, inversion, and both operations simultaneously. Thus, the program gets all possible variations of the unique Barker-like codes, among which it again looks for repetitions and rejects them. At the end of the execution we have a set of completely unique Barker-like codes, which the program outputs in an easy-to-view form and with all auxiliary data.
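The post-processing step (reflection, inversion and removal of repeated codes) can be expressed compactly; the sketch below assumes codes are stored as ±1 lists, and the function names are illustrative.

```python
def variants(code):
    """A code, its reflection, its inversion, and both operations at once."""
    reflected = code[::-1]
    inverted = [-x for x in code]
    return [code, reflected, inverted, inverted[::-1]]

def unique_codes(codes):
    """Sift out codes that are reflections/inversions of already kept ones."""
    seen, kept = set(), []
    for code in codes:
        if tuple(code) not in seen:
            kept.append(code)
            seen.update(tuple(v) for v in variants(code))
    return kept
```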
3 Experiment, Results and Discussion
The result of the program described in the previous section is a list of unique Barker-like codes with the highest data protection strength. Figure 5 shows examples of the execution result for sequences of 6, 13 and 14 elements; "0" stands for −1 and "1" stands for +1. Figure 5 shows that, depending on the length of the required code sequence, we obtain a different number of the best code combinations. The resulting codes are sorted in descending order of binary value, and the format of the output data allows for easy processing and application in further research.

To further analyse the results, we construct graphs to give a better understanding of the statistics. The graphs in Fig. 6 and Fig. 7 show the growth of computation time as a function of the number of elements in the sequence being searched for. In Fig. 6 the graph has an absolute vertical axis, which is not always convenient for general analysis; in Fig. 7 the graph uses a logarithmic scale, which makes it easier to see the relation between different dimensions of configurations. Figure 8 shows the change of program running speed without and with the use of multithreading: after the transition of the code architecture from single-threaded to multi-threaded, the running time of the program decreased by a factor of four. As a consequence, the code combinations can be searched quickly and more results can be obtained in the same amount of time. Figure 9 shows the number of unique Barker-like codes found for each sequence length.

3.1 Testing
To check the correct operation of the program, known existing results for Barker-like codes were tested.
Fig. 5. Code search result for sequences of 6, 13 and 14 digits
Fig. 6. Plot of search time versus sequence size
Fig. 7. Plot of search time versus sequence size (logarithmic scale)
Fig. 8. Plot of sequential and parallel search times against the size of the sequence
The synthesis results matched all variants of the known existing Barker codes. The test results are shown in Table 2, where:
– N is the length of the Barker-like code;
– g_{2,N} is the number of Barker-like code variants;
– P_{i,j} are the variants of the obtained Barker-like codes.
Fig. 9. Plot of the number of BLCs found against the length of the sequence
Table 2. Test results of the Barker-like code synthesis programme
The first column in Table 2 shows the number of elements in the Barker-like code in ascending order, the second column shows the number of Barker-like codes obtained, and the third column shows examples of some Barker-like codes Pi,j to illustrate the material presented earlier.
4 Conclusions
Today, code sequences that have long been known and studied in detail are mainly used. At the same time, improving the noise immunity and efficiency of reception/transmission systems is only possible by using more advanced sequences and signals based on them. As a result of this research, the problem of constructing Barker-like code sequences of variable length with the minimum level of the autocorrelation function and the best signal-to-noise ratio was solved. The method of constructing Barker-like code sequences of variable length using ideal ring bundles is justified. This makes it possible to simplify their finding by using a regular technique for their construction.
References 1. Ahmad, J., Akula, A., Mulaveesala, R., Sardana, H.K.: Barker-coded thermal wave imaging for non-destructive testing and evaluation of steel material. IEEE Sensors J. 19(2), 735–742 (2019). https://doi.org/10.1109/JSEN.2018.2877726 2. Aljalai, A.M.N., Feng, C., Leung, V.C.M., Ward, R.: Improving the energy efficiency of DFT-S-OFDM in uplink massive MIMO with barker codes. In: 2020 International Conference on Computing, Networking and Communications (ICNC), pp. 731–735 (2020). https://doi.org/10.1109/ICNC47757.2020.9049829 3. Babichev, S., Sharko, O., Sharko, A., Mikhalyov, O.: Soft filtering of acoustic emission signals based on the complex use of huang transform and wavelet analysis. In: Lytvynenko, V., Babichev, S., W´ ojcik, W., Vynokurova, O., Vyshemyrskaya, S., Radetskaya, S. (eds.) Lecture Notes in Computational Intelligence and Decision Making, pp. 3–19. Springer, Cham (2020). https://doi.org/10.1007/978-3-03026474-1 1 4. Bender, W., Gruhl, D., Morimoto, N., Lu, A.: Techniques for data hiding. IBM Syst. J. 35(3.4), 313–336 (1996). https://doi.org/10.1147/sj.353.0313 5. Fu, J., Ning, G.: Barker coded excitation using pseudo chirp carrier with pulse compression filter for ultrasound imaging. In: BIBE 2018; International Conference on Biological Information and Biomedical Engineering, pp. 1–5 (2018) 6. Jiao, S., Feng, J., Gao, Y., Lei, T., Yuan, X.: Visual cryptography in single-pixel imaging. Opt. Express 28(5), 7301–7313 (2020) 7. Kellman, M., Rivest, F., Pechacek, A., Sohn, L., Lustig, M.: Barker-coded nodepore resistive pulse sensing with built-in coincidence correction. In: 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1053–1057 (2017). https://doi.org/10.1109/ICASSP.2017.7952317
8. K¨ onig, S., Schmidt, M., Hoene, C.: Precise time of flight measurements in ieee 802.11 networks by cross-correlating the sampled signal with a continuous barker code. In: The 7th IEEE International Conference on Mobile Ad-hoc and Sensor Systems (IEEE MASS 2010), pp. 642–649 (2010). https://doi.org/10.1109/MASS. 2010.5663785 9. Lakshmi, R., Trivikramarao, D., Subhani, S., Ghali, V.S.: Barker coded thermal wave imaging for anomaly detection. In: 2018 Conference on Signal Processing And Communication Engineering Systems (SPACES), pp. 198–201 (2018). https://doi. org/10.1109/SPACES.2018.8316345 10. Matsuyuki, S., Tsuneda, A.: A study on aperiodic auto-correlation properties of concatenated codes by barker sequences and NFSR sequences. In: 2018 International Conference on Information and Communication Technology Convergence (ICTC), pp. 664–666 (2018). https://doi.org/10.1109/ICTC.2018.8539367 11. Omar, S., Kassem, F., Mitri, R., Hijazi, H., Saleh, M.: A novel barker code algorithm for resolving range ambiguity in high PRF radars. In: 2015 European Radar Conference (EuRAD), pp. 81–84 (2015). https://doi.org/10.1109/EuRAD.2015. 7346242 12. Palagin, A., Opanasenko, V.: Reconfigurable-computing technology. Cybern. Syst. Anal. 43, 675–686 (2007). https://doi.org/10.1007/s10559-007-0093-z 13. Palagin, A., Opanasenko, V., Krivoi, S.: The structure of FPGA-based cyclic-code converters. Optical Memory Neural Netw. 22(4), 207–216 (2013). https://doi.org/ 10.3103/S1060992X13040024 14. Kim, P., Jung, E., Bae, S., Kim, K., Song, T.K.: Barker-sequence-modulated golay coded excitation technique for ultrasound imaging. In: 2016 IEEE International Ultrasonics Symposium (IUS), pp. 1–4 (2016). https://doi.org/10.1109/ULTSYM. 2016.7728737 15. Riznyk, O., Povshuk, O., Kynash, Y., Nazarkevich, M., Yurchak, I.: Synthesis of non-equidistant location of sensors in sensor network. In: 2018 XIV-th International Conference on Perspective Technologies and Methods in MEMS Design (MEMSTECH), pp. 204–208 (2018). https://doi.org/10.1109/MEMSTECH.2018. 8365734 16. Riznyk, O., Povshuk, O., Kynash, Y., Yurchak, I.: Composing method of antiinterference codes based on non-equidistant structures. In: 2017 XIIIth International Conference on Perspective Technologies and Methods in MEMS Design (MEMSTECH), pp. 15–17 (2017). https://doi.org/10.1109/MEMSTECH.2017. 7937522 17. Riznyk, O., Povshuk, O., Noga, Y., Kynash, Y.: Transformation of information based on noisy codes. In: 2018 IEEE Second International Conference on Data Stream Mining Processing (DSMP), pp. 162–165 (2018). https://doi.org/10.1109/ DSMP.2018.8478509 18. Riznyk, O., Myaus, O., Kynash, Y., Martsyshyn, R., Miyushkovych, Y.: Noise-resistant non-equidistant data conversion. In: Babichev, S., Peleshko, D., Vynokurova, O. (eds.) Data Stream Mining & Processing, pp. 127–139. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-61656-4 8 19. Rodriguez-Garcia, P., Ledford, G., Baylis, C., Marks, R.J.: Real-time synthesis approach for simultaneous radar and spatially secure communications from a common phased array. In: 2019 IEEE Radio and Wireless Symposium (RWS), pp. 1–4 (2019). https://doi.org/10.1109/RWS.2019.8714503
20. Rosli, S.J., Rahim, H., Ngadiran, R., Rani, K.A., Ahmad, M.I., Hoon, W.F.: Design of binary coded pulse trains with good autocorrelation properties for radar communications. In: MATEC Web of Conferences 150 (2018). https://doi.org/10.1051/ matecconf/201815006016 21. Sikora, L., Lysa, N., Martsyshyn, R., Miyushkovych, Y.: Models of combining measuring and information systems for evaluation condition parameters of energyactive systems. In: 2016 IEEE First International Conference on Data Stream Mining Processing (DSMP), pp. 290–294 (2016). https://doi.org/10.1109/DSMP. 2016.7583561 22. Sikora, L., Martsyshyn, R., Miyushkovych, Y., Lysa, N.: Methods of information and system technologies for diagnosis of vibrating processes. In: 2017 12th International Scientific and Technical Conference on Computer Sciences and Information Technologies (CSIT), vol. 1, pp. 192–195 (2017). https://doi.org/10.1109/STCCSIT.2017.8098766 23. Tsmots, I., Rabyk, V., Riznyk, O., Kynash, Y.: Method of synthesis and practical realization of quasi-barker codes. In: 2019 IEEE 14th International Conference on Computer Sciences and Information Technologies (CSIT), vol. 2, pp. 76–79 (2019). https://doi.org/10.1109/STC-CSIT.2019.8929882 24. Tsmots, I., Riznyk, O., Rabyk, V., Kynash, Y., Kustra, N., Logoida, M.: Implementation of FPGA-based barker’s-like codes. In: Lytvynenko, V., Babichev, S., W´ ojcik, W., Vynokurova, O., Vyshemyrskaya, S., Radetskaya, S. (eds.) Lecture Notes in Computational Intelligence and Decision Making, pp. 203–214. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-26474-1 15 25. Vienneau, E., Byram, B.: Compound barker-coded excitation for increased signalto-noise ratio and penetration depth in transcranial ultrasound imaging. In: 2020 IEEE International Ultrasonics Symposium (IUS), pp. 1–4 (2020). https://doi.org/ 10.1109/IUS46767.2020.9251650 26. Wang, M., Cong, S., Zhang, S.: Pseudo chirp-barker-golay coded excitation in ultrasound imaging. In: 2018 Chinese Control And Decision Conference (CCDC), pp. 4035–4039 (2018). https://doi.org/10.1109/CCDC.2018.8407824 27. Wang, S., He, P.: Research on low intercepting radar waveform based on LFM and barker code composite modulation. In: 2018 International Conference on Sensor Networks and Signal Processing (SNSP), pp. 297–301 (2018). https://doi.org/10. 1109/SNSP.2018.00064 28. Sheng, X.I.A., Li, Z.P., Jiang, C.L., Wang, S.J., Wang, K.C.: Application of pulse compression technology in electromagnetic ultrasonic thickness measurement. In: 2018 IEEE Far East NDT New Technology Application Forum (FENDT), pp. 37– 41 (2018). https://doi.org/10.1109/FENDT.2018.8681975
Models of Factors of the Design Process of Reference and Encyclopedic Book Editions
Vsevolod Senkivskyy1, Iryna Pikh1,2, Alona Kudriashova1(B), Nataliia Senkivska1, and Lyubov Tupychak1
1 Ukrainian Academy of Printing, 19, Pid Holoskom Street, Lviv, Ukraine
2 Lviv Polytechnic National University, 12, Stepana Bandery Street, Lviv, Ukraine
Abstract. Reference and encyclopedic book editions are characterized by complex technical and semantic features. Such editions contain articles that answer the predicted questions of potential consumers in a concise form and have semantic and compositional integrity, and text elements are characterized by a clearly defined structure, which is provided by the use of headings, sections, subheadings, etc. It is difficult to imagine the process of cognition of a particular subject area without the use of reference and encyclopedic editions. The challenges facing the humanity require immediate reactions, which is why the lion’s share of reference and encyclopedic editions are now presented in electronic form. This transformation has not solved the design problem, but on the contrary, deepened it, increasing the variability of the necessary parameters selection. Reference and encyclopedic editions have many structural differences from other types of editions (art, popular science, etc.), the identification with which is unacceptable in terms of quality assurance. In view of the above, based on expert decisions, a set of factors of influence on the design process of reference and encyclopedic book editions has been singled out, a semantic network has been constructed and described, priority levels of factors have been established, a model of priority influence of factors on the quality of the analysed technological process has been constructed and optimized. The means of graph and network theory, elements of predicate logic, iterative methods, means of the hierarchical systems theory, methods of multicriteria optimization and pairwise comparisons have been used. Keywords: Reference and encyclopedic book edition · Quality Factor · Semantic network · Priority · Model · Optimization
1 Introduction and Literature Review
The variety of book products can meet the needs of the most demanding readers, but despite the large selection of formats and thematic areas, in conditions of great competition the desire to obtain the highest quality of finished products
remains unchanged, because quality is what the buyer pays attention to first of all [18]. Ensuring readability, ease of perception of information and correct reproduction is the task facing all publishers, so the design of books requires meaningful assessment, detailed analysis of all factors of influence and development of a model of priority action of factors on the design process [16,17]. Inconsistency, chaos and insufficient theoretical knowledge of the designer can lead to non-compliance of the project with the norms and requirements, and focusing solely on the external attractiveness of the edition can lead to the deterioration of reader characteristics. This pattern refers to both printed books and electronic editions. The issue of the design process of reference and encyclopedic editions is especially acute, because they are usually characterized by multi-column typing, medium or high text complexity and one-time reading time to obtain the necessary information that requires easy navigation [6]. Reference and encyclopedic editions include dictionaries, reference books and encyclopedias [18]. Therefore, the defining requirement is the speed of the information retrieval, which depends on the reader’s adaptation, the duration of which can be minimized with the rational selection of the edition parameters. The analysis of literature sources indicates that a significant number of works is devoted to the visual perception of books by providing rational design solutions [3,7,14], in particular, the attention is paid to the compositional design [17,19] and colour characteristics [11], which is appropriate for reference and encyclopedic editions. The factors of editorial and publishing processing of editions intended for reproduction by printing method [10,12,20] and electronic editions are assessed [2,13]. Problems of preparation for printing [23] and direct printing of circulations [1,5,8] are described. The mechanisms of organization of publishing business [4,9] distribution [22] and advertising [21] of books are defined. The emphasis is placed on the uniqueness and the need for reference and encyclopedic editions [15]. At the same time, the issues of forecasting assessment of the quality of preparation and publication of such books, the elements of information technology of which are the separation and hierarchical modelling of factors influencing the quality, are not considered. That is why it is extremely important to study the category of reference and encyclopedic editions and the factors related to them as a separate set, different from the set of factors influencing the quality of books as a whole. In addition, it should be noted that the analysed literature sources have become the theoretical basis for the formation of the research direction, in particular, the statements made in the publications have been taken into account by experts in forming a set of factors influencing the quality of the design process of reference and encyclopedic books. For example, much attention is paid to the factors that shape the design, composition and colour of the book, because, according to a survey presented in [7], design is very important for 16% of respondents, and it is really important for 48%. According to [3], the key role in the preparation of both electronic and printed editions is played by the format and design of the page layout within the selected format. The source [14] emphasizes
the importance of visual presentation of the information in editions created for educational and cognitive purposes and visual communication with the reader through design. The work [17] describes in detail the factors of implementation of the editorial and publishing process of book editions in general, so it is the basis for identifying factors of the studied subject area, and [19] describes the influence of variability of artistic (compositional) and technical design parameters on the finished products quality, which contributes to the establishment of links between the factors of the design process of reference and encyclopedic book editions. Similarly, other analysed literature sources have made it possible to form relationships between factors.
2 Materials and Methods
The primary stage of the study is to identify factors influencing the design quality of reference and encyclopedic books. On the basis of expert judgments, a set S = {S1, S2, ..., Sn} is formed, which contains the most significant factors. Representatives of the scientific community and practitioners are involved to single out the characteristic factors of the process. The connections between the identified factors, necessary to form the basis for further description of the subject area, quantitative assessment of their weight values and, accordingly, the establishment of dominance, are determined and visualized on the basis of graph theory and semantic networks [16,17]. The nodes of the semantic network are the elements of the set S, and the arcs are functional connections with certain semantic loads (Si, Sj). The formalized description of relations between factors is realised by means of elements of predicate logic: ∧ is logical "and"; ← is "if"; ∀ is the universal quantifier; ∃ is the existence quantifier. To establish the priority levels of factors based on the semantic network, a reachability matrix A is constructed, the binary elements of which are determined by the following rule:

S_ij = 1, if from the vertex i one can get to the vertex j; S_ij = 0, in all other cases.   (1)

The reachability of the vertex Sj (j = 1, 2, ..., n) relative to the vertex Si (i = 1, 2, ..., n) is due to the presence of a direct connection. The subset of achievable vertices is denoted as K(Si). In this case, the vertex Si, for which a reverse reachability from the vertex Sj is possible, will be its predecessor. The set of predecessor vertices forms the subset P(Si). The intersection of the vertices of the formed subsets, H(Si) = K(Si) ∩ P(Si), under the condition P(Si) = H(Si), determines the dominance of the factors identified with these vertices and is established by analysing so-called iterative tables. As a result of performing these operations on the elements of the semantic network, a multilevel model is received that reflects the dominance of the factors on the analysed process [16].
Further research uses methods of multicriteria optimization and pairwise comparisons to optimize the weight values of factors and to synthesize an optimized model of the priority influence of factors [16,17]. The optimization process involves the construction of a pairwise comparison matrix of factors using the scale of relative importance of objects according to Saati, and the order of the matrix is determined by the number of analysed factors. The matrix elements are obtained on the basis of pairwise comparison of factors, the previous ranks of which are reflected in the obtained multilevel model, and the scale coefficients. The components of the main eigenvector S(S1, S2, ..., Sn) are defined as the geometric mean of the elements of each row of the matrix:

S_i = (b_i1 · b_i2 · ... · b_in)^(1/n),  i = 1, ..., n,   (2)

where n is the number of factors. The normalized components of the main eigenvector Sn are calculated as follows:

S_i^n = (a_i1 · a_i2 · ... · a_in)^(1/n) / Σ_{i=1}^{n} (a_i1 · a_i2 · ... · a_in)^(1/n),  i = 1, ..., n   (3)

Multiplying the pairwise comparison matrix on the right by the vector Sn, an assessment of the consistency of the factors' weight values, the normalized vector Sn1, is received. Dividing the components of the vector Sn1 by the corresponding components of the vector Sn, the components of the eigenvector Sn2 are received. The verification of the optimization results is carried out by the criterion of the maximum value of the main eigenvector of the pairwise comparison matrix, the consistency index and the consistency ratio. The criterion of the maximum value of the main eigenvector λmax is defined as the arithmetic mean of the components of the vector Sn2. The value of the consistency index IU is calculated by formula (4) and compared with the reference value of the consistency index SI:

IU = (λmax − n) / (n − 1)   (4)

The adequacy of the problem solution is proved by the fulfilment of the inequality IU < 0.1 × SI. Additionally, the consistency ratio is calculated by the expression SU = IU/SI. The results of pairwise comparisons can be considered satisfactory if SU ≤ 0.1 [17]. The analysis and comparison of the source vector (obtained by assigning the weight values according to the priority levels of factors in the model of priority influence) and the normalized vector of the pairwise comparison matrix allows the synthesis of an optimized model of the priority influence of factors on the design process of reference and encyclopedic books.
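A compact numerical sketch of this procedure (geometric-mean priorities, λmax, consistency index and consistency ratio) is given below; it is an illustration of the described calculations rather than the authors' software, and the reference value SI = 1.49 for a 10-factor matrix is the one used later in the text.

```python
import numpy as np

def ahp_priorities(matrix, si=1.49):
    """Geometric-mean priority vector and Saaty-style consistency check (Eqs. 2-4)."""
    a = np.asarray(matrix, dtype=float)
    n = a.shape[0]
    s = a.prod(axis=1) ** (1.0 / n)   # Eq. (2): row geometric means
    s_norm = s / s.sum()              # Eq. (3): normalized priority vector
    s_n1 = a @ s_norm                 # vector for assessing consistency
    lam_max = (s_n1 / s_norm).mean()  # arithmetic mean of the component ratios
    iu = (lam_max - n) / (n - 1)      # Eq. (4): consistency index
    su = iu / si                      # consistency ratio; acceptable if su <= 0.1
    return s, s_norm, lam_max, iu, su
```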
3 Experiments, Results and Discussions
To conduct the experiment on the basis of expert judgments, the factors influencing the quality of the design process of reference and encyclopedic book editions
are identified and a set is formed S = {S1 , S2 , S3 , S4 , S5 , S6 , S7 , S8 , S9 , S10 }, where S1 is the font, S2 is the edition format, S3 is the font size, S4 is the line length, S5 is the text alignment, S6 is the column number, S7 is the margin size, S8 is the illustration number, S9 is the line spacing, S10 is the typeface. Based on the selected factors, a semantic network is synthesized (Fig. 1).
Fig. 1. Semantic network of factors in the design of reference and encyclopedic book editions
The visual and linguistic representation of the connections between the factors of the analysed technological process is formalized using the elements of predicate language [17]:
(∀ Si) [∃ (S1, font) ← is selected by (S1, S2) ∧ determines (S1, S3) ∧ determines (S1, S4) ∧ influences the selection (S1, S5) ∧ conditions (S1, S9) ∧ allows the selection (S1, S10)];
(∀ Si) [∃ (S2, edition format) ← influences the selection (S2, S1) ∧ determines (S2, S3) ∧ forms (S2, S4) ∧ conditions (S2, S6) ∧ determines (S2, S7) ∧ influences the selection (S2, S9)];
(∀ Si) [∃ (S3, font size) ← is determined by (S3, S1) ∧ is determined by (S3, S2) ∧ forms (S3, S4) ∧ influences the selection (S3, S5) ∧ influences the selection (S3, S10)];
(∀ Si) [∃ (S4, line length) ← is determined by (S4, S1) ∧ is formed by (S4, S2) ∧ is formed by (S4, S3) ∧ is formed by (S4, S5) ∧ is formed for each column by (S4, S6) ∧ is formed by (S4, S10)];
(∀ Si) [∃ (S5, text alignment) ← is selected by (S5, S1) ∧ is selected by (S5, S3) ∧ forms (S5, S4)];
(∀ Si) [∃ (S6, column number) ← is conditioned by (S6, S2) ∧ forms for each column (S6, S4) ∧ is selected by (S6, S7)];
(∀ Si) [∃ (S7, margin size) ← is determined by (S7, S2) ∧ influences the selection (S7, S3) ∧ influences the selection (S7, S6)];
(∀ Si) [∃ (S8, illustration number) ← adjusts (S8, S7)];
(∀ Si) [∃ (S9, line spacing) ← is conditioned by (S9, S1) ∧ is selected by (S9, S2)];
(∀ Si) [∃ (S10, typeface) ← is selected by (S10, S1) ∧ is selected by (S10, S3) ∧ forms (S10, S4) ∧ is selected by (S10, S5)].

Based on the synthesized semantic network, a reachability matrix is constructed according to the principle (1) [16,17]. For convenience of representation, the matrix is placed in Table 1, having added the designation of factors.

Table 1. Reachability matrix

      S1  S2  S3  S4  S5  S6  S7  S8  S9  S10
S1    1   0   1   1   1   0   0   0   1   1
S2    1   1   1   1   0   1   1   0   1   0
S3    0   0   1   1   1   0   0   0   0   1
S4    0   0   0   1   0   0   0   0   0   0
S5    0   0   0   1   1   0   0   0   0   1
S6    0   0   0   1   0   1   0   0   0   0
S7    0   0   1   0   0   1   1   0   0   0
S8    0   0   0   0   0   0   1   1   0   0
S9    0   0   0   0   0   0   0   0   1   0
S10   0   0   0   1   0   0   0   0   0   1
To further establish the priority of factors on the basis of the reachability matrix, iterative tables are constructed that contain four columns, where i is the sequence number of the factor in the set. In this case, the column K(Si) of the iterative tables is formed from the data presented in the rows of the reachability matrix, and the column P(Si) from the data presented in its columns. The column K(Si) ∩ P(Si) presents the factors common to K(Si) and P(Si) [16]. The results of the first iteration are shown in Table 2.

Table 2. The first level of iteration

i    K(Si)                  P(Si)                   K(Si) ∩ P(Si)
1    1, 3, 4, 5, 9, 10      1, 2                    1
2    1, 2, 3, 4, 6, 7, 9    2                       2 ←
3    3, 4, 5, 10            1, 2, 3, 7              3
4    4                      1, 2, 3, 4, 5, 6, 10    4
5    4, 5, 10               1, 3, 5                 5
6    4, 6                   2, 6, 7                 6
7    3, 6, 7                2, 7, 8                 7
8    7, 8                   8                       8 ←
9    9                      1, 2, 9                 9
10   4, 10                  1, 3, 5, 10             10

As a result of the first level of iteration, one can see that factors S2 (edition format) and S8 (illustration number) have the same level of priority, namely the first. Further iterative processes are presented in Tables 3, 4, 5 and 6, consistently removing factors with established priority levels.

Table 3. The second level of iteration

i    K(Si)                P(Si)                 K(Si) ∩ P(Si)
1    1, 3, 4, 5, 9, 10    1                     1 ←
3    3, 4, 5, 10          1, 3, 7               3
4    4                    1, 3, 4, 5, 6, 10     4
5    4, 5, 10             1, 3, 5               5
6    4, 6                 6, 7                  6
7    3, 6, 7              7                     7 ←
9    9                    1, 9                  9
10   4, 10                1, 3, 5, 10           10

Table 4. The third level of iteration

i    K(Si)          P(Si)              K(Si) ∩ P(Si)
3    3, 4, 5, 10    3                  3 ←
4    4              3, 4, 5, 6, 10     4
5    4, 5, 10       3, 5               5
6    4, 6           6                  6 ←
9    9              9                  9 ←
10   4, 10          3, 5, 10           10

Table 5. The fourth level of iteration

i    K(Si)       P(Si)       K(Si) ∩ P(Si)
4    4           4, 5, 10    4
5    4, 5, 10    5           5 ←
10   4, 10       5, 10       10

Table 6. The fifth level of iteration

i    K(Si)    P(Si)    K(Si) ∩ P(Si)
4    4        4, 10    4
10   4, 10    10       10 ←

Taking into account the obtained data, a model of the priority influence of factors on the design quality of reference and encyclopedic book editions
is synthesized (Fig. 2) [16,17]. The edition format and the illustration number are the key parameters of the formation of reference and encyclopedic editions, because they are really regulated by standards and determine the selection of other parameters. The lowest priority level indicates the subordinate nature of the “line length” parameter.
Fig. 2. Model of priority influence of factors on the design quality of reference and encyclopedic book editions
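The iterative level-partition procedure that produces Tables 2–6 and the model in Fig. 2 can be sketched as follows; the reachability matrix is the one from Table 1, and the function name is introduced here for illustration only.

```python
def priority_levels(reach):
    """Peel off priority levels: a factor for which P(Si) equals
    K(Si) ∩ P(Si) belongs to the current level and is then removed."""
    n = len(reach)
    remaining = set(range(n))
    levels = []
    while remaining:
        level = []
        for i in remaining:
            k = {j for j in remaining if reach[i][j]}   # reachable set K(Si)
            p = {j for j in remaining if reach[j][i]}   # predecessor set P(Si)
            if p == (k & p):                            # P(Si) == K(Si) ∩ P(Si)
                level.append(i)
        levels.append(sorted(level))
        remaining -= set(level)
    return levels

reach = [
    [1, 0, 1, 1, 1, 0, 0, 0, 1, 1],
    [1, 1, 1, 1, 0, 1, 1, 0, 1, 0],
    [0, 0, 1, 1, 1, 0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0, 1, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0, 0, 0, 0, 0, 1],
]
print([[f"S{i + 1}" for i in lvl] for lvl in priority_levels(reach)])
# -> [['S2', 'S8'], ['S1', 'S7'], ['S3', 'S6', 'S9'], ['S5'], ['S10'], ['S4']]
```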
The next stage of the study is to optimize the synthesized model of the priority influence of factors on the quality of the research process, which is to improve the input data by applying a set of measures according to the analytic hierarchy process [17]. To do this, a square, inversely symmetric matrix of pairwise comparison of design factors of the reference and encyclopedic book editions is constructed, using the scale of relative importance of objects (Table 7). At insignificant differences between the factor weights, the experts use even numbers: 2, 4, 6, 8. At more essential differences they use odd numbers: 3, 5, 7, 9. The relation of the factor to itself is always designated as 1, i.e. the diagonal
of the pairwise comparison matrix will be filled with 1. The number of selected factors determines the order of the matrix. The key parameters are singled out this way. The main eigenvector of the pairwise comparison matrix, calculated by expression (2), is:

S = (2.126; 3.597; 1.112; 0.176; 0.495; 0.968; 1.851; 3.131; 0.843; 0.283)

The normalized vector of the pairwise comparison matrix, according to (3), is:

Sn = (0.145; 0.246; 0.076; 0.012; 0.033; 0.066; 0.126; 0.214; 0.057; 0.019)
Table 7. Pairwise comparisons matrix

      S1   S2   S3   S4   S5   S6   S7   S8   S9   S10
S1    1    1/3  3    9    5    3    2    1/3  3    7
S2    3    1    4    9    5    4    3    2    4    7
S3    1/3  1/4  1    7    3    2    1/3  1/4  2    5
S4    1/9  1/9  1/7  1    1/5  1/7  1/9  1/9  1/7  1/3
S5    1/5  1/5  1/3  5    1    1/3  1/5  1/5  1/3  3
S6    1/3  1/4  1/2  7    3    1    1/3  1/4  2    5
S7    1/2  1/3  3    9    5    3    1    1/3  3    7
S8    3    1/2  4    9    5    4    3    1    4    7
S9    1/3  1/4  1/2  7    3    1/2  1/3  1/4  1    5
S10   1/7  1/7  1/5  3    1/3  1/5  1/7  1/7  1/5  1
For convenient visualization of data in the following stages of the study, the normalized vector is adapted by multiplying all its components by an arbitrary coefficient k. Let k = 500:

Sn×k = (72.5; 123; 38; 6; 16.5; 33; 63; 107; 28.5; 9.5)

The normalized vector for assessing the consistency of the weight values of factors is:

Sn1 = (1.569; 2.710; 0.814; 0.135; 0.366; 0.710; 1.369; 2.372; 0.619; 0.212)

The components of the eigenvector of the pairwise comparison matrix are:

Sn2 = (10.760; 10.990; 10.670; 11.190; 10.790; 10.690; 10.790; 11.050; 10.700; 10.910)
After the calculations one gets: λmax = 10.859; the consistency index IU = 0.095; the consistency ratio SU = 0.064. The adequacy of the problem solution is confirmed by the inequalities 0.095 < 0.1 × 1.49 and 0.064 ≤ 0.1.
Assigning weight values to factors based on the model of priority influence, a range of such values is received: S4—20, S10—40, S5—60, S3—80, S6—80, S9—80, S1—100, S7—100, S2—120, S8—120. The obtained numerical values are presented as components of the source vector according to the order of their placement in the matrix: S1—100, S2—120, S3—80, S4—20, S5—60, S6—80, S7—100, S8—120, S9—80, S10—40. The source vector is received: S0 = (100; 120; 80; 20; 60; 80; 100; 120; 80; 40). The values of the factors S0, as well as the adapted values Sn×k, are entered in the comparative Table 8, the visualization of which is presented in Fig. 3.

Table 8. Variants of weight values of factors of the design process of reference and encyclopedic book editions

i       1     2    3   4   5     6   7    8    9     10
S0      100   120  80  20  60    80  100  120  80    40
Sn×k    72.5  123  38  6   16.5  33  63   107  28.5  9.5
Fig. 3. Comparative graph of component weight values of source and normalized vectors
Having analysed Fig. 3, it becomes obvious that as a result of optimization it has become possible to specify the weight values of factors of the design process of reference and encyclopedic book editions, in particular factors S1 (font) and S7 (margin size); S2 (edition format) and S8 (illustration number); S3 (font size), S6 (column number) and S9 (line spacing), which had the same priority in the source model. On the basis of the received data, the optimized model of priority influence of factors on quality of the design process of reference and encyclopedic book editions is synthesized (Fig. 4). According to the optimized model of the priority influence of factors on the quality of the studied technological process, the most important factor is S2 (edition format), and the tenth (lowest priority level) belongs to factor S4 (line length).
Fig. 4. Optimized model of priority influence of factors on quality of the design process of reference and encyclopedic book editions
4 Conclusions
The analysis of literary sources, in particular publications on related topics, has been done which indicates the lack of research on the problem of the design process of reference and encyclopedic book editions in terms of editorial and publishing processing. In the existing developments, reference and encyclopedic editions are usually considered as a part of a generalized information system that ignores the key differences and uniqueness of such products. The present paper focuses on the elements of information technology for the production of highly specialized editions, which allows further forecasting assessment of the quality of finished printed products and, accordingly, obtaining the final result of high quality. The suggested methodology and research results are to identify and rank the most significant factors influencing the design process of book editions of this type. This approach already at the initial stages provides the formation of a meaningful and orderly algorithm of management actions. In the research process, based on the decisions of the experts involved, a set of factors is formed, which includes the font, the edition format and other. A semantic network that illustrates the relations between factors has been developed and described using the elements of predicate logic. The priority of factors through the formation of
the reachability matrix and the use of iterative procedures is established. The model of priority influence of factors on quality of reference and encyclopedic book editions is synthesized. The optimization of weight values of factors is carried out by methods of multicriteria optimization and pairwise comparisons. An inversely symmetric pairwise comparison matrix using the Saati scale of relative importance of objects is constructed to numerically represent the expert decisions on the importance of factors. The values of the components of the main eigenvector and the normalized vector of the pairwise comparison matrix are calculated. The solution of the problem is checked by control parameters: the criterion of the maximum value of the main eigenvector of the matrix, the consistency index and the consistency ratio. The results obtained before and after the optimization are compared. An optimized multilevel model of the priority influence of factors on the design quality of reference and encyclopedic book editions is constructed. As a result of optimization, it was possible to avoid the same priority of factors that arose in the previous stages of the study.
References 1. Aydemir, C., Yenido˘ gan, S., Karademir, A., Arman, E.: Effects of color mixing components on offset ink and printing process. Mater. Manuf. Processes 32(11), 1310–1315 (2017) 2. Babenko, V.O., Yatsenko, R.M., Migunov, P.D., Salem, A.B.M.: Markhub cloud online editor as a modern web-based book creation tool. In: CEUR Workshop Proceedings, vol. 2643, pp. 174–184 (2020) 3. C ¸ oruh, L., Eraslan, B.A.: The principles of book design for electronic and printed books. Fine Arts 12(2), 105–124 (2017) 4. Davydova, L.: Organizational structures and activities of publishing houses of leading universities of the world. Problem space of modern society: philosophicalcommunicative and pedagogical interpretations, pp. 503–515 (2019) 5. Georgiev, L.: Innovations in the printing communications. Publisher 1, 15–32 (2018) 6. Godwin, K., Eng, C., Murray, G., Fisher, A.V.: Book design, attention, and reading performance: current practices and opportunities for optimization. In: Proceedings of the 41st Annual Meeting of the Cognitive Science Society, pp. 1851–1857 (2019) 7. Greize, L., Apele, D.: Book design as a reading-boosting factor in society. Daugavpils universit¯ ates 61. Starptautisk¯ as zin¯ atnisk¯ as konferences rakstu kr¯ ajums, pp. 88–97 (2019) 8. Hoffer, A.: Printing practice. Post-Digital Letterpress Printing, pp. 31–32 (2020) 9. Jolly, G.S.: Unit-4 structure of a publishing house. In: Advances in Intelligent Systems and Computing, pp. 37–50 (2017) 10. Kuznetsov, Y.V.: History and philosophy of prepress. In: Principles of Image Printing Technology, pp. 1–21. Springer, Cham (2021). https://doi.org/10.1007/978-3030-60955-9 1 11. Liu, Z.: Research on color design in book design. Frontiers Art Res. 2(9), 37–40 (2020). https://doi.org/10.25236/FAR.2020.020908 12. Mandic, L., Grgic, S., Srdic, I.: Data formats in digital prepress technology. In: International Symposium on VIPromCom Video/Image Processing and Multimedia Communications, pp. 437–440. IEEE (2002)
13. Phalen, T.: Digital book publishing: how disruptive technology serves the niche of self-publishing. Visual Commun. J. 53(2), 3–13 (2017) 14. Said, A.A., Cahyadi, D.: Design of learning media with visual communication design methodology. In: International Conference on Education, Science, Art and Technology, pp. 272–277 (2017) 15. Salimovich, S.S., Fazliddinovna, N.M.: Dictionaries in modern life. Int. J. Integrated Educ. 2(6), 166–168 (2019) 16. Senkivskyy, V., Kozak, R.: Automated design of book editions: monograph, p. 200. Ukrainian Academy of Printing (2008) 17. Senkivskyy, V., Kudriashova, A., Kozak, R.: Information technology of quality formation of editorial and publishing process: monograph, p. 272. Ukrainian Academy of Printing (2019) 18. Sica, A.: Encyclopedias, handbooks, and dictionaries. International Encyclopedia of the Social and Behavioral Sciences: Second Edition, pp. 584–593 (2015) 19. Sichevska, O., Senkivskyy, V., Babichev, S., Khamula, O.: Information technology of forming the quality of art and technical design of books. In: DCSMart, pp. 45–57 (2019) 20. Viluksela, H.: Evaluation of a Prepress Workflow Solution for Sheetfed Offset, p. 47. Metropolia Ammattikorkeakoulu (2017) 21. Wang, X., Yucesoy, B., Varol, O., Eliassi-Rad, T., Barab´ asi, A.L.: Success in books: predicting book sales before publication. EPJ Data Sci. 8(1), 1–20 (2019) 22. Ye, S., Ke, H.: Pricing and quality decisions in the distribution system of physical books and electronic books. FEBM 2019, 258–261 (2019) ˇ 23. Zitinski E.l´ıas, P., Nystr¨ om, D., Gooran, S.: Color separation for improved perceived image quality in terms of graininess and gamut. Color Res. Appl. 42(4), 486–497 (2017)
Analysis of Digital Processing of the Acoustic Emission Diagnostics Informative Parameters Under Deformation Impact Conditions
Volodymyr Marasanov1, Hanna Rudakova1, Dmitry Stepanchikov1(B), Oleksandr Sharko2, Artem Sharko1, and Tetiana Kiryushatova1
1 Kherson National Technical University, Kherson, Ukraine
{dmitro step75,sharko artem}@ukr.net
2 Kherson State Maritime Academy, Kherson, Ukraine
[email protected]
Abstract. The problem of diagnostics and identification of the metal structures state during their operation is solved by studying the effects and mechanisms of generation and propagation of acoustic emission signals when changing the parameters of the force field caused by different types of loading. The most informative parameters of acoustic emission signals under the conditions of deformational bending and uniaxial loading have been established. The results of digital processing of signals obtained experimentally are presented. The results obtained are a necessary stage in the mathematical and software processing of information on the restructuring of the internal structure of a material under conditions of plastic deformation and destruction in a dynamic system of assessing the states of the mechanical properties of structures during their operation. The close connection between the processes occurring in the material under loading and the presence of acoustic emission effects makes it possible to predict changes in the mechanical properties and structure of materials based on acoustic measurements. Keywords: Identification · Deformation · Diagnostics · Mechanical properties · Residual life · Complex loads · Acoustic emission · Informative parameters
1 Introduction
The solution to the problem of diagnosing the structural states of materials under loading consists not only in the use of new samples of measuring equipment with improved characteristics but also in the search for new informative parameters and methods of information processing. Methods for obtaining and processing information on the kinetics of processes of changing the structure of materials in conditions of complex deformation effects on equipment elements are increasingly
being used. One of these methods is the method of acoustic emission (AE), which makes it possible to obtain information necessary for diagnosing the technical condition of equipment in real-time. Unlike scanning methods, the AE method does not require surface preparation for testing, detects only developing defects according to their degree of danger, and allows monitoring the state of critical structures without taking them out of service. The use of acoustic emission diagnostics for solving various technological and practical problems is a powerful way to ensure the reliability of the performance of metal products. The close connection between the processes occurring in the material during its destruction and the informative parameters of the occurrence of acoustic emission, serves as the basis for diagnosing changes in the structure of the material when it approaches the state of destruction. Revealing the physical patterns and features of the AE signal formation requires mathematical processing of the amplitude, time, energy, and frequency characteristics of the AE signals. The signals at the output of the diagnostic system are not only discrete in time, but also quantized in level, i.e. are digital. They are calculated from instantaneous values by analog-digital conversion of the values of the electrical signal taken from the sensor and depend on the method of receiving signals during diagnostics. The disadvantage of using instantaneous values of the measured parameters is the large size of the data, which is very important for storing and processing information with limited hardware capabilities. Digital signal processing refers to operations on time-discrete signal samples. Prediction of the structural states of materials of the formation under loading can be carried out on the basis of establishing relationships between the evolution of the defect structure under loading and the mechanisms of the occurrence of AE signals. In the process of plastic deformation, a gradual accumulation of crystal lattice defects is observed, which are the cause of the formation of cracks. Revealing the degree of critical damageability is one of the most important areas of science on the strength of materials. The aim of this work is searching and digital processing of acoustic signals informative parameters accompanying the mechanisms of generation and propagation of AE signals, with the parameters of the force field under different types of loading of materials.
2 Literature Review
The main purpose of AE signal processing is to establish relationships between information features and physical mechanisms of deformation under loading [8, 17,24,27,28,44]. In [14], the results of using the AE method for assessing corrosion damage are presented, in [33] for detecting gas leaks, in [35] for assessing the mechanical properties of two-phase media, in [12,40] for assessing the stress-strain state of
materials. The characteristics of the AE spectra were used to observe the growth of cracks and changes in the mechanical properties of materials in [16,23,46,47]. In [34], identification and prediction of the turning points of bifurcations using complex networks are presented. In [20], the results of identification of complex interactions between biological objects and their modeling in the form of graphs are presented. A modified algorithm for identifying structural damage at varying temperature and its effect on Young’s modulus is presented in [9]. In [31], the results of assessing the accumulated damage based on the determination of the values of the diagnostic parameter of acoustic emission are presented. One of the main reasons hindering the widespread introduction of AE methods is the nonstationarity of the processes and the noisiness of useful signals that appear at the stages of generation, propagation and indication. To increase the reliability and efficiency of AE methods, various methods of digital signal processing can be applied [3,4,6,10,19,45]. The possibilities of recording AE signal processing when assessing the structural state of materials are reflected in the Fourier transforms, which make it possible to obtain information about the frequency components of the AE signal and its spectrum [32]. Despite the advantages of the Fourier transform, they do not provide an analysis of local frequency changes and features of AE signals. Modern promising methods of filtering AE signals are Short Time Fourier Transform, wavelet transform [11,37,42], Hilbert-Huang transform [5,41,43]. Acoustic emission technology is developing in two directions: detection of defects [13,15] and acoustic emission diagnostics [7,18,36]. When carrying out control by the AE method, their general principles are used, however, it is necessary to take into account the specifics of the application of the method, the peculiarities of accompanying noises, methods of dealing with them and the peculiarities of the processes of changing the structural state of materials.
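To make the filtering approaches cited above more concrete, the following is a minimal, hedged sketch of wavelet soft-thresholding of a noisy AE record. It is not the code of any of the cited works; Python and the PyWavelets package are used purely for illustration, and the wavelet family, decomposition level and synthetic burst parameters are assumptions.

```python
# Illustrative sketch only: generic wavelet soft-thresholding of a noisy AE record.
import numpy as np
import pywt

def soft_denoise(signal, wavelet="db4", level=4):
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    # Noise scale estimated from the finest detail coefficients (universal threshold).
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    thr = sigma * np.sqrt(2.0 * np.log(len(signal)))
    coeffs = [coeffs[0]] + [pywt.threshold(c, thr, mode="soft") for c in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)[: len(signal)]

# Synthetic AE-like burst: exponentially decaying sinusoid buried in noise (assumed values).
t = np.linspace(0.0, 1e-3, 4096)
burst = np.exp(-t / 1e-4) * np.sin(2 * np.pi * 80e3 * t)
noisy = burst + 0.2 * np.random.randn(t.size)
clean = soft_denoise(noisy)
```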
3 Materials and Methods
The most acceptable form of AE diagnostics under conditions of deformation effects is the organization of processes of continuous recording of information with the subsequent storage of each result in a memory device. Hardware filtering of AE diagnostics signals is provided by using bandpass filters. After filtering, the signal is amplified by the main variable gain amplifier. A key tool for processing and storing data is an analog-to-digital converter, which converts the values of the electrical signal taken from the sensor into a digital code. The obtained instantaneous values of the AE signal are further used to calculate various signal parameters. AE signal processing methods are based on a multilevel system. Initially, the signal preprocessing takes place, which is associated with the elimination of noise that arose in the process of signal registration. Next, the amplitude values are highlighted. Then the time, frequency or fractal characteristics of the signal are determined for which the Fourier transform, window transform,
wavelet transform, and Hilbert-Huang transform are used. The processing of AE diagnostics signals is completed by cluster and classification analysis, for which k-means methods, the method of principal components, neural networks, and Kohonen maps are used. To observe the process of metal deformation under loading, it is necessary to limit an infinite set of measurements to a finite number, i.e. quantize the continuous signal by level. This is necessary in order to provide protection against interference. Quantizing a continuous AE signal in time is reduced to replacing a large number of values with a specific number of instantaneous values, which is fixed after a certain time interval, which is a quantization step. Digital signal processing is carried out on the basis of numerical methods using digital computing technology.
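As a simple illustration of the sampling and level quantization step described above, the sketch below converts a continuous AE burst into ADC codes. The sampling step, bit depth and full-scale range are assumed values, not parameters from the experiments reported here.

```python
# Minimal sketch of time/level quantization of a continuous AE signal (assumed ADC settings).
import numpy as np

def sample_and_quantize(u_cont, t_end, dt, n_bits=12, u_range=1.0):
    t = np.arange(0.0, t_end, dt)                 # time quantization with step dt
    u = u_cont(t)                                  # instantaneous analog values
    levels = 2 ** n_bits
    code = np.round((u + u_range) / (2.0 * u_range) * (levels - 1))
    code = np.clip(code, 0, levels - 1)            # saturate at the ADC limits
    return t, code.astype(np.int32)                # digital codes, as after the ADC

# Example: decaying 80 kHz burst, 1 MHz sampling, 12-bit converter (illustrative only).
burst = lambda t: 0.4 * np.exp(-t / 5e-5) * np.sin(2 * np.pi * 80e3 * t)
t, code = sample_and_quantize(burst, t_end=5e-4, dt=1e-6)
```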
4 Methodology
The solution of the problems of diagnostics of the strength properties of materials is impossible without a detailed study of their microstructure. This requires mathematical models of the generation of acoustic signals under loading. In [22], a one-dimensional discrete-continuous model of the energy spectrum of AE signals is proposed, in which the force constants are the parameters of elastic bonds. However, a model of such a simple structure does not allow the low-frequency and high-frequency parts of the spectrum to be separated; for this, more complex structures must be considered. In [21,25], a model of the appearance of the energy spectrum of AE signals in a complex medium in the form of a diatomic cell is presented. In it, the kinematic variables are not only the longitudinal and transverse displacements of the masses, but also the angle of their rotation. It is shown that the displacement of particles in the middle of the cell determines the high-frequency part of the acoustic signal spectrum, while the displacement of the center of mass determines the low-frequency part. This provides a connection between the translational and rotational degrees of freedom of particles and their oscillatory properties. The presented formalism, which transforms the stress-induced changes in the structure of materials into a continuous analytical function, makes it possible to establish a correspondence between the spectral characteristics of the discrete structure of materials and the characteristics of signal propagation.

The informative parameters of the AE can be divided into the following groups.

1. Amplitude parameters:
– maximum amplitude of the AE signal
U_{max} = \max(u_0, \ldots, u_n), \qquad (1)
where u_0, \ldots, u_n is the digital array of absolute values of the AE signal;
– average amplitude of the AE signal
\bar{U} = \frac{1}{n} \sum_{i=1}^{n} |U_i|, \qquad (2)
where n is the number of pulses (peaks) in the AE signal and U_i is the amplitude of the i-th pulse;
– dispersion of the AE signal amplitudes
D_U = \frac{1}{n} \sum_{i=1}^{n} \left( U_i - U_{max} \right)^2. \qquad (3)

2. Time parameters:
– average value of the intervals between pulses in the AE signal
M_{\Delta T} = \frac{1}{n} \sum_{i=1}^{n} \Delta T_i, \qquad (4)
where \Delta T_i is the time between two successive peaks in the AE signal;
– observation time of the AE signal
t = n\, M_{\Delta T}; \qquad (5)
– rise time of the AE signal (duration of the leading edge)
t_r = t_{max} - t_{sig}, \qquad (6)
where t_{max} is the time corresponding to the maximum amplitude of the AE signal and t_{sig} is the arrival time of the AE signal, taking into account the threshold level;
– dispersion of the intervals
D_{\Delta T} = \frac{1}{n} \sum_{i=1}^{n} \left( \Delta T_i - M_{\Delta T} \right)^2. \qquad (7)

3. Energy parameters:
– AE signal energy
W = \sum_{i=i_c}^{n} |u_i - u_{base}|, \qquad (8)
where i_c is the number of the count corresponding to the arrival time of the AE signal and u_{base} is the displacement of the AE signal relative to the zero level;
– AE signal intensity
F = \frac{1}{K} \sum_{k=i_c}^{n} \frac{N_{k+1}}{N_k}, \qquad \frac{N_{k+1}}{N_k} = \begin{cases} 1, & N_k = 0 \text{ and } N_{k+1} > 0, \\ 0, & N_k > 0 \text{ and } N_{k+1} = 0, \end{cases} \qquad (9)
where N_k is the number of pulses in the k-th time interval, N_{k+1} is the number of pulses in the (k+1)-th time interval, k is the interval number and K is the total number of intervals;
– AE signal density
N = \frac{N_{\Sigma}}{t_r}, \qquad (10)
where N_{\Sigma} is the number of crossings of the AE signal threshold level.

4. Frequency parameters:
– maximum amplitude of the AE signal in the frequency domain
c_{s\,max} = \max(c_0, \ldots, c_n), \qquad (11)
where c_0, \ldots, c_n is the digital array of absolute values of the AE signal in the frequency domain, obtained using the Fourier transform;
– width H_{0.75} of the main peak of the AE signal at the 0.75 max level, determined by counting the number of exceedances of the 0.75 max level with subsequent multiplication by the sampling interval in the frequency domain (0.83 MHz);
– frequency localization of the main peak, i.e. the frequency corresponding to the main peak of the AE signal subjected to the Fourier transform.

An effective method for studying the dynamic processes of changes in the structure of materials during the operation of constructions is the method of active experiment with synchronous registration of the deformation parameters and the moments of occurrence of AE signals [2,26]. To exclude the subjective factor in choosing the discrimination threshold, which determines the quality of the input information, it is useful to draw on experience borrowed from related branches of information management under conditions of uncertainty [29,30,38,39,48]. Overestimated sampling thresholds lead to missing useful information, while underestimated ones make it difficult to identify the useful signal because of the large amount of low-amplitude noise against the background of useful information. When processing the experimental data [1], the computer mathematics system Mathematica 9.0 was used together with algorithms for working with arrays of numerical data: finding the maximum (minimum) array elements, sorting array data by attribute, data merging, and spline interpolation.
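The following sketch illustrates how the parameters of Eqs. (1)-(11) can be evaluated from one digitized AE realization. It is not the authors' processing code: pulse detection by threshold crossing is a simplifying assumption, the intensity parameter (9) is omitted for brevity, and all names are illustrative.

```python
# Hedged illustration of Eqs. (1)-(11): amplitude, time, energy and frequency parameters
# of a digitized AE record u[i] sampled with step dt (simplified pulse detection).
import numpy as np

def ae_parameters(u, dt, threshold, u_base=0.0):
    u = np.asarray(u, dtype=float)
    U_max = np.max(np.abs(u))                                    # Eq. (1)
    above = np.abs(u) > threshold
    starts = np.flatnonzero(above[1:] & ~above[:-1]) + 1         # upward threshold crossings
    peaks = np.abs(u[starts]) if starts.size else np.array([U_max])
    U_mean = np.mean(peaks)                                      # Eq. (2)
    D_U = np.mean((peaks - U_max) ** 2)                          # Eq. (3)
    dT = np.diff(starts) * dt if starts.size > 1 else np.array([0.0])
    M_dT = np.mean(dT)                                           # Eq. (4)
    t_obs = dT.size * M_dT                                       # Eq. (5)
    i_c = starts[0] if starts.size else 0
    t_r = abs(np.argmax(np.abs(u)) - i_c) * dt                   # Eq. (6), simplified
    D_dT = np.mean((dT - M_dT) ** 2)                             # Eq. (7)
    W = np.sum(np.abs(u[i_c:] - u_base))                         # Eq. (8)
    N_dens = starts.size / t_r if t_r > 0 else np.nan            # Eq. (10)
    spec = np.abs(np.fft.rfft(u))
    freqs = np.fft.rfftfreq(u.size, dt)
    cs_max = spec.max()                                          # Eq. (11)
    f_peak = freqs[spec.argmax()]                                # frequency localization
    return dict(U_max=U_max, U_mean=U_mean, D_U=D_U, M_dT=M_dT, t_obs=t_obs,
                t_r=t_r, D_dT=D_dT, W=W, N=N_dens, cs_max=cs_max, f_peak=f_peak)
```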
5 Experiment, Results and Discussion
The results of AE measurements with simultaneous fixation of the load and deformation of the samples are shown in Fig. 1 for tension and in Fig. 2 for the four-point bending tests. The amplitude, time, energy and frequency parameters of the AE signals are shown in Figs. 3, 4, 5, 6, 7 and 8 and in Tables 1, 2, 3 and 4.
Fig. 1. Acoustic emission signals at different loads of longitudinal tension, grouped according to the zones of their occurrence in the loading diagram: I – elastic deformation zone, II – plastic deformation zone, III – yield and pre-fracture zone
The dashed lines show the boundaries of the work hardening zones, the dots are the experimental values, and the lines are the spline approximation. The change in the AE parameters correlates with the main stages of deformation in the tensile diagram and the degree of material damage. The first maximum of the AE intensity falls on the initial stage of elastic deformation. The second and subsequent maxima are observed at values below the yield stress. At this time, a sharp change in the deformation of the crystal lattice is noted, which characterizes the moment of crack initiation.
Fig. 2. Acoustic emission signals at different transverse bending loads, grouped by zones of occurrence in the loading diagram: a) elastic deformation zone, b) plastic deformation zone, c) yield and pre-fracture zone
The signals perceived by the AE sensor in the elastic region and transitional from elastic to plastic have the form of separate flashes with high-frequency filling and exponentially decaying vibration amplitude (Figs. 1 and 2). In this case, the intensity of the flow of AE acts increases in the transition region and increases like an avalanche with deformations corresponding to the yield area. On the yield area, individual AE pulses overlap strongly, and the recorded signals have the form of almost continuous emission. At the end of the yield area, the AE intensity drops sharply. An increase in the AE intensity is observed again before destruction. On the stress-strain curves, in some cases, a yield drop is observed, which consists in an increase in stress above the yield point. The formation of a yield drop is associated with a sharp increase in the number of mobile dislocations at the beginning of plastic flow. The yield drop appears whenever the initial density of free dislocations is low, but in the process of plastic deformation there is a sharp increase in this density.
Table 1. Amplitude parameters of AE in tension and four-point bending of specimens from steel St3sp

Tension
Work hardening zones | Load P, kg | Maximum amplitude Umax, mV | Average amplitude U, mV | Amplitude dispersion DU, mV
I elastic zone | 1720 | 3 | 0.1 | 1.81
I elastic zone | 1880 | 20 | 0.3 | 13.08
I elastic zone | 1920 | 11 | 0.2 | 10.83
I elastic zone | 2010 | 12 | 0.2 | 10.55
II plastic deformation zone | 2050 | 22 | 7.0 | 18.68
II plastic deformation zone | 2090 | 19 | 3.5 | 12.36
II plastic deformation zone | 2130 | 9.8 | 2.2 | 8.01
III zone of yield, pre-fracture | 2160 | 22 | 1.4 | 22.99
III zone of yield, pre-fracture | 2170 | 21 | 0.4 | 12.99
III zone of yield, pre-fracture | 2400 | 14 | 2.8 | 10.77
III zone of yield, pre-fracture | 2600 | 7.6 | 0.1 | 5.67
III zone of yield, pre-fracture | 3000 | 15 | 0.3 | 12.51

Bending
Work hardening zones | Load P, kg | Maximum amplitude Umax, mV | Average amplitude U, mV | Amplitude dispersion DU, mV
I elastic zone | 17 | 3 | 0.03 | 2.67
I elastic zone | 20 | 4.3 | 0.01 | 3.65
I elastic zone | 27 | 6.6 | 0.01 | 5.09
II plastic deformation zone | 38 | 2.1 | 0.007 | 2.04
II plastic deformation zone | 38.5 | 2.5 | 0.008 | 2.28
II plastic deformation zone | 39.4 | 3.0 | 0.029 | 2.49
II plastic deformation zone | 40 | 3.7 | 0.047 | 3.24
III zone of yield, pre-fracture | 47 | 6.1 | 0.184 | 4.30
III zone of yield, pre-fracture | 47.5 | 10.0 | 0.473 | 5.94
Under conditions of plastic deformation, only digital processing of AE signals makes it possible to detect and explain the existence of a yield drop. The AE process shown in Fig. 3 reflects the features of plastic deformation. A high level of AE is observed at the yield area, which sharply decreases with the onset of strain hardening up to the destruction of the sample. The yield drop is a jump-like transition from the elastic region to the plastic one.
Fig. 3. Amplitude parameters of the AE (maximum signal amplitude Umax , average signal amplitude U , amplitude dispersion DU ) under tension (a) and four-point bending (b) of specimens from steel St3sp
Digital signal processing allows the signal arrival time, amplitude, rise time and signal duration to be determined more accurately. In mechanical testing of materials under loading, the yield point is not always manifested and may be absent in the loading diagram. The yield point is then calculated from the yield drop, i.e. as the stress at which the residual elongation reaches 0.2%. The yield point characterizes the stress at which a more complete transition to plastic deformation occurs. When the stress is equal to the yield point of the material, a zone of plastic deformation is formed. The volume of this zone is proportional to the stress level.
Table 2. Temporal parameters of AE in tension and four-point bending of specimens from steel St3sp

Tension
Work hardening zones | Load P, kg | Observation time t, µs | Rise time tr, µs | Average value of intervals MΔT, µs | Dispersion of intervals DΔT, µs
I elastic zone | 1720 | 0.75 | 0.024 | 0.023 | 0.030
I elastic zone | 1880 | 0.17 | 0.024 | 0.014 | 0.012
I elastic zone | 1920 | 0.13 | 0.036 | 0.016 | 0.003
I elastic zone | 2010 | 0.11 | 0.002 | 0.016 | 0.003
II plastic deformation zone | 2050 | 0.18 | 0.050 | 0.011 | 0.004
II plastic deformation zone | 2090 | 0.16 | 0.014 | 0.013 | 0.011
II plastic deformation zone | 2130 | 0.17 | 0.002 | 0.017 | 0.003
III zone of yield, pre-fracture | 2160 | 0.26 | 0.138 | 0.011 | 0.003
III zone of yield, pre-fracture | 2170 | 0.17 | 0.014 | 0.012 | 0.007
III zone of yield, pre-fracture | 2400 | 0.17 | 0.002 | 0.017 | 0.002
III zone of yield, pre-fracture | 2600 | 0.11 | 0.002 | 0.015 | 0.005
III zone of yield, pre-fracture | 3000 | 0.23 | 0.036 | 0.026 | 0.025

Bending
Work hardening zones | Load P, kg | Observation time t, µs | Rise time tr, µs | Average value of intervals MΔT, µs | Dispersion of intervals DΔT, µs
I elastic zone | 17 | 0.604 | 0.024 | 0.101 | 0.061
I elastic zone | 20 | 0.176 | 0.034 | 0.013 | 0.006
I elastic zone | 27 | 0.166 | 0.048 | 0.013 | 0.006
II plastic deformation zone | 38 | 0.010 | 0.012 | 0.005 | 0.003
II plastic deformation zone | 38.5 | 0.008 | 0.001 | 0.004 | 0.003
II plastic deformation zone | 39.4 | 0.152 | 0.118 | 0.022 | 0.019
II plastic deformation zone | 40 | 0.130 | 0.012 | 0.019 | 0.025
III zone of yield, pre-fracture | 47 | 0.316 | 0.196 | 0.015 | 0.014
III zone of yield, pre-fracture | 47.5 | 0.420 | 0.118 | 0.013 | 0.007
Fig. 4. Time parameters of the AE (time of signal observation t, rise time tr , average value of intervals MΔT , dispersion of intervals DΔT ) under tension (a) and four-point bending (b) of specimens from steel St3sp. The dotted lines show the boundaries of the work hardening zones, the dots are the experimental values, and the lines are the spline approximation
Digital signal processing makes it possible to explain the physical and mechanical nature and statistical regularities of the AE phenomenon, which ensures the creation on its basis of new methods for diagnosing a pre-destructive state under various external influences. This ensures the restoration of the true parameters of the process of structural transformations and the accumulation of damage by the recorded parameters of the AE signals.
Table 3. Energy parameters of AE in tension and four-point bending of specimens from steel St3sp

Tension
Work hardening zones | Load P, kg | Signal energy W, 10^-3 arb. un. | Signal strength F, arb. un. | Signal density N, 10^7 s^-1
I elastic zone | 1720 | 0.302 | 33 | 4.69
I elastic zone | 1880 | 4.955 | 12 | 4.02
I elastic zone | 1920 | 1.616 | 8 | 1.93
I elastic zone | 2010 | 2.855 | 7 | 1.59
II plastic deformation zone | 2050 | 16.000 | 17 | 5.53
II plastic deformation zone | 2090 | 4.354 | 13 | 4.10
II plastic deformation zone | 2130 | 2.360 | 10 | 3.27
III zone of yield, pre-fracture | 2160 | 34.000 | 24 | 6.53
III zone of yield, pre-fracture | 2170 | 5.464 | 15 | 4.44
III zone of yield, pre-fracture | 2400 | 4.307 | 10 | 2.93
III zone of yield, pre-fracture | 2600 | 1.804 | 8 | 2.51
III zone of yield, pre-fracture | 3000 | 2.403 | 9 | 2.17

Bending
Work hardening zones | Load P, kg | Signal energy W, 10^-3 arb. un. | Signal strength F, arb. un. | Signal density N, 10^7 s^-1
I elastic zone | 17 | 0.302 | 6 | 13.91
I elastic zone | 20 | 0.512 | 13 | 10.25
I elastic zone | 27 | 0.859 | 13 | 13.72
II plastic deformation zone | 38 | 0.083 | 2 | 19.20
II plastic deformation zone | 38.5 | 0.158 | 2 | 18.83
II plastic deformation zone | 39.4 | 0.376 | 7 | 18.67
II plastic deformation zone | 40 | 0.256 | 7 | 15.44
III zone of yield, pre-fracture | 47 | 1.270 | 21 | 16.60
III zone of yield, pre-fracture | 47.5 | 2.323 | 33 | 17.32
The maximum AE activity in the yield drop zone and the yield area is explained by the mass formation and movement of dislocations. Then the activity decreases, due to the fact that the movement of the newly formed dislocations is limited by the already existing ones.
Fig. 5. Energy parameters of the AE (signal energy W , signal intensity F , signal density N ) under tension (a) and four-point bending (b) of specimens from steel St3sp. The dotted lines show the boundaries of the work hardening zones, the dots are the experimental values, and the lines are the spline approximation
Relating the results of digital processing of the informative AE parameters, which accompany the mechanisms of generation and propagation of AE signals, to current ideas about the nature of strength and fracture of materials and to the force field parameters in the loading diagram requires drawing on a complex of modern physical research in the fields of acoustics, materials science and fracture mechanics.
Fig. 6. Fourier transforms of AE signals at different loads for longitudinal tensile deformation of specimens made of St3sp steel, grouped by work hardening zones: zone I elastic deformation zone, zone II - plastic deformation zone, zone III - yield and prefracture zone
A possible mechanism for the appearance of AE signals during stretching and bending is as follows. For the AE effect to occur, energy must be released. The breakthrough of dislocation clusters releases the elastic energy of their interaction and causes the emission of elastic waves. The sources of AE in the deformation of materials are the processes of annihilation of dislocations when they emerge on the surface of the material. With the simultaneous movement of many dislocations caused by loading, the stress waves superimposed on one another create continuous AE.
Fig. 7. Fourier transforms of AE signals at different loads for deformation of four-point bending of specimens made of St3sp steel, grouped by work hardening zones: zone I - elastic deformation zone, zone II - plastic deformation zone, zone III - yield and pre-fracture zone
With a change in the conditions of deformation and the transition from the zone of elastic deformation to the zone of plastic flow, the type of stress concentrators changes. There are no sharp boundaries between the zones of the loading diagram when they are determined from the results of metallographic studies. The mutual scale of the zones is also different, therefore, in mechanical tests, the material state parameter is considered to be changing in time and time-limited realizations are used, in which the process can be considered quasistationary. In contrast to this, during acoustic measurements with a transition from one zone of work hardening to another, a clear change in the character of the AE is observed.
Fig. 8. Frequency parameters of the AE (maximum amplitude of the AE signal in the frequency domain Csmax , width H0.75 of the main peak of the AE signal at a level of 0.75max, frequency localization of the main peak f ) under tension (a) and four-point bending (b) of specimens from steel St3sp. The dotted lines show the boundaries of the work hardening zones, the dots are the experimental values, the lines are the spline approximation
It was found that by measuring the parameters of an acoustic wave, it is possible not only to determine the coordinates and place of radiation, but also to establish the relationship between the radiation parameters and the phenomena that occur during plastic deformation of the material.
Table 4. Frequency parameters of AE in tension and four-point bending of specimens from steel St3sp

Tension
Work hardening zones | Load P, kg | Maximum amplitude Csmax | Main peak width H0.75, MHz | Main peak frequency f, MHz
I elastic zone | 1720 | 0.023 | 5.0 | 79.2
I elastic zone | 1880 | 0.019 | 5.0 | 79.1
I elastic zone | 1920 | 0.008 | 6.7 | 48.3
I elastic zone | 2010 | 0.011 | 6.7 | 48.3
II plastic deformation zone | 2050 | 0.036 | 3.3 | 80.0
II plastic deformation zone | 2090 | 0.017 | 5.0 | 80.0
II plastic deformation zone | 2130 | 0.014 | 1.7 | 48.3
III zone of yield, pre-fracture | 2160 | 0.057 | 3.3 | 80.0
III zone of yield, pre-fracture | 2170 | 0.019 | 5.0 | 80.0
III zone of yield, pre-fracture | 2400 | 0.017 | 5.0 | 48.3
III zone of yield, pre-fracture | 2600 | 0.009 | 7.5 | 48.3
III zone of yield, pre-fracture | 3000 | 0.011 | 5.8 | 48.3

Bending
Work hardening zones | Load P, kg | Maximum amplitude Csmax | Main peak width H0.75, MHz | Main peak frequency f, MHz
I elastic zone | 17 | 0.006 | 5.0 | 90.8
I elastic zone | 20 | 0.008 | 3.3 | 85.0
I elastic zone | 27 | 0.009 | 5.0 | 85.0
II plastic deformation zone | 38 | 0.002 | 10.0 | 93.3
II plastic deformation zone | 38.5 | 0.003 | 6.7 | 85.0
II plastic deformation zone | 39.4 | 0.005 | 8.3 | 86.0
II plastic deformation zone | 40 | 0.004 | 5.0 | 85.0
III zone of yield, pre-fracture | 47 | 0.011 | 3.3 | 94.0
III zone of yield, pre-fracture | 47.5 | 0.013 | 4.2 | 94.0
6 Conclusions
1. Acoustic emission is a structure-sensitive parameter for fixing the initial changes in the properties of materials, while mechanical tests record their stable stage. 2. The difference between the pulse shapes recorded in the main zones of the loading diagram has been established. For the stage of elastic deformation, a discrete flow of AE signals of small amplitude and duration is observed. Further loading leads to an increase in the deformed volume and the release
of a large amount of acoustic energy and an increase in pulse amplitudes. It becomes possible to separate the signals formed at separate stages of metal deformation. 3. At the stage of elastic deformation, a linear and relatively slow accumulation of cumulative AE parameters is observed. At the stage of plastic deformation, the rate of energy accumulation and the amplitude increase sharply. The fixation of the precursors of the appearance of structural changes using AE measurements occurs earlier than it appears from mechanical measurements. 4. The close connection between the processes occurring in the material under loading and the presence of AE effects makes it possible to predict changes in the mechanical properties and structure of materials based on acoustic measurements.
References 1. Aleksenko, V., Sharko, A., Sharko, O., Stepanchikov, D., Yurenin, K.: Identification by ae method of structural features of deformation mechanisms at bending. Tech. Diagn. Nondestr. Test. (1), 32–39 (2019). https://doi.org/10.15407/tdnk2019.01. 01 2. Aleksenko, V., Sharko, A., Smetankin, S., Stepanchikov, D., Yurenin, K.: Detection of acoustic-emission effects during reloading of St3sp steel specimens. Tech. Diagn. Nondestr. Test. (4), 25–31 (2017). https://doi.org/10.15407/tdnk2017.04.04 3. Babichev, S., Durnyak, B., Zhydetskyy, V., Pikh, I., Senkivskyy, V.: Application of optics density-based clustering algorithm using inductive methods of complex system analysis. In: IEEE 2019 14th International Scientific and Technical Conference on Computer Sciences and Information Technologies, CSIT 2019 - Proceedings, pp. 169–172 (2019). https://doi.org/10.1109/STC-CSIT.2019.8929869 ˇ 4. Babichev, S., Skvor, J.: Technique of gene expression profiles extraction based on the complex use of clustering and classification methods. Diagnostics 10(8) (2020). Article no. 584. https://doi.org/10.3390/diagnostics10080584 5. Babichev, S., Sharko, O., Sharko, A., Milhalyov, O.: Soft filtering of acoustic emission signals based on the complex use of Huang transform and wavelet analysis. Adv. Intell. Syst. Comput. 1020, 3–19 (2020). https://doi.org/10.1007/978-3-03026474-1 1 6. Bobrov, A., Danilina, A.: Probabilistic method for choosing significant signal filtering parameters in acoustic emission diagnostics of technical objects. Russ. J. Nondestr. Test. 12, 36–43 (2014) 7. Bohmann, T., Schlamp, M., Ehrlich, I.: Acoustic emission of material damages in glass fibre-reinforced plastics. Compos. B Eng. 155, 444–451 (2018). https://doi. org/10.1016/j.compositesb.2018.09.018 8. Cho, H., Shoji, N., Ito, H.: Acoustic emission generation behavior in A7075-T651 and A6061-T6 aluminum alloys with and without cathodic hydrogen charging under cyclic loading. J. Nondestr. Eval. 37(4), 1–7 (2018). https://doi.org/10.1007/ s10921-018-0536-7 9. Ding, Z., Fu, K., Deng, W., Li, J., Zhongrong, L.: A modified artificial bee colony algorithm for structural damage identification under varying temperature based on a novel objective function. Appl. Math. Model. 88, 121–141 (2020). https:// doi.org/10.1155/2014/432654
10. Dmitriev, A., Polyakov, V., Kolubaev, E.: Digital processing of acoustic emission signals in the study of welded compounds in metal alloys. High-Perform. Comput. Syst. Technol. 4(1), 32–40 (2020) 11. Dmitriev, A., Polyakov, V., Lependin, A.: Investigation of plastic deformation of aluminum alloys using wavelet transforms of acoustic emission signals. Russ. J. Nondestr. Test. 8(1), 33–36 (2018). https://doi.org/10.22226/2410-3535-2018-133-36 12. Fomichev, P., Zarutskiy, A., Lyovin, A.: Researches of the stressed-deformed state of the power structures of the plane. Syst. Decis. Control Energy 1, 37–49 (2020). https://doi.org/10.1007/978-3-030-48583-2 3 13. Gagar, D., Foote, P., Irving, P.: Effects of loading and sample geometry on acoustic emission generation during fatigue crack growth: implications for structural health monitoring. Int. J. Fatigue 81, 117–127 (2015). https://doi.org/10.1016/j.ijfatigue. 2015.07.024 14. Gong, K., Hu, J.: Online detection and evaluation of tank bottom corrosion based on acoustic emission. In: Qu, Z., Lin, J. (eds.) Proceedings of the International Field Exploration and Development Conference 2017. Springer Series in Geomechanics and Geoengineering, vol. 216039, pp. 1284–1291. Springer, Singapore (2019). https://doi.org/10.1007/978-981-10-7560-5 118 15. Kanakambaran, S., Sarathi, R., Srinivasan, B.: Robust classification of partial discharges in transformer insulation based on acoustic emissions detected using fiber Bragg gratings. IEEE Sens. J. 18(24), 10018–10027 (2018). https://doi.org/10. 1109/JSEN.2018.2872826 16. Lependin, A.A., Polyakov, V.V.: Scaling of the acoustic emission characteristics during plastic deformation and fracture. Tech. Phys. 59(7), 1041–1045 (2014). https://doi.org/10.1134/S1063784214070184 17. Li, B., et al.: Prediction equation for maximum stress of concrete drainage pipelines subjected to various damages and complex service conditions. Constr. Build. Mater. 264(20), 120238 (2020). https://doi.org/10.1016/j.conbuildmat. 2020.120238 18. Louda, P., Sharko, A., Stepanchikov, D.: An acoustic emission method for assessing the degree of degradation of mechanical properties and residual life of metal structures under complex dynamic deformation stresses. Materials 14(9), 2090 (2021). https://doi.org/10.3390/ma14092090 19. Maiorov, A.: Digital technologies in the non-destructive inspection. Oil Gas 1, 26–37 (2010) 20. Maji, G., Mandal, S., Sen, S.: A systematic survey on influential spreaders identification in complex networks with a focus on k-shell based techniques. Expert Syst. Appl. 161, 113681 (2020). https://doi.org/10.1016/j.eswa.2020.113681 21. Marasanov, V., Sharko, A.: Energy spectrum of acoustic emission signals in complex media. J. Nano- Electron. Phys. 9(4), 04024-1–04024-5 (2017). https://doi. org/10.21272/jnep.9(4).04024 22. Marasanov, V., Sharko, A.: The energy spectrum of the acoustic emission signals of nanoscale objects. J. Nano-Electron. Phys. 9(2), 02012-1–02012-4 (2017). https:// doi.org/10.21272/jnep.9(2).02012 23. Marasanov, V., Sharko, A.: Determination of the power constants of the acoustic emission signals in the equations of the model of the complex structure motion of a continuous medium. J. Nano-Electron. Phys. 10(1), 01019(1)–01019(6) (2018). https://doi.org/10.21272/jnep.10(1).01019
24. Marasanov, V., Sharko, A., Stepanchikov, D.: Model of the operator dynamic process of acoustic emission occurrence while of materials deforming. In: Lytvynenko, V., Babichev, S., W´ ojcik, W., Vynokurova, O., Vyshemyrskaya, S., Radetskaya, S. (eds.) ISDMCI 2019. AISC, vol. 1020, pp. 48–64. Springer, Cham (2020). https:// doi.org/10.1007/978-3-030-26474-1 4 25. Marasanov, V., Sharko, O., Sharko, A.: Boundary problems of determining the energy spectrum of signals of acoustic emission in conjugated continuous media. Cybern. Syst. Anal. 55(5), 170–179 (2019) 26. Marasanov, V., Stepanchikov, D., Sharko, O., Sharko, A.: Technique of system operator determination based on acoustic emission method. Adv. Intell. Syst. Comput. 1246, 3–22 (2021). https://doi.org/10.1007/978-3-030-54215-3 1 27. Muravev, V. and Tapkov, K.: Evaluation of strain-stress state of the rails in the production. Devices Methods Meas. 8(3), 263–270 (2017). https://doi.org/10.21122/ 2220-9506-2017-8-3-263-270 28. Nedoseka, A., Nedoseka, S., Markashova, L., Kushnareva, O.: On identification of structural changes in materials at fracture by acoustic emission data. Tech. Diagn. Nondestr. Test. (4), 9–13 (2016). https://doi.org/10.15407/tdnk2016.04.02 29. Nosov, P., Ben, A., Zinchenko, S., Popovych, I., Mateichuk, V., Nosova, H.: Formal approaches to identify cadet fatigue factors by means of marine navigation simulators. In: 16th International Conference on ICT in Research, Education and Industrial Applications, vol. 2732, pp. 823–838 (2020) 30. Nosov, P., Zinchenko, S., Popovich, I., Safonov, M., Palamarchuk, I., Blah, V.: Decision support during the vessel control at the time of negative manifestation of human factor. In: Computer Modeling and Intelligent Systems: Proceedings of the Third International Workshop on Computer Modeling and Intelligent Systems, vol. 2608, pp. 12–26 (2020) 31. Nosov, V.V., Zelenskii, N.A.: Estimating the strength of welded hull elements of a submersible based on the micromechanical model of temporal dependences of acoustic-emission parameters. Russ. J. Nondestr. Test. 53(2), 89–95 (2017). https://doi.org/10.1134/S1061830917020036 32. Ovcharuk, V., Purasev, Y.: Registration and Processing of Acoustic Emission Information in Multichannel Systems. Pacific State University, Khabarovsk (2017) 33. Pasternak, M., Jasek, K., Grabka, M.: Surface acoustic waves application for gas leakage detection. Diagnostyka 21(1), 35–39 (2020). https://doi.org/10.29354/ diag/116078 34. Peng, X., Zhao, Y., Small, M.: Identification and prediction of bifurcation tipping points using complex networks based on quasi-isometric mapping. Physica A: Stat. Mech. Appl. (560), 125108 (2014). https://doi.org/10.1016/j.physa.2020.125108 35. Rajabi, A., Omidi Moaf, F., Abdelgader, H.: Evaluation of mechanical properties of two-stage concrete and conventional concrete using nondestructive tests. J. Mater. Civ. Eng. 32(7), 04020185 (2020). https://doi.org/10.1061/(ASCE)MT.1943-5533. 0003247 36. Rescalvo, F., Suarez, E., Valverde-Palacios, I., Santiago-Zaragoza, J., Gallego, A.: Health monitoring of timber beams retrofitted with carbon fiber composites via the acoustic emission technique. Compos. Struct. 206(15), 392–402 (2018). https:// doi.org/10.1016/j.compstruct.2018.08.068 37. Riabova, S.: Application of wavelet analysis to the analysis of geomagnetic field variations. J. Phys. Conf. Ser. 1141(1), 012146 (2018). https://doi.org/10.1088/ 1742-6596/1141/1/012146
38. Sharko, M., Shpak, N., Gonchar, O., Vorobyova, K., Lepokhina, O., Burenko, J.: Methodological basis of causal forecasting of the economic systems development management processes under the uncertainty. Adv. Intell. Syst. Comput. 1246, 423–436 (2021). https://doi.org/10.1007/978-3-030-54215-3 27 39. Sharko, M., Zaitseva, O., Gusarina, N.: Providing of innovative activity and economic development of enterprise in the conditions of external environment dynamic changes. Sci. Bull. Polissia 3(11(2)), 57–60 (2017) 40. Su, F., Li, T., Pan, X., Miao, M.: Acoustic emission responses of three typical metals during plastic and creep deformations. Exp. Tech. 42(6), 685–691 (2018). https://doi.org/10.1007/s40799-018-0274-x 41. Susanto, A., Liu, C., Yamada, K., Hwang, Y., Tanaka, R., Sekiya, K.: Milling process monitoring based on vibration analysis using Hilbert-Huang transform. Int. J. Autom. Technol. 12(5), 688–698 (2018). https://doi.org/10.20965/ijat.2018. p0688 42. Sychev, S., Fadin, Y., Breki, A., Gvozdev, A., Ageev, E., Provotorov, D.: Timefrequency analysis of acoustic emission signals recorded during friction using wavelet transform. Russ. J. Nondestr. Test. 7(4(25)), 49–59 (2017) 43. Trusiak, M., Styk, A., Patorski, K.: Hilbert-Huang transform based advanced Bessel fringe generation and demodulation for full-field vibration studies of specular reflection micro-objects. Opt. Lasers Eng. 110, 100–112 (2018). https://doi. org/10.1016/j.optlaseng.2018.05.021 44. Wang, K., Zhang, X., Hao, Q., Wang, Y., Shen, Y.: Application of improved leastsquare generative adversarial networks for rail crack detection by AE technique. Neurocomputing 332, 236–248 (2019). https://doi.org/10.1016/j.neucom.2018.12. 057 45. Yakovlev, A., Sosnin, V.: Digital processing of acoustic pulses in the acoustic emission diagnostics system Kaeme. Electron. J. Tech. Acoust. 4, 1–14 (2018) 46. Yuan, H., Liu, X., Liu, Y., Bian, H., Chen, W., Wang, Y.: Analysis of acoustic wave frequency spectrum characters of rock under blasting damage based on the HHT method. Adv. Civil Eng. 2018(9207476), 8 (2018). https://doi.org/10.1155/ 2018/9207476 47. Zhang, X., et al.: A new rail crack detection method using LSTM network for actual application based on AE technology. Appl. Acoust. 142, 78–86 (2018). https://doi. org/10.1016/j.apacoust.2018.08.020 48. Zinchenko, S., Tovstokoryi, O., Nosov, P., Popovych, I., Kobets, V., Abramov, G.: Mathematical support of the vessel information and risk control systems. In: Proceedings of the 1st International Workshop on Computational and Information Technologies for Risk-Informed Systems, vol. 2805, pp. 335–354 (2020)
Solution of the Problem of Optimizing Route with Using the Risk Criterion
Pavlo Mamenko1, Serhii Zinchenko1(B), Vitaliy Kobets2, Pavlo Nosov1, and Ihor Popovych2
1 Kherson State Maritime Academy, Kherson, Ukraine {srz56,pason}@ukr.net
2 Kherson State University, Kherson, Ukraine [email protected]
Abstract. The aim of the work is to determine the conditions of optimality in the task of plotting the course of the vessel and the operation of divergence of vessels in conditions of intensive navigation. The need for such work is dictated, firstly, by an increase in the intensity of shipping and, secondly, by the emergence of autonomous ships and transport systems, the traffic control algorithms of which obviously require an optimal approach. The criterion of optimality in problems of this class is the expected risk, one of the components of which is the risk of collision of ships. Based on the analysis of methods for constructing ship divergence algorithms, the task is set of finding a control algorithm that delivers the best result for all participants in the operation. This formulation of the task greatly facilitates the forecast of the actions of all participants in the divergence and is especially expedient in the case of participation in the operation of an autonomous system or a ship with which no contact has been established. Theoretically, the task belongs to the most difficult class of control problems, optimal control of a distributed dynamic system with a vector goal functional [3,5,8,13-15]. The ability to obtain a general solution to the task of optimal ship control makes this study expedient.

Keywords: Vector-functional · Optimal control · Risk criterion · Safe separation · Mathematical model · Avoid collision

1 Introduction
Research into the problem of effective methods for preventing collisions of ships has become paramount and important in connection with the increase in tonnage, overall dimensions, speed and number of ships involved in the carriage of goods by sea. Particularly important is the factor of the emergence of autonomous ships and systems, the actions of which have a clear algorithm and a specific goal. Thus, it becomes possible to reduce the uncertainty in the task of forecasting the actions of ships, which expands the possible range of
actions of own ship. Taking into account the achievements in improving the safety of navigation, the use of radars, and then the development of the ARPA (Automatic Radar Plotting Aids) collision avoidance system, which allows at least 20 encountered objects to be tracked automatically, the parameters of their movement (speed V_n, course \Psi_n) and the approach elements (D^n_{min} = DCPA_n, the distance to the closest point of approach, Distance of the Closest Point of Approach; T^n_{min} = TCPA_n, the time to the closest point of approach, Time to the Closest Point of Approach) to be determined, as well as an assessment of the collision risk r_n to be obtained, the task is to determine the general optimality criteria and methods for solving the task. The works [18,19,21-24] considered methods of increasing the accuracy and reliability of vessel control in automated and automatic control systems, including collision avoidance. A more efficient method is to determine a safe trajectory of the vessel, taking into account the trajectories of all vessels involved in the operation. However, there is considerable uncertainty associated with the actions of the vessels in the divergence process. Uncertainty reduction can be achieved if the actions of own ship are consistent with those of other ships. This task is simple but requires the use of optimal control methods under the condition of optimization of the vector functional. This is already a difficult task, the methods of solving which are poorly studied, but its solution, in this case, allows optimal algorithms to be built for plotting a course when the ships diverge. The purpose of this article is to substantiate, develop and simulate an algorithm for safe separation of ships using the criterion of expected risk.
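As a brief illustration of the ARPA approach elements mentioned above, the sketch below computes DCPA and TCPA from relative position and velocity. The helper name and the numeric example are assumptions introduced here for illustration only.

```python
# Minimal sketch of DCPA/TCPA computation from relative kinematics (illustrative only).
import numpy as np

def dcpa_tcpa(own_pos, own_vel, tgt_pos, tgt_vel):
    r = np.asarray(tgt_pos, float) - np.asarray(own_pos, float)   # relative position
    v = np.asarray(tgt_vel, float) - np.asarray(own_vel, float)   # relative velocity
    if np.allclose(v, 0.0):
        return np.linalg.norm(r), np.inf                          # no relative motion
    tcpa = max(-np.dot(r, v) / np.dot(v, v), 0.0)                 # time to the closest point
    dcpa = np.linalg.norm(r + v * tcpa)                           # distance at that moment
    return dcpa, tcpa

# Example (positions in nm, speeds in knots): target 2 nm east, closing west at 8 kn,
# own ship heading north at 10 kn.
dcpa, tcpa = dcpa_tcpa((0, 0), (0, 10), (2, 0), (-8, 0))
```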
2 Problem Statements
The task of optimal plotting of the course first of all requires the determination of the criterion of optimality, or goal function. It becomes necessary to plot the trajectory of the vessel S(x) in such a way as to avoid possible collisions, loss of cargo and other complications. This need is formulated as minimization of the risk C on the trajectory of the vessel. Obstacles to navigation are expressed by constraints in the form of equalities \varphi_i(x) = 0, i = 1..m, and inequalities \varphi_i(x) < 0, i = m..n, that is, we obtain the Lagrange task [2,9,20]

x^* \to \min C(S(x)), \quad \varphi_i(x) = 0, \; i = 1..m, \quad \varphi_i(x) < 0, \; i = m..n. \qquad (1)
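A hedged numeric illustration of problem (1) is given below: a toy risk surrogate is minimized under one equality and one inequality constraint. The risk and constraint functions are invented for illustration; only the problem structure follows Eq. (1). Note that SciPy's "ineq" constraints require fun(x) >= 0, so a condition \varphi(x) < 0 is passed as -\varphi(x) >= 0.

```python
# Toy instance of the Lagrange task (1) solved numerically with SciPy (illustrative only).
import numpy as np
from scipy.optimize import minimize

def risk(x):                      # assumed risk surrogate C(S(x)) of a waypoint
    return (x[0] - 3.0) ** 2 + (x[1] - 1.0) ** 2

eq_cons   = {"type": "eq",   "fun": lambda x: x[0] + x[1] - 4.0}  # phi(x) = 0
ineq_cons = {"type": "ineq", "fun": lambda x: 2.0 - x[1]}         # phi(x) = x1 - 2 < 0

x_star = minimize(risk, x0=np.zeros(2), method="SLSQP",
                  constraints=[eq_cons, ineq_cons]).x
```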
3 Materials and Methods
The well-known technique for solving this problem involves forming the Lagrange function L(x, \lambda), whose gradient vanishes at x^*, \partial L / \partial x = 0:
L(x, \lambda) = \lambda_0 S(x) - \lambda_1 \varphi_1(x) - \lambda_2 \varphi_2(x), \qquad \mathrm{grad}\, L = 0 \;\to\; \frac{\partial L}{\partial \lambda_1} = 0, \quad \lambda_2 \varphi_2(x) = 0. \qquad (2)
(7)
Solution of the Problem of Optimizing Route
255
Solution of the problem - the matrix of Lagrange multipliers must be unit. Indeed, Eq. (7) has a solution only for the unit matrix Λ ⎛ ⎞ 1 ... 0 ds ds ds d(s − a) = → = ⎝· · ·⎠ . (8) a = const → dx dx dx dx 0 ... 1 Taking into account the meaning of Lagrange multipliers (3), we can write down the optimality condition in problem (4) ∂Si = 0; ∂Sj
i = 1, n;
j = 1, n;
i = j;
i = j →
∂Si = 1. ∂Si
(9)
From condition (9) it follows that the optimal solution should not worsen any of the solutions, that is, the components of the goal vector are independent and their states do not affect each other. This result is known as the Pareto criterion or as the effective Jeffrion solution [7]. If the components of the target vector depend on several variables, the problem becomes more cumbersome, since in this case the derivative of the component of the target vector with respect to the state vector turns into a matrix and, as a consequence, the matrix of Lagrange multipliers becomes cellular. Thus, in the problem with the dimension of the target vector equal to the dimension of the state space, we have ⎫ ⎫ S1 (x) − a1 = 0 ⎪ x∗ → minS1 (x) ⎪ ⎪ ⎪ ⎪ ⎪ x∗ → minS2 (x) ⎬ S2 (x) − a2 = 0 ⎬ ; . (10) .. .. ⎪ ⎪ . . ⎪ ⎪ ⎪ ⎪ ⎭ ⎭ x∗ → minSn (x) Sn (x) − an = 0 Optimality condition (3) in the form (10) takes a more complex form with the cellular matrix of factors ⎛⎛ 11 ⎞ ⎛ 1n ⎞⎞ λ11 · λ11 λ11 · λ1n 1n 1n ⎜⎝ · · · ⎠ . . . ⎝ · · · ⎠ ⎟ ⎜ 11 ⎟ ⎜ λn1 · λ11 ⎟ · λ1n λ1n nn nn n1 ⎜ ⎟ ds ⎟. · · · (11) =⎜ ⎜ ⎟ ⎛ ⎞ ⎛ ⎞ dx ⎜ n1 n1 nn nn ⎟ λ · λ · λ λ 1n 11 1n ⎟ ⎜ 11 ⎝⎝ · · · ⎠ . . . ⎝ · · · ⎠⎠ n1 nn λn1 λnn n1 · λnn n1 · λnn Thus, the Eq. (7) has a solution only for the unit matrix Λ ⎞ ⎛ ⎞⎞ ⎛⎛ 1·0 1·0 ⎜⎝ · · · ⎠ . . . ⎝ · · · ⎠⎟ ⎟ ⎜ ⎜ 0·1 0·1 ⎟ ⎟ d(s − a) ds ds ⎜ ⎜ · ⎞ · ⎛ · ⎞⎟ a = const → = → ⎟. ⎜ ⎛ dx dx dx ⎜ ⎟ 1 · 0 1 · 0 ⎟ ⎜ ⎝⎝ · · · ⎠ . . . ⎝ · · · ⎠⎠ 0·1 0·1
(12)
256
P. Mamenko et al.
With the inequality of the dimensions of the target vector and the state vector, we are dealing with non-square matrices, but at the same time the optimality requirements remain - mutual insensitivity at the optimum point between the components of the target vector. The problem of constructing an optimal trajectory implies optimality along the entire trajectory, therefore, the component of the goal vector is not a function, but a functional, the integrand of which, the risk vector C, depends on the state vector x and the control vector u ⎤ ⎡ t1 ⎢ C1 (x, u1 )dt ⎥ ⎥ ⎢ t0 ⎥ ⎢ t1 ⎢ C (x, u )dt ⎥ ⎥ ⎢ 2 2 ⎥. (13) J(x,u) = c ⎢ ⎥ ⎢ t0 .. ⎥ ⎢ ⎥ ⎢ . ⎥ ⎢ t1 ⎦ ⎣ Cm (x, um1 )dt t0
Taking into account the specifics of plotting the course, we assume that the target functional of own ship is the first component of the vector of the target functional, and for the remaining components, we accept the hypothesis of their trajectories and controls. An important feature of the problem is that all components of the target functional vector are convex, have extrema, and can be optimized. On the other hand, all ships have the same type of linear ship dynamics model, the only difference is in the object matrices Ai and control matrices Bi. In general, these constraints have the form dx/dt − ϕ(x, u) = 0. Thus, we have the task of controlling a ship with a vector target function, Fig. 1. Since we have a very complex problem, we first consider the solution of a one-dimensional optimal control problem with a convex functional and a fully controllable dynamical system under constraints. For a convex goal functional F (x, x, ˙ t) and constraints in the form of a dynamical system described in the Cauchy form, we have the task ∗
∗ ∗
t1
(x , u ,t ) → extr
F (x, u,t)dt;
x˙ − f (x, u,t) = 0;
t0
x(t0 ) = x . x(t1 ) = x1
(14)
The resulting problem is a Lagrange problem with equality-type constraints, and we seek its solution using the Lagrange functional t1 t1 T T ˜ = [f0 λ0 + λ f − λ x]dt L ˙ = L(x, x, ˙ u, t)dt t0
(15)
t0
Since the integrand of the goal function is convex and the constraints are controllable, the Kuhn - Tucker conditions are satisfied for the integrand of the Lagrange functional
Solution of the Problem of Optimizing Route
257
Fig. 1. Control structure with a vector goal
˙ u, λ) ≤ L(x, x, ˙ u, λ) ≤ L(x, x, ˙ u, λ∗ ) L(x∗ , x,
(16)
We separate the Hamilton function in the Lagrange function L(x, x, ˙ u, λ) = H(x, u, λ) − λT∗ x˙
(17)
We express the controls in terms of the Lagrange multiplier u = u(λ) and, considering the control as an independent variable, we obtain H(x∗ , u) − λ∗T x˙ ≤ H(x∗ , u∗ ) − λ∗T x˙ ≤ H(x, u∗ ) − λ∗T x˙
(18)
Eliminating similar ones, we obtain the Kuhnna - Tucker condition for the Hamilton function in the problem with a convex goal functional and controlled constraints (19) H(x∗ , u)|λ∗ ≤ H(x∗ , u∗ )|λ∗ ≤ H(x, u∗ )|λ∗ This inequality breaks down into two conditions x∗|λ∗ → minH(x, u∗ ) . u∗|λ∗ → minH(x∗ , u)
(20)
The first inequality of the system gives rise to Bellman’s principle, and the second inequality gives rise to the Pontryagin maximum principle.
258
P. Mamenko et al.
In the problem under consideration, we take into account the constraints associated with the dynamics of the ship ⎤ ⎡ t1 ⎢ C1 (x, u1 )dt ⎥ ⎥ ⎢ t0 ⎥ ⎢ t1 ⎢ C (x, u )dt ⎥ ⎥ ⎢ 2 2 ⎥ = C(x, u)dt; (21) J(x, u) = ⎢ ⎥ ⎢ t0 .. ⎥ ⎢ t ⎥ ⎢ . ⎥ ⎢t ⎦ ⎣ 1 Cm (x, um1 )dt t0
⎡
⎤ x˙ − A1 x − B1 u1 ⎢ x˙ − A2 x − B2 u2 ⎥ ⎢ ⎥ ϕ(X, u) = ⎢ ⎥ = 0; .. ⎣ ⎦ . x˙ − An x − Bn un Then, after supplementing the constraints with the optimality condition (5), the Hamilton vector function in this problem has the form, we emphasize in the notation vectors − − → →→ − − →→ − − → → → → H(→ x,− u ) = λ0 C(− x,→ u ) − λ(− ϕ (− x,− u ) − x) ˙ − λ C(− x,→ u ).
(22)
Therefore, if λ0 can be considered equal to one, then Λ is a cellular matrix similar to the matrix in equation (11). Let us assume that the problem is stationary, which makes it possible to use the maximum principle. Consequently, the optimality conditions take the form ⎧ ∂H ⎨ ∂x = + dλ dt ∗ (23) u|x∗∗ → maxH(x∗ , u). λ ⎩ ∂H = 0 ∂u Since we are looking for the strong optimum of the stationary problem according to the maximum principle, the Hamilton function is constant on the optimal trajectory. Then, if the Lagrange multiplier matrix is unit Λ = I, the optimality condition (23) is satisfied. Thus, the general divergence problem, provided that all components of the vector of integrands of the goal functional are convex, has a simple solution ⎞ ⎛ ⎞⎞ ⎞ ⎛ 1n ⎞ ⎞ ⎛⎛ ⎛⎛ 11 1·0 1·0 λ11 · λ11 λ11 · λ1n 1n 1n ⎜⎝ · · · ⎠ . . . ⎝ · · · ⎠ ⎟ ⎜⎝ · · · ⎠ . . . ⎝ · · · ⎠⎟ ⎟ ⎟ ⎜ ⎜ 11 1n ⎟ ⎜ 0·1 ⎜ λn1 · λ11 0·1 ⎟ λ1n nn n1 · λnn ⎟ ⎟ ⎜ ⎜ ⎟ ⎟ ⎜ λ=⎜ ⎜⎛ n1 · n1 ⎞ · ⎛ nn · nn ⎞⎟ = ⎜⎛ · ⎞ · ⎛ · ⎞⎟ = I. (24) ⎟ ⎜ ⎟ ⎜ λ11 · λ1n 1 · 0 1 · 0 · λ λ 11 1n ⎟ ⎟ ⎜ ⎜ ⎝⎝ · · · ⎠ . . . ⎝ · · · ⎠⎠ ⎝⎝ · · · ⎠ . . . ⎝ · · · ⎠⎠ n1 nn 0·1 0·1 λn1 λnn n1 · λnn n1 · λnn
Solution of the Problem of Optimizing Route
259
A simpler formulation of optimality consists in the absence of mutual sensitivity with respect to objective functions ∂Ci = 0; ∂Cj
i = 1, n;
j = 1, n;
i = j;
∂Ci = 1; ∂Ci
i = 1, n.
(25)
Consequently, despite the complexity, the general divergence problem has a global optimal solution determined by condition (25). This condition means that when plotting a course, the divergence distance between the “i-th” and “j-th” vessel must always ensure that the specified risk of both vessels is maintained. The target hazard ellipse of one vessel shall not intersect the target hazard ellipse of another vessel. Thus, the laying is carried out according to the criterion of minimum costs, provided that it is optimal (25). This formulation of the question is beneficial from the point of view of forecasting the actions of the courts. Since in this case each of the participants in the operation reaches the optimum, assuming the convexity of the integrands, it is practically assumed that the actions of the navigators are reasonable. For autonomous ships, where there is no “human” factor, one can always count on the “reasonableness” of the actions of the automatic control system (ACS).
4 Experiment, Results and Discussion
Modeling was carried out in MATLAB environment and on the Navi Trainer 5000 navigation simulator [1,4]. Knowing the conditions for the optimal solution of the discrepancy problem, we can consider the algorithm for plotting a course for the automatic system. Since the ACS system is a link of the artificial intelligence of an autonomous vessel and eliminates the risks of the “human” factor, the machine moves the vessel into the risk field. Figure 2 shows a diagram of the construction of the risk field.
Fig. 2. Scheme for constructing the risk field
Critical risk (position 1, Fig. 2) defines the unacceptable positions, and the acceptable risk (position 2, Fig. 2), defines the areas with acceptable but undesirable positions and the field of specified risks (position 3, Fig. 2) defines the area
of the trace, position 4 The route itself is carried out for reasons of minimum costs when passing the route. In reality, there are standard solutions (algorithms) for laying a route for a route. However, the automatic system must have a criterion that determines the freedom of decision, otherwise the discrepancy problem requires the participation of a person who evaluates the risk of the decision made. Thus, the risk field ensures the “reasonableness” of the ACS action. The route has been set, the schedules have been agreed upon, but a risk field is needed to make a decision. The task of building a risk field at the modern level of Internet technologies is not difficult. A collection of electronic navigation charts (ENC) as well as hydrometeorological information of navigation areas is loaded into the network, you just need to highlight the risk level lines. Here is the question about the companies that sell electronic cards (TRANSAS, C-MAP, NOAA). The second step in solving the route optimization problem is taking into account the environment at the current time. This is done by analyzing the schedules of the movement of ships, radar and optical fields, determining the coordinates of oncoming ships and their maneuvers. This operation can also be obtained by exchanging information with other vessels containing, in addition to standard data, the maximum risk and variance along the axes of risk distribution. To this information, you can add the ship’s coordinates, speed, heading and maneuver. This simple message facilitates the task of diverging ships. While this is not the case, we will consider the radar assessment of the situation to be consistent. Now you can perform the next step - forecasting the development of the operation. At this stage, the trajectory of own ship is adjusted, taking into account the possible risks of entry into the risk area of other ships, Fig. 3.
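To make the construction of the risk field more tangible, the sketch below builds a simple grid field as a superposition of elliptical Gaussian risk terms and thresholds it at a given risk level. The shapes, variances and levels are illustrative assumptions and do not reproduce the field used in the authors' MATLAB/Navi Trainer experiments.

```python
# Hedged sketch of a risk-field grid in the spirit of Fig. 2 (illustrative parameters).
import numpy as np

def risk_field(xx, yy, sources):
    field = np.zeros_like(xx)
    for (cx, cy, sx, sy, peak) in sources:        # centre, axis spreads, peak risk
        field += peak * np.exp(-((xx - cx) ** 2 / (2 * sx ** 2)
                                 + (yy - cy) ** 2 / (2 * sy ** 2)))
    return field

x = np.linspace(0.0, 10.0, 201)
y = np.linspace(0.0, 10.0, 201)
xx, yy = np.meshgrid(x, y)
R = risk_field(xx, yy, sources=[(3, 4, 0.8, 0.4, 1.0), (7, 6, 1.2, 0.6, 0.8)])

R_given = 0.2                                      # assumed "given risk" level
admissible = R <= R_given                          # corridor available for plotting the course
```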
Fig. 3. Correction of your own course based on the results of the forecast of the development of the situation
For the selected trajectory 2, from the intersecting trajectories of other vessels 4, critical trajectories 1 are determined. When a critical trajectory is detected,
a divergence maneuver 5 can be planned within the field of permissible risks 6. At the same time, the intersections of the trajectories are not critical if at the moment of crossing the distance between the vessels does not violate the zones of the given risk. The course correction is influenced only by critical trajectories, this is the standard plotting of the course [10–12,16,17] with a restriction on exiting the risk corridor 6. This operation of the course correction is current, which guarantees control over the situation. However, in spite of the optimality of the performed corrections, situations are possible when it is necessary to quickly carry out the operations of diverging vessels. In this case, either the problem of sliding along the line of equal risk of the target vessel is solved, or, in the case of movement between the vessels, a discrepancy in the minimum of the risk gradient is performed. In both cases, the course is plotted taking into account the preservation of the area of the given risk. Situations are possible when the risk increases. For example, when mooring or bunkering a vessel on the move, the risk increases, which is inevitable in the essence of the operation, and here the speed regime changes, in contrast to the divergence in the open sea, where a change in the vessel’s speed is undesirable. Figure 4 shows the optimal control algorithm in the general navigation problem. This algorithm consists of: – block 1 for setting goals, in which the points and times of the beginning and end of the movement are determined; – Block 2 of the formation of the risk field, which uses the risk base and data from satellite navigation and electronic cartography; – block 3 plotting the course according to the criterion of minimum costs. This operation does not require operator participation, as there are clear criteria and limitations; – block 4 search for critical trajectories; – block 5 of elimination of critical trajectories; – block 6 for comparison of risks. If risks are found that are higher than acceptable, go to block 7; – block 7 of correction of trajectories and graphs of their movement. If the risks are less than permissible, go to block 8 for the analysis of the situation and then to block 9 for checking the criticality; – block 8 of the analysis of the situation, which uses the information of the radar and other radio navigation equipment; – block 9 for checking the criticality based on the data of block 8 and visual data; – block 10 for comparison of risks. If the risk does not exceed the permissible, go to block 13 for performing the maneuver. Otherwise, go to block 11 for analyzing the situation. If the number of vessels with a critical trajectory is one, then go to block 12 to diverge by sliding along the lines of a given risk. If the number of vessels is more than one, then go to block 14 of the gradient divergence. The second part of the algorithm (blocks 8–15) is executed continuously until the end point of the route. The principal difference
Fig. 4. Algorithm of optimal control in the general problem of navigation
The principal difference of the considered algorithm is its optimality in terms of the risk criterion and its compatibility with use on autonomous ships. Figure 5 shows the results of mathematical modeling of the processes of divergence of ships. Figure 5a shows the divergence trajectory 5 with one vessel, built for the case of no intersection of the zones of the given risk 3, 6. In this case, the sliding trajectory 2 is repeated with an offset to the minor axis of the own-risk ellipse. Figure 5b shows the results of mathematical modeling of divergence processes with several vessels.
Fig. 5. Results of mathematical modeling of ship divergence processes
In this situation, the intersection of the lines of the given risks 2 and 4 is possible. To ensure the optimal divergence in this case, the movement is organized along the minimum of the gradient. The issues of solving the route optimization problem using the risk criterion have thus been considered: the problem of optimal control of a system of dynamic objects with a vector functional was posed and solved; the optimality of control is achieved due to the optimization of the vector functional on the entire trajectory of motion; the optimal control problem with a vector functional, when the hypothesis about the convexity of the integrands of the vector functional components is fulfilled, is solved using the calculus of variations; as a result of solving the optimal control problem, a simple algorithm was obtained for constructing the optimal trajectories of the vessel during the collision avoidance maneuver; an algorithm for constructing the optimal trajectory of the vessel's movement using risk fields has been obtained; and mathematical modeling of collision avoidance processes with one or several vessels was carried out using the risk criterion.
5 Conclusions
The paper considers and resolves the issues of constructing an optimal trajectory using the risk criterion. The scientific novelty of the obtained results consists in the fact that for the first time a method, algorithms and software for an automatic control system were developed that solve the problems of optimal routing and optimal divergence with one or more targets using the risk criterion. This is achieved due to the constant, with the clock cycle of the on-board controller, measuring of the parameters of the vessel's movement, the use of these parameters for solving the problem of minimizing the vector of risk functionals, the construction of a field of risk levels, and the formation of control for divergence along a given risk level line. The practical value of the obtained results is that the developed method and algorithms are implemented in software and investigated by solving the problem in a fully automatic mode in a closed circuit with the control objects in a MATLAB environment for various types of vessels, targets, navigation areas
and weather conditions. The experiment confirmed the operability and efficiency of the method, algorithms and software, which makes it possible to recommend them for the development of the mathematical support of automatic vessel movement control systems for solving the problems of optimal routing and optimal divergence using the risk criterion.
References 1. Navi-trainer 5000 (version 5.35): Instructor Manual. Transas MIP Ltd. (2014) 2. Avakov, E.R., Magaril, G.G., Tikhomirov, V.M.: Lagrange’s principle in extremum problems with constraints. Russ. Acad. Sci. 68(3) (2013) 3. Baba, N., Jain, L.: Computational Intelligence in Games. Physica-Verlag, New York (2001) 4. Chaturvedi, D.: Modeling and Simulation of Systems Using MATLAB and Simulink. Tailor and Francis Group, LLC, Abingdon (2011) 5. Engwerda, J.: LQ Dynamic Optimization and Differential Games, pp. 359–426. Wiley, West Sussex (2005) 6. Ghosh, D., Singh, A., Shukla, K., Manchanda, K.: Extended Karush-Kuhn-Tucker condition for constrained interval optimization problems and its application in support vector machines. https://doi.org/10.1016/j.ins.2019.07.017 7. Harold, P., Benson: Multi-objective optimization: Pareto optimal solutions. https://doi.org/10.1007/0-306-48332-7-315 8. LaValle, S.: Planning Algorithms, pp. 357–586. JCambridge University Press, New York (2006) 9. Liang, S., Zeng, X., Hong, Y.: Distributed nonsmooth optimization with coupled inequality constraints via modified lagrangian function. IEEE Trans. Autom. Control 63(6) (2018). https://doi.org/10.1109/TAC.2017.2752001 10. Lisowski, J.: A Ship as a Object for an Automatic Control. Wydawnictwo Morskie, Gdansk (1981) 11. Lisowski, J.: Ship’s Anticollision Systems. Wydawnictwo Morskie, Gdansk (1981) 12. Lisowski, J.: The analysis of differential game models of safe ship’s control process. J. Shanghai Maritime Inst. 1, 25–38 (1985) 13. Lisowski, J.: Game control methods in navigator decision support system. J. Arch. Transp. 17, 133–147 (2005) 14. Lisowski, J.: The dynamic game theory methods applied to ship control with minimum risk of collision. In: Risk Analysis VI, vol. 17, pp. 293–302. Computational Mechanics Publications, Southampton (2006) 15. Lisowski, J.: Application of dynamic game and neural network in safe ship control. Pol. J. Environ. Stud. 16(48), 114–120 (2007) 16. Lisowski, J., Pham, N.: Properties of fuzzy-probability sets in safe navigation. In: CAMS 1992, Workshop IFAC, Genova, pp. 209–219 (1992) 17. Morawski, L.: Methods of synthesis of systems of steering the ship’s movement along a predetermined trajectory. Sci. J. Gdynia Maritime Acad. (1994) 18. Nosov, P., Cherniavskyi, V., Zinchenko, S., Popovych, I., Nahrybelnyi, Y., Nosova, H.: Identification of marine emergency response of electronic navigation operator. Radio Electron. Comput. Sci. Control (1), 208–223 (2021). https://doi.org/10. 15588/1607-3274-2021-1-20
19. Nosov, P., Popovych, I., Cherniavskyi, V., Zinchenko, S., Prokopchuk, Y., Makarchuk, D.: Automated identification of an operator anticipation on marine transport. Radio Electron. Comput. Sci. Control (3), 158–172 (2020). https://doi. org/10.15588/1607-3274-2020-3-15 20. Pontryagin, L., Boltayanskii, V., Gamkrelidze, R., Mishchenko, E.F.: The Mathematical Theory of Optimal Processes. Wiley, Hoboken (1962) 21. Zinchenko, S., Ben, A., Nosov, P., Popovych, I., Mamenko, P., Mateychuk, V.: Improving the accuracy and reliability of automatic vessel motion control systems. Radio Electron. Comput. Sci. Control 2, 183–195 (2020). https://doi.org/ 10.15588/1607-3274-2020-2-19 22. Zinchenko, S., Ben, A., Nosov, P., Popovych, I., Mateichuk, V., Grosheva, O.: The vessel movement optimisation with excessive control. Bull. Univ. Karaganda 3(99), 86–96 (2020). https://doi.org/10.31489/2020Ph3/86-96 23. Zinchenko, S., et al.: Use of simulator equipment for the development and testing of vessel control systems. Electr. Control. Commun. Eng. 16(2), 58–64 (2020). https://doi.org/10.2478/ecce-2020-0009 24. Zinchenko, S., Nosov, P., Mateichuk, V., Mamenko, P., Grosheva, O.: Automatic collision avoidance with multiple targets, including maneuvering ones. Radio Electron. Comput. Sci. Control (4), 211–221 (2019). https://doi.org/10.15588/16073274-2019-4-20
Automatic Optimal Control of a Vessel with Redundant Structure of Executive Devices Serhii Zinchenko1(B) , Oleh Tovstokoryi1 , Andrii Ben1 , Pavlo Nosov1 , Ihor Popovych2 , and Yaroslav Nahrybelnyi1 1
Kherson State Maritime Academy, Kherson, Ukraine {srz56,pason,yar1507}@ukr.net, a [email protected] 2 Kherson State University, Kherson, Ukraine
Abstract. The article considers the issues of automatic control of the vessel movement with a redundant control structure. Redundant structures are now widely used on all vessels with a dynamic positioning system to improve control efficiency (accuracy, maneuverability, reduced energy consumption and emissions), reliability and environmental safety. A brief review of the literature on the use of redundant structures to improve control efficiency is made. In open sources, the authors have not found solutions that improve the efficiency of control by using redundant structures of actuators. Therefore, it was concluded that the development of such systems is relevant. Several schemes for splitting control into the executive devices of a redundant structure, including an optimal splitting scheme, are considered. A comparative analysis of the considered splitting schemes with the optimal one is carried out. The comparative analysis showed that the use of optimal control of the redundant structure of actuators allows increasing the accuracy of dynamic positioning by (20–40)%, depending on the direction of the created control, as well as reducing fuel consumption by (30–100)%, which determines its advantages over known solutions. The mathematical and software support for an automatic optimal control system with redundant control has been developed. The operability and efficiency of the mathematical and software support were tested in a closed circuit with a control object in the MATLAB environment. The conducted experiments confirmed the operability and efficiency of the developed method, algorithms and software and make it possible to recommend them for practical use in the development of vessel control systems with redundant control structures.

Keywords: Redundant control structures · Optimal control scheme · Splitting · Control quality criterion · Mathematical models

1 Introduction
Currently a large number of vessels such as Platform supply vessel (PSV)/ Offshore Support Vessel (OSV), Diving Support (DSV's) and ROV Support
Vessels, Drill Ships, Cable Lay and Repair Vessels, Pipe Laying Ships, Dredgers, Crane Barges or Crane Vessels, Rock Dumping Vessels, Passenger Vessels, Specialist Semi-submersible Heavy-Lift Vessels, Mobile Offshore Drilling Units/Ships (MODUs), Shuttle Tankers, Naval Vessels and Operations [1], operate under risk conditions; therefore, there are increased requirements for their reliability, accuracy and maneuverability. To meet these requirements, the control systems of such vessels, called dynamic positioning systems (DP-systems), are equipped with high-precision measuring devices that allow determining with high accuracy the absolute position of the vessel (DGPS systems) or the position relative to another object (reference systems), redundant control structures that ensure reliability in control, and an on-board computer complex and software for the automation of control processes [3–7,11,20,21]. These vessels have the greatest degree of automation of control processes in order to minimize the influence of the human factor, which is the weakest link in the vessel control system [18,19,23,25,26]. Manual control of the vessel is extremely suboptimal; it can lead to unacceptable deviations of the controlled parameters, increased fuel consumption, increased loads on the hull and even destruction of the hull in a storm. The works [8,9,17] are devoted to the measurement of loads. The issues of improving control efficiency through the use of automated systems have also been considered by the authors earlier: in article [29] the issues of increasing reliability due to automatic detection and parrying of failures were considered, in article [33] the issues of automatic divergence with many targets, including maneuvering ones, were considered, and in article [32] the issues of increasing the control accuracy due to the use of the meter mathematical model in the on-board controller were considered. Control redundancy is typically used to improve reliability. At the same time, redundancy in control can also be used to increase the efficiency of the control system [30]. To ensure three-dimensional controllability simultaneously in the channels of longitudinal, lateral and rotational movements, the minimum required number of independent controls is U = 3. At the same time, on many transport vessels the number of independent controls is U = 2 (the angle of the telegraph and the angle of the stern rudder). On such vessels, the single stern rudder is used for the sequential working out of lateral and angular deviations (first, the lateral deviation is worked out by changing the course, then the course itself is worked out). In the presence of external influences, such vessels move along the trajectory with a drift angle, which leads to additional fuel consumption. The use of schemes with sufficient control, U = 3, already makes it possible to increase the reliability (due to the use of an additional rudder) and the quality of control (due to the possibility of keeping the vessel on the route with a zero drift angle, reducing the hydrodynamic drag, saving fuel, reducing emissions and preserving the environment). The object of the research is the processes of automatic optimal control of a vessel with a redundant structure of executive devices. The subject of the research is the method, algorithms and software of the automatic optimal control system of a vessel with a redundant structure of executive devices.
The purpose of the research is to improve the efficiency of automatic vessel control with a redundant structure of executive devices.
2 Problem Statement
Figure 1 shows a control scheme of the considered redundant structure.
Fig. 1. Control scheme of the considered redundant structure
It is required to find such control parameters P1, α1, P2, α2, P3 that ensure the optimization of the control quality function (1) in the presence of the control constraints (2) and (3):

Q(P1, P2, P3) ⇒ opt,  (1)

U = fu(P1, α1, P2, α2, P3),  (2)

|P1| ≤ P1max, |P2| ≤ P2max, |P3| ≤ P3max, |α1| ≤ π, |α2| ≤ π,  (3)

where Q(•) is the control quality function, U = (Px, Py, Mz) is the vector of required control forces and moments in the control channels, fu(•) is the mathematical model of the control structure, P1, P2 are the thrust forces of the screws of the first and second ACD, respectively, α1, α2 are the rotation angles of the first and second ACD, respectively, and P3 is the bow thruster force.
3 Literature Review
The article [27] discusses the issues of restoring the operability of the tracking system, in the event of random undefined failures of the executive mechanisms, using backup drives. Random undefined failures of actuators, the time of failure,
the nature and values of which may not be known, pose serious problems in the design of feedback control, since such failures can introduce large structural and parametric uncertainties. An overview of tracking systems with adaptive compensation of failures is given, which allow efficient use of redundancy. Methods for solving these problems are proposed, based on the use of direct or indirect approaches of adaptive control for direct adaptive compensation of drives failures without explicit detection of failure, for fast and effective restoration of the system’s performance. In [12], the issues of parrying failures in an active radial magnetic bearing tightly connected to a redundant support structure are considered. A strategy of fault tolerance control by reconfiguring the magnetic flux in order to keep the bearing force constant is proposed. Using the FSS (Fault State Series) to describe the fault condition of the drives, a current sharing index rule has been developed to address various fault conditions. A fault-tolerant control model has been created to test the generation of electromagnetic force in rigidly coupled standby support structures after failure of the actuators. The simulation results showed that the fault-tolerant control strategy makes it possible to stabilize the rotor rotation in the event of failure of some of the actuators. The article [14] discusses the issues of creating a fault-tolerant steering system to improve the reliability of unmanned underwater vehicles. To implement faulttolerant control, redundancy control strategies and algorithms are used. The analysis carried out by the authors showed that the reliability of the control system, in which the strategy and redundancy control algorithms are used, is significantly better than the traditional configuration. The article [16] explores methods for control redundancy of electro-hydraulic drives based on fuzzy aggregation, Mamdani fuzzy logic rules and the theory of fuzzy neural networks. Fault identification and isolation as well as system recovery are performed by combining fuzzy clustering with Mamdani fuzzy control, fuzzy neural network and redundancy control. The methods proposed in the article allow solving the problem of erroneous judgments, as well as avoiding undefined states in the system. The article [22] explores a new approach to the distribution of the forces of an autonomous underwater vehicle (AUV) engine. Typically, the number of actuators in the AUV is more than the minimum required to achieve the required movement. The possibilities of using an excessive number of actuators to parry faults during operation are studied. The scheme for the resolution of redundancy is presented, which allows the formation of the necessary support forces to create the desired movement. These support forces are used by the on-board AUV controller to create the required motion. The results of computer simulation confirm the efficiency of the proposed scheme. The article [15] considers the issues of unloading the excess structure of the spacecraft flywheels in the Earth’s magnetic field. A key feature of the work is the use of arbitrary parameters in the general solution of an indefinite system of linear algebraic equations as additional control parameters. For the minimally redundant flywheel structure and magnetic moments of the unloading system,
control algorithms are synthesized that provide asymptotic stability of the solution of model equations describing the motion of the flywheel. The performance of the proposed algorithms and the features of the process of unloading the flywheels are investigated by the example of the controlled motion of a spacecraft while maintaining a three-axis orbital orientation. In [13], the issues of planning the movement of the welding torch are considered. The angular redundancy that exists during the welding process is taken into account to plan and optimize the welding torch path by minimizing the angular cost of the torch. Some strategies for improving the efficiency of the proposed method are also considered, such as the heuristic sampling strategy, which is used to control scheduling, the collision checking strategy, which is used to improve the efficiency of collision checking. The proposed method is very effective in solving complex problems of motion planning, for example, in a welding environment, where the weld is located in various difficult environments. The results of the performed experiments showed that the proposed method can find not only a possible collision-free path, but also optimize the angle of the torch burner with an increase in the number of iterations. The work [2] provides recommendations for practical maneuvering of a vessel with two stern ACDs. Recommended controls for implementation of several fixed modes are considered: sailing slow ahead, sailing full ahead, sailing slow astern, sailing full astern, turning to port, turning to starboard, turning the stern to port, turning the stern to starboard, normal stopping, emergency crash stop, turning on the spot to port, turning on the spot to starboard, walking the vessel slowly to port, walking the vessel fast to port, walking the vessel slowly to starboard, walking the vessel fast to starboard. Taking into account that these modes are implemented manually, the angles of the ACD setting in all modes, except for the modes walking the vessel fast to port and walking the vessel fast to starboard, are selected as multiples of 45◦ . The article [30] discusses the issues of automatic control of the vessel’s movement using excessive control, which allows to organize the movement of the vessel without a drift angle, to reduce the hydrodynamic resistance and fuel consumption. Issues of reducing energy consumption and fuel economy on board, as well as related issues of reducing emissions and improving the environment are especially relevant at the present time. Mathematical, algorithmic, and software have been developed for an on-board controller simulator of a vessel’s motion control system with excessive control, the operability and efficiency of which has been verified by numerical simulation in a closed circuit with a mathematical model of the control object. The article [31] discusses the issues of mathematical support of the Information and Risk Control System for the offshore vessel operating in high risk areas near oil or gas platforms, other large moving objects. Vessels operating in highrisk areas are equipped with dynamic positioning systems and excessive control, which allows to increase the reliability, maneuverability and quality of control. Minimally excessive control structure with two stern Azimuth Control Devices is considered. To dispensation redundancy, three control splitting algorithms were
considered, analytical expressions for control splitting were obtained. There was carried out a comparative analysis of the considered splitting algorithms between themselves and the prototype according to the minimum - criterion. A comparative analysis showed that the splitting algorithm used in the prototype are special cases of the considered algorithms for dispensation redundancy. Operability and efficiency of the algorithmic and software of the vessel control system operating in high risk areas, verified by mathematical modeling at imitation modeling stand. As can be seen from the above review, control redundancy is mainly used to increase the reliability of actuators [12,14–16,22,27], optimize motion using the example of a welding torch [13], increase maneuverability using the example of a vessel with ACD in manual control mode [2] and automatic control mode [30,31]. In open sources, the authors have not found an automatic control system for the movement of a vessel with a considered redundant structure, which would optimize control. Therefore, the development of such systems is an urgent scientific and technical task.
4 Materials and Methods
The mathematical model fu(•) of the control structure (2), in projections on the axes of the related coordinate system, has the form

Px = P1 cos α1 + P2 cos α2,  (4)

Py = P1 sin α1 + P2 sin α2 + P3,  (5)

Mz = P1 b cos α1 − P2 b cos α2 − P1 a sin α1 − P2 a sin α2 + P3 c.  (6)
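For illustration, the model (4)–(6) can be written directly as a small MATLAB function; the function name and the way the geometry parameters a, b, c are passed are our own choices and are not taken from the paper.

function U = control_structure_model(P1, alpha1, P2, alpha2, P3, a, b, c)
% Control forces and moment produced by the redundant structure, Eqs. (4)-(6)
Px = P1*cos(alpha1) + P2*cos(alpha2);                        % Eq. (4)
Py = P1*sin(alpha1) + P2*sin(alpha2) + P3;                   % Eq. (5)
Mz = P1*b*cos(alpha1) - P2*b*cos(alpha2) ...
     - P1*a*sin(alpha1) - P2*a*sin(alpha2) + P3*c;           % Eq. (6)
U = [Px; Py; Mz];
end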
As can be seen from Eqs. (4)–(6), for the implementation of the control actions Px, Py, Mz in the channels of longitudinal, lateral and rotational motion, respectively, there are five control parameters P1, α1, P2, α2, P3; that is, the control redundancy for the considered control structure is 5 − 3 = 2. Redundancy in control means the availability of free control parameters that can be used to optimize control processes. Below we consider optimal controls of the considered redundant structure for the following control quality functions:

Q1(P1, P2, P3) = P1^2 + P2^2 + P3^2 ⇒ min.  (7)

The control quality function (7) minimizes power consumption.

Q2(P1, P2, P3) = |Px| ⇒ max.  (8)

The control quality function (8) implements the maximum control action in the positive or negative direction of the axis OX1, respectively, which makes it possible to create the maximum speed of longitudinal movement and to reduce the time of longitudinal movement.

Q2(P1, P2, P3) = |Py| ⇒ max.  (9)

The control quality function (9) implements the maximum control action in the positive or negative direction of the axis OY1, respectively, which makes it possible to create the maximum speed of lateral movement and to reduce the time of lateral movement.

Q2(P1, P2, P3) = |Mz| ⇒ max.  (10)
The control quality function (10) implements the maximum control torque around the axis OZ1 in the positive or negative direction, respectively, which makes it possible to create the maximum angular rotation speed in the yaw channel and to reduce the turnaround time.

Unfortunately, it is not possible to obtain an analytical solution to the considered optimization problem. Therefore, further study of the structure was carried out by numerical methods in the MATLAB environment. For this, the numerical optimization procedure optimtool of the Optimization Toolbox library was used. The results of optimization of the control quality function Q1 = P1^2 + P2^2 + P3^2 for various values of the vector of control actions U are presented in Table 1.

Table 1. Optimization results of the control quality function Q1 = P1^2 + P2^2 + P3^2

Num | U              | P1     | P2     | α1      | α2     | P3    | Q1
1   | (1;0;0)        | 0,5    | 0,5    | 0       | 0      | 0     | 0,5
2   | (0,866;0,5;0)  | 0,465  | 0,438  | 15,87   | 16,73  | 0,247 | 0,47
3   | (0,5;0,866;0)  | 0,35   | 0,315  | 38,79   | 43,95  | 0,428 | 0,4
4   | (0;1;0)        | 0,255  | 0,254  | 83,94   | 96,09  | 0,494 | 0,37
5   | (−0,5;0,866;0) | 0,316  | 0,35   | 136,09  | 141,19 | 0,428 | 0,41
6   | (−0,866;0,5;0) | 0,433  | −0,5   | −179,92 | −30,08 | 0,25  | 0,5
7   | (−1;0;0)       | 0,5    | −0,5   | −179,92 | −0,11  | 0     | 0,5
8   | (0;0;1)        | −0,006 | −0,007 | 83,03   | 96,67  | 0,013 | 0,000254
The results of optimization of the control quality functions Q2 = Px, Q2 = Py, Q2 = Mz are presented in Table 2. The table also shows the values of the function Q1, which can be used to estimate the energy consumption of the structure for the formation of the control vector U.

Table 2. Optimization results of the control quality functions Q2 = Px, Q2 = Py, Q2 = Mz

Num | U               | P1    | P2    | α1     | α2      | P3   | Q2     | Q1
1   | (Px ⇒ max;0;0)  | 1,00  | 1,00  | 0,00   | 0,00    | 0,00 | 2,00   | 2,00
2   | (Px ⇒ min;0;0)  | −1,00 | −1,00 | 0,00   | 0,00    | 0,00 | −2,00  | 2,00
3   | (Py ⇒ max;0;0)  | 1,00  | 1,00  | 26,76  | 153,28  | 0,5  | 1,4    | 2,25
4   | (Py ⇒ min;0;0)  | 1,00  | 1,00  | −26,76 | −153,28 | −0,5 | −1,4   | 2,25
5   | (Mz ⇒ max;0;0)  | 1,00  | 0,865 | −30,08 | 179,92  | 0,5  | 55,77  | 1,998
6   | (Mz ⇒ min;0;0)  | −1,00 | 0,865 | −30,08 | 0,00    | −0,5 | −55,77 | 1,998

To compare the obtained characteristics of the optimal control scheme with the characteristics of other control schemes, several other schemes should be considered.

Equal-vectorus control scheme. To implement this scheme, we use two additional constraint equations, P2 = P1 and α2 = α1 (the vector of the screw force of ACD1 is equal to the vector of the screw force of ACD2). Taking into account the additional constraint equations, the system of Eqs. (4)–(6) takes the form

Px = P1 cos α1 + P1 cos α1 = 2P1 cos α1,  (11)
Py = P1 sin α1 + P1 sin α1 + P3 = 2P1 sin α1 + P3,  (12)

Mz = P1 b cos α1 − P1 b cos α1 − P1 a sin α1 − P1 a sin α1 + P3 c = −2P1 a sin α1 + P3 c.  (13)

From Eq. (13), taking into account Eqs. (11) and (12), we find Mz = −Py a + P3 (a + c), whence

P3 = (Mz + Py a) / (a + c).  (14)

After dividing Eq. (12) by Eq. (11), we find

α1 = arctan((Py − P3) / Px),  (15)

α2 = α1.  (16)

From Eq. (11) or Eq. (12), we find

P1 = Px / (2 cos α1),  (17)

or

P1 = (Py − P3) / (2 sin α1),  (18)

P2 = P1.  (19)
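A direct MATLAB transcription of Eqs. (14)–(19) might look as follows; this is a sketch, the function name is ours, and the plain atan is used, so that P1 can come out negative for astern control actions, as in Table 3 below.

function [P1, alpha1, P2, alpha2, P3] = split_equal_vectorus(Px, Py, Mz, a, c)
% Equal-vectorus splitting of the required control action, Eqs. (14)-(19)
P3     = (Mz + Py*a) / (a + c);             % Eq. (14)
alpha1 = atan((Py - P3) / Px);              % Eq. (15); Px = 0 with Py > P3 yields pi/2
alpha2 = alpha1;                            % Eq. (16)
if abs(cos(alpha1)) >= abs(sin(alpha1))
    P1 = Px / (2*cos(alpha1));              % Eq. (17)
else
    P1 = (Py - P3) / (2*sin(alpha1));       % Eq. (18), used when cos(alpha1) is small
end
P2 = P1;                                    % Eq. (19)
end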
Equations (14)–(19) make it possible to determine the control parameters that implement the vector of the required control action U = (Px, Py, Mz). Table 3 shows the control parameters and the function Q1 = P1^2 + P2^2 + P3^2 for the equal-vectorus control.

Table 3. Control parameters and functions Q1 for the equal-vectorus control

Num | U              | P1      | P2      | α1     | α2     | P3     | Q1
1   | (1;0;0)        | 0,5     | 0,5     | 0      | 0      | 0      | 0,5
2   | (0,866;0,5;0)  | 0,45    | 0,45    | 16,10  | 16,10  | 0,25   | 0,468
3   | (0,5;0,866;0)  | 0,33    | 0,33    | 40,89  | 40,89  | 0,433  | 0,406
4   | (0;1;0)        | 0,25    | 0,25    | 90,00  | 90,00  | 0,50   | 0,375
5   | (−0,5;0,866;0) | −0,33   | −0,33   | −40,90 | −40,90 | 0,433  | 0,40
6   | (−0,866;0,5;0) | −0,45   | −0,45   | −16,10 | −16,10 | 0,25   | 0,469
7   | (−1;0;0)       | −0,5    | −0,5    | 0,00   | 0,00   | 0,00   | 0,5
8   | (0;0;1)        | 0,00622 | 0,00622 | −90,00 | −90,00 | 0,0124 | 0,000232

Equal-modulus control scheme with orthogonal vectors. In this case, the additional constraint equations have the form P2 = P1, α2 = α1 + π/2 (the force of the
ACD1 screw is equal to the ACD2 screw force in magnitude and is perpendicular to it). Taking into account the additional constraint equations, the system of Eqs. (4)–(6) can be represented in the form

Px = P1 cos α1 − P1 sin α1,  (20)

Py = P1 sin α1 + P1 cos α1 + P3,  (21)

Mz = P1 cos α1 (b − a) + P1 sin α1 (b − a) + P3 c.  (22)

We multiply Eq. (21) by (b − a) and subtract it from Eq. (22): Mz − Py (b − a) = P3 c − P3 (b − a), whence

P3 = (Mz − Py (b − a)) / (a − b + c).  (23)

Adding and subtracting Eqs. (20) and (21) gives Py + Px = 2P1 cos α1 + P3 and Py − Px = 2P1 sin α1 + P3. From the last equations we find

α1 = arctan((Py − P3 − Px) / (Py − P3 + Px)),  (24)

α2 = α1 + π/2.  (25)

From Eqs. (20) and (21) we find

P1 = Px / (cos α1 − sin α1),  (26)

or

P1 = (Py − P3) / (cos α1 + sin α1),  (27)

P2 = P1.  (28)
Equations (23)–(28) make it possible to determine the control parameters that implement the vector of the required control action U = (Px, Py, Mz) in the equal-modulus control with orthogonal vectors. Table 4 shows the control parameters and the function Q1 = P1^2 + P2^2 + P3^2 for the equal-modulus control with orthogonal vectors.

Table 4. Control parameters and functions Q1 for the equal-modulus control with orthogonal vectors

Num | U              | P1     | P2     | α1     | α2     | P3    | Q1
1   | (1;0;0)        | 0,707  | 0,707  | −45,00 | 45,00  | 0,00  | 1,00
2   | (0,866;0,5;0)  | 0,644  | 0,644  | −26,99 | 63,01  | 0,218 | 0,877
3   | (0,5;0,866;0)  | 0,494  | 0,494  | −0,72  | 89,28  | 0,378 | 0,63
4   | (0;1;0)        | 0,398  | 0,398  | 45,00  | 135,00 | 0,437 | 0,51
5   | (−0,5;0,866;0) | −0,494 | −0,494 | −89,28 | 0,72   | 0,378 | 0,631
6   | (−0,866;0,5;0) | −0,644 | −0,644 | −63,01 | 26,99  | 0,218 | 0,877
7   | (−1;0;0)       | −0,707 | −0,707 | −45,00 | 45,00  | 0,00  | 1,00
8   | (0;0;1)        | 0,00   | 0,00   | 45,00  | 135,00 | 0,014 | 0,000196
Figure 2 shows the quality functions Q1 for the three control schemes (Row 1 - optimal control, Row 2 - equal-vectorus control, Row 3 - equal-modulus control with orthogonal vectors). As can be seen from Fig. 2, the functions Q1 = P1^2 + P2^2 + P3^2 obtained for the optimal and the equal-vectorus control practically coincide for all values of the control action U = (Px, Py, Mz). This means that the power consumption of the redundant structure under equal-vectorus control is practically equal to the power consumption of the redundant structure under optimal control with the quality function Q1 = P1^2 + P2^2 + P3^2. At the same time, the function Q1 = P1^2 + P2^2 + P3^2 obtained for the equal-modulus control scheme with orthogonal vectors lies above the previous functions, which means higher power consumption for this control scheme. So, for the vector of control action U = (0, 1, 0), the excess of energy consumption in comparison with the optimal control scheme is ΔQ1 = 36,2%, and for the vectors of control actions U = (1, 0, 0) and U = (−1, 0, 0) the excess of energy consumption in comparison with the optimal control scheme is ΔQ1 = 100%. Figure 3 shows the quality functions Q2 for the three control schemes (Row 1 - optimal control, Row 2 - equal-vectorus control, Row 3 - equal-modulus control with orthogonal vectors).
Fig. 2. Quality functions Q1 for three control schemes
As can be seen from the presented results, the function Q2 obtained for the optimal control scheme is greater than the function Q2 obtained for the other two schemes. This means that the optimal control scheme is capable of developing greater maximum control actions than the other two schemes. Thus, the maximum transverse control action U = (0; Py ⇒ max; 0) (position 4 in Fig. 4) developed by the optimal control scheme is greater by ΔQ2 = (1,4 − 1,15)/1,15 · 100% = 21,7% than the maximum control action developed by the equal-modulus control scheme with orthogonal vectors and by ΔQ2 = (1,4 − 1,00)/1,00 · 100% = 40% greater than the maximum control action developed by the equal-vectorus control scheme. The maximum longitudinal control action U = (Px ⇒ max; 0; 0) (positions 1 and 7 in Fig. 4) created by the optimal control scheme is larger by ΔQ2 = (2,0 − 1,4)/1,4 · 100% = 42,8% than the maximum longitudinal control action created by the equal-modulus control scheme with orthogonal vectors and coincides with the maximum longitudinal control action created by the equal-vectorus control scheme. Thus, in the mode of maintaining maximum positioning accuracy (control quality function Q2), the use of optimal control of the redundant structure will increase the maximum control actions by (21.7−42.8)%, depending on the direction of the created action, and will decrease the dynamic positioning error by about the same amount.
Fig. 3. Quality functions Q2 for three control schemes
At the same time, in the fuel economy mode (control quality function Q1), the use of optimal control will reduce fuel consumption by (36.2 − 100)%, depending on the direction of the control action created to compensate for external disturbances.
5 Experiment, Results and Discussion
The problem of finding the optimal control P1(n), α1(n), P2(n), α2(n), P3(n) at the n-th computation step is reduced to minimizing the control quality function (7) in the fuel-saving mode, or maximizing the control quality functions (8)–(10) in the mode of maximum positioning accuracy, in the presence of constraints in the form of the equalities (4)–(6) and the inequalities (3). This optimization problem should be solved in the on-board controller of the control system in real time; therefore, the time for its solution should not be large and should fit into the on-board controller cycle together with the time for solving other tasks. For nonsmooth functions, more complex global optimization methods are used, for example [10,24,28]. In our case the control quality functions (7) or (8)–(10) are smooth, so the search for the optimal solution does not present much difficulty and can be carried out in a small number of iterations. To further reduce the search time for the optimal solution P1(n), α1(n), P2(n), α2(n), P3(n) at the n-th computation step, it is proposed to take the optimal solution P1(n − 1), α1(n − 1), P2(n − 1), α2(n − 1), P3(n − 1) from the previous computation step as the initial approximation. To solve the optimization problem, the function fmincon(@fun, x0, A, b, Aeq, beq, lb, ub, @nonlcon)
was selected from the Optimization Toolbox library, where:
– @fun is the link to the file with the optimization function (7) or (8)–(10);
– x0 is the initial approximation vector;
– A = [] is the matrix of the linear inequality constraints (absent);
– b = [] is the right-hand side vector of the linear inequality constraints (absent);
– Aeq = [] is the matrix of the linear equality constraints (absent);
– beq = [] is the right-hand side vector of the linear equality constraints (absent);
– lb = [−P1max, −π, −P2max, −π, −P3max] is the lower bound vector;
– ub = [P1max, π, P2max, π, P3max] is the upper bound vector;
– @nonlcon is the link to a file with the nonlinear equality constraints (4)–(6).
Figure 4 shows the results of mathematical modeling of dynamic positioning processes in the MATLAB environment in the form of time graphs of the state vector parameters:
Fig. 4. Results of mathematical modeling of dynamic positioning processes
longitudinal speed Vx , longitudinal displacement Xg , lateral speed Vy , lateral displacement Yg , angular rate in the roll channel ωx , roll angle ϕ, yaw rate ωz and yaw angle ψ. The blue graphs correspond to the optimal control with the control quality function Q2 , and the red graphs correspond to the equal-module control with orthogonal vectors. In the time interval (0 − 20) s, a gust of wind acts on the vessel at a speed of 20 m/s at an angle of 45◦ to the diametrical plane, which leads to deviations of the parameters of the state vector from their programmed values. Moreover, the deviation ΔXg = 0, 6 m, ΔYg = −2, 0 m for optimal control and ΔXg = 1, 0 m, ΔYg = −3, 0 m for equal-modulus control
with orthogonal vectors. Thus, the results of mathematical modeling confirm an increase in the dynamic positioning accuracy in the longitudinal channel by δx = (1 − 0,6)/1 · 100% = 40% and in the transverse channel by δy = (−2 + 3)/3 · 100% = 33% when using the optimal control.
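As an illustration of how the optimization step described in this section can be organized, the following minimal MATLAB sketch wraps fmincon in the way outlined above; the wrapper and helper names, the geom structure and the choice of the fuel-saving quality function (7) are our own illustrative assumptions, not the authors' implementation.

function [x_opt, Q] = optimal_splitting_step(U_req, x_prev, P_max, geom)
% One computation step: split U_req = [Px; Py; Mz] into x = [P1 alpha1 P2 alpha2 P3],
% starting from the previous step's solution x_prev, within the bounds (3).
lb = [-P_max(1), -pi, -P_max(2), -pi, -P_max(3)];
ub = [ P_max(1),  pi,  P_max(2),  pi,  P_max(3)];
Q1 = @(x) x(1)^2 + x(3)^2 + x(5)^2;                  % quality function (7), fuel-saving mode
opts = optimoptions('fmincon', 'Display', 'off');
[x_opt, Q] = fmincon(Q1, x_prev, [], [], [], [], lb, ub, ...
                     @(x) force_balance(x, U_req, geom), opts);
end

function [c, ceq] = force_balance(x, U_req, geom)
% Nonlinear equality constraints: the structure model (4)-(6) must reproduce U_req
Px = x(1)*cos(x(2)) + x(3)*cos(x(4));
Py = x(1)*sin(x(2)) + x(3)*sin(x(4)) + x(5);
Mz = x(1)*geom.b*cos(x(2)) - x(3)*geom.b*cos(x(4)) ...
     - x(1)*geom.a*sin(x(2)) - x(3)*geom.a*sin(x(4)) + x(5)*geom.c;
c   = [];                                            % no inequality constraints beyond the bounds
ceq = [Px; Py; Mz] - U_req(:);
end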
6 Conclusions
The article considers the issues of automatic control of the vessel movement with a redundant control structure. Redundant structures are now widely used on all vessels with a dynamic positioning system to improve control efficiency (accuracy, maneuverability, reduced energy consumption and emissions), reliability and environmental safety.
The Scientific Novelty of the obtained results is that for the first time the design features of an original automatic system for the optimal control of vessel movement with a redundant structure of actuators have been theoretically substantiated. These features consist in the constant, with the clock cycle of the on-board controller, automatic measurement of the vessel motion parameters, automatic determination of their deviations from the program values, automatic determination of the required control forces and torque U = (Px, Py, Mz) to compensate for the deviations, automatic determination of the optimal control parameters P1, α1, P2, α2, P3 that provide the required control forces and torque U = (Px, Py, Mz) and the optimization of a given control quality function, and automatic implementation of the determined optimal control parameters P1, α1, P2, α2, P3. They provide fundamentally new technical characteristics: positioning of the vessel with optimization of the control quality function, which makes it possible to increase the dynamic positioning accuracy by (21.7 − 42.8)% and to reduce fuel consumption by (36.2 − 100)%, which determines its advantages over known solutions.
The Practical Significance of the obtained results is that the development and implementation in industrial production of the original device, an automatic optimal control system of the vessel with a redundant structure of actuators, and of the regulatory documentation for it will provide automatic optimal control of the vessel with a redundant structure of actuators, increase the dynamic positioning accuracy and reduce fuel consumption.
Further research may involve the transfer of redundant structures to other control quality functions without disturbances.
References 1. English for the navy. https://more-angl.ru/morskoe-sudno/14-tipov-sudov-ssistemoj-dinamicheskogo-pozitsionirovaniya 2. Piloting vessels fitted with azimuthing control devices (ACD’s). United Kingdom Maritime Pilot’s Association (UKMPA Transport House London). https://www. impahq.org/admin/resources/article1367420271.pdf 3. Guidelines for vessels and units with dynamic positioning (DP) systems. IMO. MSC.1/Circ. 1580 (2017)
4. Review of the use of the fan beam laser system for dynamic positioning. The International Marine Contractors Association. IMCA Marine Division (2017) 5. The safe operation of dynamically positioned offshore supply vessels. The International Marine Contractors Association. IMCA Marine Division (2018) 6. IMCA marine division: Guidelines for the design and operation of dynamically positioned vessels. The International Marine Contractors Association (2019) 7. Requirements for vessels and units with dynamic positioning (DP) system. Polish Register of Shipping (2019) 8. Babichev, S., Durnyak, B., Sharko, O., Sharko, A.: Technique of metals strength properties diagnostics based on the complex use of fuzzy inference system and hybrid neural network. Data Stream Meaning and Processing, pp. 718–723 (2020). https://www.springerprofessional.de/en/technique-of-metalsstrength-properties-diagnostics-based-on-the/18555646 9. Babichev, S., Sharko, O., Sharko, A., Mikhalyov, O.: Soft filtering of acoustic emission signals based on the complex use of Huang transform and wavelet analysis. In book: Lecture Notes in Computational Intelligence and Decision Making, pp. 718–723 (2020). https://doi.org/10.1007/978-3-030-26474-1-1 10. Barman, R.: Optimization of ship steering control system using genetic algorithms. Department of Ocean Engineering and Naval architecture Indian Institute of Technology (2020) 11. Bray, D., Daniels, J., Fiander, G., Foster, D.: DP operator’s handbook. The Nautical Institute (2020) 12. Cheng, X., Liu, H., Song, S., Hu, Y., Wang, B., Li, Y.: Reconfiguration of tightlycoupled redundant supporting structure in active magnetic bearings under the failures of electromagnetic actuators. Int. J. Appl. Electromagnet. Mech. 54(3), 421–432 (2014). https://doi.org/10.3233/JAE-160113 13. Gao, W., Tang, Q., Yao, J., Yang, Y.: Automatic motion planning for complex welding problems by considering angular redundancy. Robot. Comput. Integrat. Manufact. 62, 613–620 (2020). https://doi.org/10.1016/j.rcim.2019.101862 14. Huang, W., Xu, H., Wang, J., Miao, C., Ren, Y., Wang, L.: Redundancy management for fault-tolerant control system of an unmanned underwater vehicle. In: 5th International Conference on Automation, Control and Robotics Engineering (CACRE 2020) (2020). https://doi.org/10.1109/CACRE50138.2020.9230038 15. Lebedev, D.V.: Momentum unloading excessive reaction-wheel system of a spacecraft. J. Comput. Syst. Sci. Int. 47(4), 613–620 (2008) 16. Li, W., Shi, G.: Redundancy management strategy for electro-hydraulic actuators based on intelligent algorithms. In: Advances in Mechanical Engineering (2020). https://doi.org/10.1177/1687814020930455 17. Marasanov, V., Sharko, O., Sharko, A., Stepanchikov, D.: Modeling of energy spectrum of acoustic-emission signals in dynamic deformation processes of medium with microstructure. In: Proceeding of the 2019 IEEE 39th International Conference on Electronics and Nanotechnology (ELNANO 2019), pp. 718–723 (2019). https:// doi.org/10.1109/ELNANO.2019.8783809 18. Nosov, P., Cherniavskyi, V., Zinchenko, S., Popovych, I., Nahrybelnyi, Y., Nosova, H.: Identification of marine emergency response of electronic navigation operator. Radio Electron. Comput. Sci. Control 1, 208–223 (2021). https://doi.org/10. 15588/1607-3274-2021-1-20 19. Nosov, P., Popovich, I., Cherniavskyi, V., Zinchenko, S., Prokopchuk, Y., Makarchuk, D.: Automated identification of an operator anticipation on marine transport. Radio Electron. Comput. Sci. Control 3, 158–172 (2020). https://doi. 
org/10.15588/1607-3274-2020-3-15
20. Pakaste, R., Laukia, K., Wilhelmson, M., Kuuskoski, J.: Experience with R systems on board marine vessels. Marine Propulsion (2017). azipodpropulsion https://pdfs.semanticscholar.org/4956/a88815fe21a86883277042d8cd304a35efc5. pdf 21. Perez, T.: Dynamic positioning marine manoeuvring (2017). https://doi.org/10. 1002/9781118476406.emoe110 22. Podder, T.K., Sarkar, N.: Fault - tolerant control of an autonomous underwater vehicle under truster redundancy. Robot. Auton. Syst. 34(1), 39–52 (2001) 23. Popovych, I., et al.: Constructing a structural-functional model of social expectations of the personality. Revista Inclusiones 7(Numero Especial), 154–167 (2020). http://ekhsuir.kspu.edu/handle/123456789/10471 24. Satnam, K., Lalit, K.A., Sangal, A.L., Gaurav, D.: Tunicate swarm algorithm: a new bio-inspired based metaheuristic paradigm for global optimization. In: Engineering Applications of Artificial Intelligence, p. 90 (2020). https://doi.org/10. 1016/j.engappai.2020.103541 25. Shevchenko, R., et al.: Research of psychophysiological features of response to stress situations by future sailors. Revista Inclusiones 7(Numero Especial), 566– 579 (2020). http://ekhsuir.kspu.edu/handle/123456789/12273 26. Shevchenko, R., et al.: Comparative analysis of emotional personality traits of the students of maritime science majors caused by long-term staying at sea. Revista Inclusiones 7(Especial), 538–554 (2020). http://ekhsuir.kspu.edu/xmlui/handle/ 123456789/11874 27. Tao, G.: Direct adaptive actuator failure compensation control: a tutorial. J. Control Decis. 1(1), 75–101 (2014). https://doi.org/10.1080/23307706.2014.885292 28. Wedad, A.S., Abdulqader, M.M.: New caledonian crow learning algorithm: a new metaheuristic algorithm for solving continuous optimization problems. Appl. Soft Comput. (2020). https://doi.org/10.1016/j.asoc.2020.106325 29. Zinchenko, S., Ben, A., Nosov, P., Popovich, I., Mamenko, P., Mateychuk, V.: Improving the accuracy and reliability of automatic vessel motion control systems. Radio Electron. Comput. Sci. Control 2, 183–195 (2020). https://doi.org/ 10.15588/1607-3274-2020-2-19 30. Zinchenko, S., Ben, A., Nosov, P., Popovych, I., Mateichuk, V., Grosheva, O.: The vessel movement optimisation with excessive control. Bulletin of University of Karaganda. Tech. Phys. 3(99), 86–96 (2020). https://doi.org/10.31489/2020Ph3/ 86-96 31. Zinchenko, S., et al.: Use of simulator equipment for the development and testing of vessel control systems. Electr. Control Commun. Eng. 16(2), 58–64 (2020). https:// doi.org/10.2478/ecce-2020-0009 32. Zinchenko, S.M., Mateichuk, V.M., Nosov, P.S., Popovych, I.S., Appazov, E.S.: Improving the accuracy of automatic control with mathematical meter model in onboard controller. Radio Electron. Comput. Sci. Control 2, 197–207 (2020). https:// doi.org/10.15588/1607-3274-2020-4-19 33. Zinchenko, S.M., Nosov, P.S., Mateychuk, V.M., Mamenko, P.P., Grosheva, O.O.: Automatic collision avoidance with multiple targets, including maneuvering ones. Radio Electron. Comput. Sci. Control 4, 211–221 (2019). https://doi.org/10. 15588/1607-3274-2019-4-20
Practice Analysis of Effectiveness Components for the System Functioning Process: Energy Aspect Victor Yarmolenko1(B) , Nataliia Burennikova1 , Sergii Pavlov1 , Vyacheslav Kavetskiy1 , Igor Zavgorodnii1 , Kostiantyn Havrysh1 , and Olga Pinaieva2 1 Vinnytsia National Technical University, Vinnytsia, Ukraine {01559yarmol,n.burennikova,igorzavg,gavrishdpi}@ukr.net, [email protected] 2 Vinnytsia Mykhailo Kotsiubynskyi State Pedagogical University, Vinnytsia, Ukraine
Abstract. The article presents the author’s measuring method for the effectiveness of the functioning processes the system’s components based on the models of the efficiency components in terms of the energy approach and reports the applied aspects of its application. The three types of the rates for measurement effectiveness of the processes functioning system’ components outlined in previously published author’s works, were analyzed. The newest approaches to a method of measurement of newly created authoritative rates for effectiveness of components of process of functioning of system are presented. These approaches are based on the use of the share of profits and cost of processes in total production, and are implemented on the examples of specific industrial enterprises. The technique of approaches differs slightly from the technique used in the examples presented in the previous author’s work, making it more accurate. In this sense, innovation lies in the fact that our approaches solve the problem of simultaneous measurement of the efficiency of the system components (using the effectiveness rates of the three indicated types) regardless of the units of measurement of their total, net production and costs, since all this boils down to a dimensionless unit of measurement of the share of benefits and of the share of costs in their total production, moreover, the average values of the effectiveness rates of the functioning of the components of the system to some extent (approximately) can be considered as characteristics of the corresponding rates for the process of the functioning of the system. The examples show the practical implementation of this technology. Keywords: System components · Energy approach · Energy products of the process of the system functioning · Total, clean, cost and scale products of the process · Rates of the effectiveness for process · Scale, effectiveness and efficiency of the process · Models of components of efficiency of Burennikova (Polishchuk) - Yarmolenko c The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 S. Babichev and V. Lytvynenko (Eds.): ISDMCI 2021, LNDECT 77, pp. 282–296, 2022. https://doi.org/10.1007/978-3-030-82014-5_19
1 Introduction
Modern challenges, contradictions and risks in economics and its management create the need for theoretical and applied research on the construction of systems and on the processes of functioning of their individual components, both separately and in their relationship and interdependence. Systems do not function on their own account only; they are correlated and associated with the external environment and are subsystems of higher-level systems. They act accordingly, change themselves and change their environment. The unstable development of systems at the micro (enterprises), meso (regions of the country) and macro levels requires an updated model of such development, focused on qualitative structural changes. The search for new methods of managing the activity and development of systems (in particular, of enterprises) on the basis of the energy theory of development (in particular, the energy approach) therefore becomes relevant. Consideration of the concept of energy of the process of functioning of a system in the works of scientists suggests that this energy, as the engine of any changes, is a generalized characteristic of the system that determines its qualitative and quantitative state. From the point of view of the energy approach, the theory and practice of measuring the efficiency of the functioning of the system, which allow these changes to be estimated, are topical. This explains the need to develop and improve methods for measuring the efficiency of the components of the process of the functioning of the system based on models of the efficiency components in terms of the energy approach to management. The aim of the work is to present the author's method for measuring the effectiveness of the functioning of the system components on the basis of models of the efficiency components and the use of the benefit and cost shares of the processes in their total products from the standpoint of the energy approach, and to interpret the applied aspects of its application.
2 Problem Statement
The first attempts to use the natural sciences for the analysis of economic processes were made by the Greeks (in particular, by Plato). Further development of this problem occurred among the physiocrats in the eighteenth century. In particular, in the work of F. Kahn there is an energy approach to determining the value added. Almost a century later, the Ukrainian scientist S. Podolinsky became a follower of the physiocrats. Further followers of the energy approach are F. Soddy, N. Gergescu-Rougen and others. New trends are emerging in the economic sciences (such as econophysics). The use of the energy approach contributes to solving the problem of taking the laws of nature into account in the economy and to developing the methodological synergy of the sciences. Combining physical teaching with the laws of the functioning of the economy allows us to move to the newest principles of its development. In accordance with the energy approach, the main condition for the development of the system is the activation of its internal energy. This energy, as the
driving force of any change, determines its quantitative and qualitative states and causes structural transformations through the spatial and temporal locations of the system elements. From the point of view of the energy approach, the theory and practice of estimating the effectiveness of the process of functioning of the system (in particular, of the economy as a complex system), understood as the ability of the system to produce some result based on the use of an appropriate set of rates, are relevant. Therefore, the problem arises of developing and disclosing a method for measuring the effectiveness of the processes of functioning of the system components based on the author's models of the constituents of efficiency, using the shares of benefits and costs of the processes in their total products and taking the energy approach into account; this is what we present.
3 Literature Review
Undoubtedly, the energy principles of the economy, perceived by S. Podolinsky, deserve attention [25]. These principles were developed in the researches of [4,11,12,15,16,28,29,35,45] and others. The energetic theory developed by S. Podolinsky was studied by many scientists without reference to it [6,7,13,23,27,30] and others. Designs of others researchers are tightly coupled with searches of these scientists; they studied processes of the development systems [1,9,10,22,33]. The article presented by us is a continuation of the works [37–40] by the authors V. Yarmolenko and N. Burennikova, in fact, it is a review of the theoretical results of these articles, the practical implementation of which is shown according to the data of specific industrial enterprises. Therefore, when presenting the results that we obtained in this article, it became necessary to use some textual and semantic repetitions of these works. Works [17,18,21,32,36] seemed interesting to us in the context of our research from the point of view of methodology for solving problems. Energy as the driver of any change is such a generalized characteristic of a system that determines its qualitative and quantitative state and causes the transformation of its structure by changing the spatial and temporal arrangement of the elements of the system [39] p. 261. From the point of view of the energy approach, the theory and practice for measuring of the force (effectiveness, efficiency) of the processes of functioning of the system components that allow estimation these corresponding changes are relevant. The component of the system, the subsystem (subsystem) is called any element that is part of the system as the set of the simplest parts of an arbitrary nature, indivisible in view of solving a specific problem and naturally related [37], p. 103. A complex system can be divided into components in various ways, depending on the purpose of the study. System components have the properties of the system; ensure the functioning of the system and the existence of its main properties [34]. To measure the efficiency of the processes of functioning of the components of the system and the system as a whole is important not only the structure of system partitioning into components, but also the method of measuring the specified efficiency (both from theoretical and practical points of view); this once again demonstrates the relevance of the research topic.
Modeling any processes and their results requires the use of appropriate metrics. As you know, some scientists when considering such rates use the concept of effectiveness, considering it as a concept identical to the efficiency [14,19,20] and others. Such a vision, as a rule, leads to incorrect or generally misinterpretation of the concepts of both efficiency and effectiveness. Other scientists do not see the efficiency and effectiveness of processes with identical concepts [5,8,24,31,44] and others. The ambiguity of the interpretation of the essence of the categories of efficiency and effectiveness leads to a certain inconsistency in theoretical and applied research and requires a principled view of the meaningful content of these concepts. This also required from authors a deep study of these concepts with the separation of the force category for the functioning systems based on the category of efficiency of any process as a category, which at the same time characterizing the process from both the quantitative side (in the form of its scale product) and the qualitative one (taking into account the effectiveness of the process). Authors of the works [2,3,26,37–43] etc. prove of more twenty years on concrete examples for systems of various levels and types what this is sense to do just so, and to use a set of interrelated authoritative rates (models) for estimation of force of the system functioning process; this in a way innovates approaches to cognition of the system functioning process. Dynamics of novelty for research results to solve the problems associated with evaluating of processes effectiveness was as follows: first studied the basic process (the process of labor [41], 1996), then researched any economic process (1998), and then studied any process ([43], 2012). We are still investigating various processes in this context. An example of continued research from this perspective is the article, which presented. With respect to the energy aspect of the results of the study, we found the following: since the values of rates for the energy of general products, products as costs (losses) and net products (products in the form of benefits, benefits) of the subprocesses of the systems functioning processes are equal respectively to the values of the rates for these products (this was proved by us in the publication [40] p. 118), it can be assumed that the study of certain processes based on the rates of these products means their scientific consideration in the energy aspect. Modeling the effectiveness of the energy conversion process can be done on the basis of the output-input ratio (OIR) of the process. This factor is known to always be less than one hundred percent, since without loss conversion of energy is impossible in principle. In the publication [39] p. 263, we define OIR as the ratio of the net product rate of the process to the rate of its total product. The specified coefficient describes the effectiveness mainly in terms of benefit. Previously, in publications, we described the effectiveness of the process using the effectiveness rate in the classical sense (as the ratio of the rate of the total product of the process to the rate of its costs); this rate determines the features of effectiveness with standpoint of cost. The effectiveness level of the energy conversion process in turn can be characterized by the effectiveness rate in the classical sense.
Therefore, OIR is a characteristic of process effectiveness in terms of benefit, and the effectiveness rate in the classical sense is a characteristic of process effectiveness in terms of cost. In the article [38] we implemented the following idea: to characterize, on the basis of modeling, the process effectiveness simultaneously both from the point of view of benefit and from the point of view of cost; this determined the purpose of the study in that article. In [38] we also formed other new characteristics of the efficiency of the process of the functioning system: rates of process efficiency in terms of benefit and in terms of both benefit and cost. They naturally complemented the cost-based efficiency rate that we had proposed earlier. The article [37] presents the method of measuring the effectiveness of the processes of functioning of the system components based on the models of the efficiency components in terms of the energy approach developed by us.
4 Materials and Methods
The presented article is a continuation of our work [37–40], being essentially an overview of the theoretical results of those articles. In addition, applied aspects are presented using the examples of specific industrial enterprises. The process of functioning of the system, in our view, is a set of actions of the system in space and time under certain internal and external conditions (circumstances) under the influence of various factors (driving forces). The process of functioning of the system, in turn, is a collection of certain subprocesses (component processes). The theoretical and practical significance and novelty of the article are that it presents advanced approaches to the theory and practice of measuring the effectiveness of the processes of functioning of system components on the basis of models of the efficiency components, using the shares of benefits and costs of the processes in their total products. In the article (as always, when necessary, in our works) we use models of the components of the effectiveness of any process and the corresponding rates as indicators of process force [3,43, pp. 48–50]. The basis of these models, as we believed and still believe, is that the consequence of any process is its products: a product as a benefit; a product as a cost; a total product in the form of the product as a benefit plus the product as a cost; a scale product in the form of the product as a benefit plus that part of the product as a cost which is proportional to the share of the product as a benefit in the total product. The rates as indicators of process force are as follows: V is the rate of the total product of the process; Z is the rate of its product as a cost; G = V − Z is the rate of the product as a benefit of the process; E = V/Z is the rate of the process effectiveness, i.e., the ratio of the rate of the total product V to the rate of the product as a cost Z (the qualitative component of the process efficiency rate); K = G + Z × G/V is the rate of the process product scale (the quantitative component of the process efficiency rate); R = K × E = K × V/Z = G × (1 + V/Z) is the rate of process efficiency, i.e., the product of the rate K of the process product scale and the rate E of the process effectiveness. This was reflected in our publications [2,3,26,37–43].
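A minimal illustrative sketch of these definitions (our own code, not the authors' software; the function name and the use of Python are assumptions) is given below; the example values are the average rates of enterprise (A) from Table 1, for which the computed η and E match the corresponding entries of Tables 2 and 3.

```python
def process_force_rates(V: float, Z: float) -> dict:
    """Rates of process force computed from the rate of the total product V
    and the rate of the product as a cost Z."""
    G = V - Z          # rate of the product as a benefit
    E = V / Z          # effectiveness rate in the classical sense (qualitative component)
    eta = G / V        # output-input ratio of the process
    K = G + Z * G / V  # rate of the process product scale (quantitative component)
    R = K * E          # efficiency rate, R = K * E = G * (1 + V/Z)
    return {"G": G, "E": E, "eta": eta, "K": K, "R": R}

# Average values of enterprise (A) from Table 1, thousand UAH:
print(process_force_rates(V=29531.2, Z=20840.4))
# G = 8690.8, E ≈ 1.417, eta ≈ 0.2943 - consistent with Tables 2 and 3
```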
In [39, p. 262] we noted that the information bases for calculating efficiency rates, for example at the micro level, are the annual financial statements of enterprises; these figures should be taken into account in monetary terms at actual prices per employee. In other cases a different information base is used, given that the practical application of the proposed approaches to the study of a particular process based on simulation depends on the specifics of this process and requires special consideration related to the peculiarities of measuring the products of the process [3,43, pp. 48–50]. We also recalled the validity of our hypothesis about the existence of an efficiency reaction (a reaction to the appropriate type of communication: social, economic, environmental, technological, organizational, etc.), which contributes to a certain level of efficiency; its evaluation is related to energy consumption and, with respect to the process of system functioning, requires simultaneous consideration of both the quantitative and the qualitative components of efficiency [39, p. 262]. We use the foregoing to develop and disclose a method for measuring the effectiveness of the processes of functioning of the system components based on the author's models of the efficiency components, using the benefits and costs of the processes in their total products. We have emphasized above that energy is a generalized characteristic of the motion of matter. Energy is produced, transmitted, and converted, and its quantity is measured. The consequence (result) of the energy of the process of functioning of the system is its products, so we used the opportunity to measure the efficiency and the related factors of the process by its products. We did this on the basis of the author's indicators of the efficiency components for the purpose of finding risks, reserves, and incentives for further development of the system, and used it to improve and disclose the method of measuring the efficiency of the subprocesses of the system functioning while taking into account the costs and benefits. In the article [39, p. 263] we proposed to calculate the output-input ratio of the process (η) using formula (1), as the ratio of the rate of the product as a benefit of the process G to the rate of the total product of the process V:
η = G/V    (1)
In the publication [38, p. 181] we emphasized that this coefficient describes effectiveness mainly in terms of benefit and that the effectiveness of the process of energy conversion can also be described by the effectiveness rate in the classical sense (the ratio of the rate of the total product of the process to the rate of its cost); this rate characterizes effectiveness in the form of dependency (2):
E = V/Z    (2)
On the basis of modeling, in [38, p. 181] a characteristic of process effectiveness was formed from the standpoint of both benefit and cost in the form of the geometric mean of the effectiveness rate E = V/Z and the rate η = G/V, that is:
E1 = √(E × η) = √(V/Z × G/V) = √(G/Z).    (3)
Therefore, this rate is equal to the square root of the quantitative component G/Z of the effectiveness rate E [38, p. 181]. In the same article we considered the practical use of the obtained results for measuring the new process effectiveness rate, along with the other two author's effectiveness metrics (E = V/Z; η = G/V) and the three efficiency metrics we introduced (in terms of cost; in terms of both benefit and cost; in terms of benefit), using as an example the process of capital investments and running costs for the protection and rehabilitation of soil, groundwater and surface water in Ukraine over the five studied years [38, pp. 181–182]. These efficiency rates are as follows:
R = K × E, R1 = K × E1, R2 = K × η.    (4)
The new effectiveness rates of the process given in [38], along with other metrics, were used by us in [37] to reveal a method for measuring the effectiveness of the system component processes based on models of the efficiency components, using the benefit and cost shares of the processes in their total products. Significant in this method is the following: if zi = Zi/Vi is the share of the product as a cost, and gi = Gi/Vi is the share of the product as a benefit in the total product, then zi + gi = 1 (zi < 1, gi < 1), Vi = 1 (i = 1, 2, ..., n). The appropriate formulas for calculating the effectiveness rates are as follows:
gi = Gi/Vi = ηi; Ei = 1/zi; E1i = √(gi/zi) (i = 1, 2, ..., n), where zi = 1 − gi.    (5)
The innovation of the research results presented in [37] is that the proposed approaches solve the problem of simultaneously measuring the effectiveness of the processes of functioning of the system components (by means of effectiveness rates of three types: ηi, Ei, E1i) irrespective of the units of measurement of their total and net products and costs, because everything comes down to a dimensionless unit of measure in terms of the benefit and cost shares of the processes in their total products; moreover, the mean values of the effectiveness rates of the processes of functioning of the system components can, to a certain extent (approximately), be considered characteristics of the corresponding effectiveness rates of the process of functioning of the system as a whole. Using these methods, the authors of [37], applying two conditional examples related to the components of the country's economic energy system, showed how the characterization of the effectiveness of the current economic energy of the country (in particular, the characteristics of the OIR) can be addressed on the basis of the author's models of the efficiency components using the benefit and cost shares of the processes in their total products.
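A short sketch of the dimensionless calculation (5) (our own illustrative code; the names are assumptions):

```python
import math

def effectiveness_rates(g: float) -> tuple:
    """Effectiveness rates of three types computed from the benefit share g = G/V."""
    z = 1.0 - g            # cost share of the total product
    eta = g                # output-input ratio
    E = 1.0 / z            # effectiveness rate in the classical sense
    E1 = math.sqrt(g / z)  # effectiveness in terms of both benefit and cost
    return eta, E, E1

# Component 1 of enterprise (A): Table 3 gives g1i = 0.294292,
# E1i = 1.417017 and E1i^1 = 0.645768; the sketch reproduces these values.
print(effectiveness_rates(0.294292))
```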
Below we consider the practical implementation of this methodology using specific examples of industrial enterprises. The technique applied here differs somewhat from the one implemented in the examples given in [36,37] and refines it.
5 Experiment, Results and Discussion
As an example of an object of study, we choose the process of production activity of an enterprise, breaking it into three components: the creation of gross income, the formation of financial results before tax, and the creation of a net financial result. It is known that the process of formation of gross income is a labor process and forms part of the creation of the gross value added (GVA) of a region or country; it is characterized by material and depreciation costs. The process of generating results from operating activities depends on the cost of sales, other operating income, administrative expenses, sales costs, and other operating expenses. The process of formation of financial results before tax depends on the formation of results from operating activities, that is, on the cost of sales, other operating income, administrative expenses, sales expenses, and other operating expenses, as well as on other financial and other income and financial and other expenses. The process of generating net income is significantly affected by the income tax on ordinary activities. Having identified the object of the study, we consider the above process for four comparable leading mechanical engineering enterprises (A), (B), (D), (E). Due to the confidentiality of information under Article 21 of the Law of Ukraine “On State Statistics”, we do not provide specific statistics for each of these enterprises separately. Nevertheless, it should be noted that the studied companies are: Private Joint-Stock Company “Vinnytsia Research Plant”, Private Joint-Stock Company “Kalynivskyi Machine-Building Plant”, Private Joint-Stock Company “Bar Machine-Building Plant”, and Private Joint-Stock Company “Vinnytsia Plant “MAYAK””. Confidentiality is ensured by the random assignment of the letter designations to these businesses. Table 1 shows the annual values of the rates of production activity of these enterprises for 2014–2018 and their average values for this period, calculated by the authors as arithmetic averages. Table 1 is followed by Table 2 and Table 3. Table 2 presents the values of certain rates of production activity of enterprises A, B, D, E and the average values of the rates for these enterprises (WES) for the period 2014–2018, calculated by the authors from Table 1 as arithmetic means. Table 3 shows the average annual values of the effectiveness component rates for the production activity of enterprises A, B, D, E and their average values (WES) for the period 2014–2018. We calculated these rates in accordance with Table 2.
Table 1. Rates of production activity of enterprises for 2014–2018

Rates | 2014 | 2015 | 2016 | 2017 | 2018 | Average

Enterprise (A)
1. Net income (revenue) from sales of products, thousand UAH | 16579 | 17407 | 25096 | 44069 | 44505 | 29531,2
2. Material costs and depreciation, thousand UAH | 11861 | 11653 | 22424 | 24842 | 33422 | 20840,4
3. Average annual number of employees, persons | 90 | 80 | 83 | 77 | 78 | 81,6
4. Gross income, thousand UAH | – | – | – | – | – | 8690,8
5. Financial results before taxation, thousand UAH (profit, loss) | 5204 | 4134 | 2333 | 8112 | 11048 | 6166,2
6. Net financial result, thousand UAH (profit, loss) | 4701 | 3384 | 1895 | 6641 | 9040 | 5132,2

Enterprise (B)
1. Net income (revenue) from sales of products, thousand UAH | 49586 | 96786 | 74478 | 164767 | 186037 | 114330,8
2. Material costs and depreciation, thousand UAH | 28259 | 51563 | 53385 | 90664 | 149373 | 74648,8
3. Average annual number of employees, persons | 385 | 344 | 368 | 383 | 396 | 375,2
4. Gross income, thousand UAH | – | – | – | – | – | 39682
5. Financial results before taxation, thousand UAH (profit, loss) | 5347 | 17747 | Loss 6090 | 12653 | 12391 | 8409,6
6. Net financial result, thousand UAH (profit, loss) | 4940 | 14553 | Loss 6672 | 10406 | 10161 | 6677,6

Enterprise (D)
1. Net income (revenue) from sales of products, thousand UAH | 96084 | 120475 | 128701 | 141782 | 157979 | 129004,2
2. Material costs and depreciation, thousand UAH | 63461 | 87082 | 88460 | 108417 | 122811 | 94046,2
3. Average annual number of employees, persons | 427 | 420 | 389 | 354 | 320 | 382
4. Gross income, thousand UAH | – | – | – | – | – | 34958
5. Financial results before taxation, thousand UAH (profit, loss) | 4872 | 8450 | 6975 | 7565 | 7144 | 7001,2
6. Net financial result, thousand UAH (profit, loss) | 3695 | 6839 | 5266 | 6404 | 5951 | 5631

Enterprise (E)
1. Net income (revenue) from sales of products, thousand UAH | 186400 | 244270 | 296312 | 311874 | 382858 | 284342,8
2. Material costs and depreciation, thousand UAH | 102944 | 172202 | 200542 | 243811 | 279980 | 199895,8
3. Average annual number of employees, persons | 781 | 790 | 794 | 705 | 582 | 730,4
4. Gross income, thousand UAH | – | – | – | – | – | 84447
5. Financial results before taxation, thousand UAH (profit, loss) | 32234 | 28230 | 18936 | 9929 | Loss 1581 | 17549,6
6. Net financial result, thousand UAH (profit, loss) | 27144 | 22586 | 12239 | 4818 | Loss 5063 | 12344,8
Table 2. Average annual values of production rates of enterprises (A), (B), (D), (E) and average values of rates for these enterprises (WES) for the period 2014–2018

Rates | (A) | (B) | (D) | (E) | WES
1. Net income from sales of products (V1i), thousand UAH | 29531,2 | 114330,8 | 129004,2 | 284342,8 | 139302,25
2. Material costs and depreciation (Z1i), thousand UAH | 20840,4 | 74648,8 | 94046,2 | 199895,8 | 97357,8
3. Average annual number of employees, persons | 81,6 | 375,2 | 382 | 730,4 | 392,3
4. Gross income (G1i, V2i), thousand UAH | 8690,8 | 39682 | 34958 | 84447 | 41944,45
5. Financial results before taxation (G2i, V3i), thousand UAH (profit, loss) | 6166,2 | 8409,6 | 7001,2 | 17549,6 | 9781,65
6. Net financial result (G3i), thousand UAH (profit, loss) | 5132,2 | 6677,6 | 5631 | 12344,8 | 7446,4
7. Fraction g1i = G1i/V1i = η1i (item 4 to item 1) | 0,294292 | 0,347081 | 0,270983 | 0,296990 | 0,3023365
8. Fraction g2i = G2i/V2i = η2i (item 5 to item 4) | 0,709509 | 0,211925 | 0,200275 | 0,207818 | 0,33238175
9. Fraction g3i = G3i/V3i = η3i (item 6 to item 5) | 0,8323116 | 0,7940449 | 0,8047192 | 0,7034234 | 0,78362479
10. Fraction z1i = Z1i/V1i = 1 − g1i | 0,705708 | 0,652919 | 0,729017 | 0,70301 | 0,6976635
11. Fraction z2i = Z2i/V2i = 1 − g2i | 0,290491 | 0,788075 | 0,799725 | 0,792182 | 0,66761825
12. Fraction z3i = Z3i/V3i = 1 − g3i | 0,1676884 | 0,2059551 | 0,1952808 | 0,2965766 | 0,21637522
Table 3. Average annual values of the effectiveness component rates for the production activity of the enterprises A, B, D, E, and their average values (WES) for the period 2014–2018

Rates | (A) | (B) | (D) | (E) | WES*

Gross Revenue Generation Process (component 1)
g1i = G1i/V1i = η1i | 0,294292 (3) | 0,347081 (1) | 0,270983 (4) | 0,296990 (2) | 0,3023365
z1i = 1 − g1i | 0,705708 | 0,652919 | 0,729017 | 0,703010 | 0,6976635
E1i = 1/z1i | 1,417017 (3) | 1,531583 (1) | 1,371711 (4) | 1,422455 (2) | 1,502775
E1i¹ = √(g1i/z1i) | 0,645768 (3) | 0,729097 (1) | 0,609681 (4) | 0,649965 (2) | 0,709089; 0,835263

The process of generating financial results before tax (component 2)
g2i = G2i/V2i = η2i | 0,709509 (1) | 0,211925 (2) | 0,200275 (4) | 0,207818 (3) | 0,33238175
z2i = 1 − g2i | 0,290491 | 0,788075 | 0,799725 | 0,792182 | 0,66761825
E2i = 1/z2i | 3,442447 (1) | 1,268915 (2) | 1,250430 (4) | 1,262336 (3) | 1,806032; 1,497862
E2i¹ = √(g2i/z2i) | 1,562833 (1) | 0,518570 (2) | 0,500430 (4) | 0,512188 (3) | 0,773505; 0,705593

The process of generating a net financial result (component 3)
g3i = G3i/V3i = η3i | 0,8323116 (1) | 0,7940449 (3) | 0,8047192 (2) | 0,7034234 (4) | 0,78362478
z3i = 1 − g3i | 0,1676884 | 0,2059551 | 0,1952808 | 0,2965765 | 0,21637522
E3i = 1/z3i | 5,963442 (1) | 4,855427 (3) | 5,120831 (2) | 3,371811 (4) | 4,82787875; 4,62160142
E3i¹ = √(g3i/z3i) | 2,227878 (1) | 1,963524 (3) | 2,029983 (2) | 1,540068 (4) | 1,94036325; 1,90305056

Mean values (arithmetic mean)
g average | 0,6120375 (1) | 0,451017 (3) | 0,4253257 (2) | 0,4027438 (4) | 0,4727810
z average | 0,3879625 | 0,548983 | 0,5746743 | 0,5972562 | 0,5272189
E average | 3,607635 (1) | 2,551975 (3) | 2,580991 (2) | 2,018867 (4) | 2,540744; 1,896745
E¹ average | 1,478826 (1) | 1,070397 (2) | 1,046698 (3) | 0,900740 (4) | 1,14098575; 0,946966

Performance metrics calculated from g average and z average
E = 1/z av | 2,577569 (1) | 1,821550 (2) | 1,740116 (3) | 1,674323 (4) | 1,896745
E1 = √(g av/z av) | 1,256013 (1) | 0,906394 (2) | 0,860300 (3) | 0,820485 (4) | 0,946966

* Where the WES cell of an effectiveness rate contains two values, the first is the arithmetic mean of the enterprise rates and the second is calculated by the corresponding formula of column 1. The parentheses indicate the rankings of the rates. Source: calculated by the authors according to Table 2.
With regard to the content of the components shown in Table 3, note the following:
– the process of generating gross income (component 1) reflects, in monetary terms, the value of the products newly created at the enterprise. Gross income is defined as the difference between revenue and the material costs, depreciation and amortization included in the cost of sales;
– the process of formation of financial results before tax (component 2) is the process of determining the algebraic sum of the profit (loss) from operating activities (i.e., the main activity of the enterprise, which is associated with the production and sale of products (works, services), provides the bulk of income, and is the main purpose of creating the enterprise), financial and other income (profits), and financial and other expenses (losses);
– the process of creating a net financial result (component 3) is the formation of the enterprise's net profit (loss), calculated as the algebraic sum of the profit (loss) before tax, the income tax, and the profit (loss) on discontinued operations after tax.
The process of generating financial results can be seen as part of the overall economic system associated with management decisions aimed at ensuring their proper size at the enterprise level in order to achieve strategic, tactical, and operational goals. The data obtained in Table 3 make it possible to characterize the production activity of the enterprises in terms of efficiency, both for the enterprises as a whole and by components. For example, it can be argued that enterprise A was managed more efficiently than the other enterprises with respect to the process of generating financial results before tax (component 2) and the process of creating a net financial result (component 3), although it performed worse than the other enterprises with respect to the gross income creation process (component 1). By the level of efficiency of the functioning process, the enterprises can be ranked as A, B, D, and E (from the highest level to the lowest). By the levels of efficiency of the processes of functioning of the components, the enterprises can be ranked (from the highest level to the lowest) as B, E, A, D (for component 1); A, B, E, D (for component 2); and A, D, B, E (for component 3). On average, enterprise A was the best-managed enterprise, and enterprise E had the worst economic performance. Studying the causes and consequences of such business performance requires further exploration; it is planned to do this with the help of the author's SEE-analysis and SEE-management tools.
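The enterprise ordering stated above can be reproduced directly from the average benefit shares of Table 3; a small illustrative sketch (our own code) follows.

```python
# Average benefit shares g_average of the enterprises (Table 3).
g_average = {"A": 0.6120375, "B": 0.451017, "D": 0.4253257, "E": 0.4027438}

# Rank the enterprises by the effectiveness rate E1 = sqrt(g / (1 - g)):
# the larger the rate, the more efficiently the enterprise was managed.
ranking = sorted(g_average,
                 key=lambda e: (g_average[e] / (1.0 - g_average[e])) ** 0.5,
                 reverse=True)
print(ranking)  # ['A', 'B', 'D', 'E'] - the ordering given in the text
```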
6 Conclusions
In this article we have made a brief excursion into the latest author's approaches to the method of measuring the effectiveness of the processes of functioning of the system components, which were presented in the articles previously published by the authors; the application of the method is shown using examples of specific industrial enterprises. This technique differs somewhat from the one implemented in the examples given in [37] and refines it. The following has been revealed: since the values of the energy rates of the total products, the products as costs, and the net products of the subprocesses
of the system functioning process are equal, respectively, to the values of the rates of these products (this was proved by us in the publication [40, p. 118]), the study of certain processes (in particular, the processes of functioning of the system components) on the basis of the rates of these products means their scientific consideration in the energy aspect. The rates of force for systems shown in [3,37–39], together with the updated approaches to measuring the effectiveness rates implemented in the presented article, were used by us to reveal innovative methods of measuring the effectiveness of the processes of functioning of system components based on models of the efficiency components with the use of the benefit and cost shares of the processes in their total products. The example of four industrial enterprises of the Vinnytsia region shows the practical implementation of this methodology. An innovation of the results of the study is that the approaches proposed by the authors solve the problem of simultaneously measuring the effectiveness of the processes of functioning of the system components (using effectiveness rates of three types: ηi, Ei, E1i) independently of the units of measurement of their total and net products and costs, since everything comes down to a dimensionless unit of measurement of the benefit and cost shares of the processes in their total products; moreover, the average values of the effectiveness rates of the processes of functioning of the system components can, to some extent, be considered characteristics of the respective effectiveness rates of the process of functioning of the system itself. Further studies are intended to address the role played by the measurement of the newly created metrics of the functioning of systems in the author's SEE-analysis and SEE-management.
References 1. Aslund, A.: The Last Shall Be the First: The East European Financial Crisis, p. 136. Peterson Institute for International Economics, Washington (2009) 2. Burennikova, N., Yarmolenko, V.: See-analysis of the effectiveness of the processes of protection and rehabilitation of soil, groundwater and surface waters of Ukraine. All-Ukrainian Sci. Prod. J. Econ. Financ. Manage. 11, 69–79 (2017). Topical Issues of Science and Practice 3. Buriennikova, N., Yarmolenko, V.: Efficiency of Functioning of Complex Economic Systems of Agrarian Direction, p. 168. VNAU, Vinnitsa (2017) 4. Demyanenko, S.: To the question of value theory. Curr. Prob. Econ. 2, 16–21 (2011) 5. Fedulova, L.: Management of Organizations, p. 448. Swan, Kyiv (2004) 6. Georgescu-Roegen, N.: The Entropy Law and the Economic Process, p. 457. Harvard University Press, Cambridge (1971) 7. Georgescu-Roegen, N.: Prospects for growth: expectations for the future. Matter matters, too. In: Wilson, K.D. (ed.) Praeger, New York, pp. 293–313 (1977) 8. Klymash, N.: Scientific and theoretical aspects of the essence of concepts of “efficiency” and “effectiveness.” Sci. Mem. NUKhT 28, 124–125 (2009) 9. Kolotylo, D.: Ecology and Economics, p. 368. KNEU, Kyiv (1999) 10. Kornai, J.: The Road to a Free Economy. Shifting from a Socialist System. The Example of Hungary, p. 224, W.W. Norton, New York (1990) 11. Korniychuk, L., Shevchuk, V.: Sustainable development and global mission of Ukraine (beginning). Ukraine Econ. 4, 4–13 (2009)
12. Korniychuk, L., Shevchuk, V.: Sustainable development and global mission of Ukraine (ending). Ukraine Econ. 5, 4–14 (2009) 13. LaRouche, L.: The science of physical economy as the platonic epistemological basis for all branches of human knowledge. Execut. Intell. Rev. 21(9–11) (1994) 14. Liamets, V., Teviashev, A.: System Analysis. Introductory Course, p. 448. KhNURE, Kharkiv (2004) 15. Libanova, E.: Social results of the state programs: theoretical- methodological and practical aspects you evaluation [Text]: monograph, p. 312. Sochinsky, Uman (2012) 16. Lupenko, Y., Zhuk, V., Shevchuk, V., Khodakivska, O.: Physical economy in the measurement of theory and practice of management, p. 502. K.: SIC IAE (2013) 17. Lytvynenko, V., Savina, N., Voronenko, M., Doroschuk, N., Smailova, S., Boskin, O., Kravchenko, T.: Development, validation and testing of the bayesian network of educational institutions financing. In: The Crossing Point of Intelligent Data Acquisition and Advanced Computing Systems and East and West Scientists (IDAACS2019), 18–21 September, Metz, France, pp. 412–418 (2019). https://doi.org/10. 1109/IDAACS.2019.8924307 18. Lytvynenko, V., Savina, N., Voronenko, M., Pashnina, A., Baranenko, R., Krugla, N., Lopushynskyi, I.: Development of the dynamic Bayesian network to evaluate the national law enforcement agencies’ work. In: The Crossing Point of Intelligent Data Acquisition and Advanced Computing Systems and East and West Scientists (IDAACS-2019), 18–21 September, Metz, France, pp. 418–424 (2019). https://doi. org/10.1109/IDAACS.2019.8924346 19. Melnik, l.: Fundamentals of Development, p. 288. University book, Sumy (2003) 20. Mochernyi, S.: Economic Theory, p. 656. Akademiia (Alma-mater), KYIV (2003) 21. Murzenko, O., et al.: Application of a combined approach for predicting a peptideprotein binding affinity using regulatory regression methods with advance reduction of features. In: The Crossing Point of Intelligent Data Acquisition and Advanced Computing Systems and East and West Scientists (IDAACS-2019), 18–21 September, Metz, France, pp. 431–436 (2019). https://doi.org/10.1109/ IDAACS.2019.8924244 22. North, D.: Understanding the Process of Economic Change North, p. 2208. Princeton University Press, Princeton (2005) 23. Odum, H.: Environment, Power and Society, p. 331. Wiley-Interscience, New York (1971) 24. Oleksiuk, O.: Economics of Efficiency, p. 362. KNEU, Kyiv (2008) 25. Podolynskyi, S.: Human labor and its attitude to distribution of energy. Word 4–5, 135–211 (1880) 26. Polishchuk, N., Yarmolenko, V.: The genesis of the author’s approaches to solving the problem of evaluating the efficiency of functioning of complex systems with the help of performance components. In: The Book: Economics of the 21st Century: Problems and Solutions: Monograph, pp. 359–369. For the title. ed. Doroshenko, MS Pashkevich. Dnepropetrovsk: NSU (2014) 27. Prigogine, I., Stengers, I.: Order Out of Chaos: Man’s New Dialogue with Nature, p. 349. Bantam Books, New York (1984) 28. Rudenko, M.: The Energy of Progress, p. 412. Friendship, Ternopil (2005) 29. Shevchuk, V.: Physical and economic understanding of the mission of Ukraine. In: Yu, P.L. (ed.) Physical Economy in the Dimensions of Management Theory and Practice: A Collective Monograph, pp. 445–449. K.: NSC ”Institute of Agrarian Economy” (2013)
30. Soddy, F.: Matter and Energy, p. 225. Williams and Norgate, London (1911) 31. Tishchenko, A., Kizim, N., Dogadaylo, Y.: Economic Performance of the Enterprise, p. 44. Kharkiv: ID ”INZhEK” (2003) 32. Vassilenko, V., Valtchev, S., Teixeira, J., Pavlov, S.: Energy harvesting: an interesting topic for education programs in engineering specialities. In: Internet, Education, Science (IES-2016), pp. 149–156 (2016) 33. Veynik, A.: Thermodynamics of Real Processes, p. 576. Science and Engineering, Minsk (1991) 34. Vovk, V.: Mathematical Methods of Operations Research in Economic and Production Systems: Monograph, p. 584. Publishing Centre of Ivan Franko LNU, Lviv (2007) 35. Yagelskaya, K.: Estimation of national economic development on the basis of energy approach. Collect. Sci. Works Donetsk State 299, 23–30 (2016) 36. Yarmolenko, V., Burennikova, N., Akselrod, R.: See-management by the force of the process functioning system based on the output-input ratio: The energy aspect. In: Lecture Notes in Computational Intelligence and Decision Making. ISDMCI 2020. Advances in Intelligent Systems and Computing, vol. 1246, pp. 697–714. Springer, Cham (2021). http://link-springer-com-443.webvpn.fjmu.edu.cn/chapter/10.1007 %2F978-3-030-54215-3 45 37. Yarmolenko, V., Burennikova, N.: Measuring efficiency of processes of functioning of system components on the basis of models of components of efficiency: energy aspect. Bus. Inform 12, 102–110 (2019). https://doi.org/10.32983/22224459-2019-12-102-110 38. Yarmolenko, V., Burennikova, N.: Measuring the efficiency of the process of functioning of the system while taking into account its effectiveness in the classical sense and the efficiency coefficient: the energy aspect. Prob. Econ. 3(41), 179–185 (2019). https://doi.org/10.32983/2222-0712-2019-3-178-185 39. Yarmolenko, V., Burennikova, N.: The practice of measuring the efficiency coefficient of the process of system operation based on indicators of efficiency components. Probl. Econ. 3(37), 260–266 (2018) 40. Yarmolenko, V., Buryennikova, N.: The practice of measuring the energies of products of the process of system functioning on the basis of performance components. Bus. Inform 7, 115–121 (2018) 41. Yarmolenko, V., Polishchuk, N.: Measurement of the labor efficiency on the basis of the money rates. Storing Process. Agricult. Mater. 2, 10–12 (1996) 42. Yarmolenko, V., Polishchuk, N.: Components of the effectiveness of the process of professional orientation of students learning as objects of modeling: practical aspect. In: Modern Information Technologies and Innovative Teaching Methods in the Training of Specialists: Methodology, Theory, Experience, Problems 27(KyivVinnytsia: Planer LLC.), 547–553 (2011) 43. Yarmolenko, V., Polishchuk, N.: Components of the efficiency of the functioning of complex systems as objects of modeling. Herald Cherkassy Univ. Ser. Econ. Sci. 33(246), 86–93 (2012) 44. Zahorna, T.: Economic Diagnostics, p. 440. Centre of Educational literature, Kyiv (2007) 45. Zinchenko, V.: The concept of physical social economy and the paradigm of global development: foreign models and the Ukrainian context. Curr. Probl. Econ. 10, 23–30 (2011)
Method of Mathematical and Geoinformation Models Integration Based on Unification of the Ecological Data Formalization
Oleg Mashkov1, Taras Ivashchenko1, Waldemar Wójcik2, Yuriy Bardachov3, and Viktor Kozel3(B)
State Ecological Academy of Postgraduate Education and Management, Kyiv, Ukraine {mashkov oleg 52,emma.dea}@ukr.net 2 Lublin University of Technology, Lublin, Poland [email protected] 3 Kherson National Technical University, Kherson, Ukraine [email protected], k [email protected]
Abstract. The scientific and methodological basis for the integration of mathematical and geoinformation models based on the unification of the formalization of environmental data has been developed. The ways of integration of the theoretical base of mathematical modeling from information sources about the state of the environment and the processes of its pollution are determined. An algorithm for selecting a modeling object, an algorithm for selecting the rules of recalculation of coordinates, and positioning of the calculation grid on the object are proposed. Systematization and formalization of the main components, quantities, and variables of mathematical models of processes, geoinformation models of systems, and relational models of databases are carried out. The model of relational databases and analogs of methods of data representation in mathematical and geoinformation models are offered. The stages of solving the problem of automating the exchange of data between models of different types in several stages are proposed. Variants of setting problems of automated synthesis of models of different types, depending on the identity of the structure and parameters of these models are considered. The stages of the automated creation of a database management system are determined. The approbation of the developed software is considered on a model example. Keywords: Geoinformation model · Information modeling Relational database · Ecology · Automated control system
1 Introduction and Literature Review
As is well known, ensuring a good state of the environment is a very important issue today. But making optimal decisions to improve it or
prevent negative changes in accordance with the principles of sustainable development is impossible without comprehensive monitoring of all components of geoecosystems and control of the main sources of pollution. World experience has shown that to improve the quality, efficiency, complexity, and efficiency of the environmental monitoring system it is necessary to combine modern innovative tools and technologies: automated and automatic measuring systems; aerospace research using satellites, aircraft, and unmanned aerial vehicles; automated remote sensing data processing systems; geoinformation analytical systems for information processing, taking into account the patterns of its change in time and space; integrated multi-level systems of monitoring and control of the state of the environment, which will provide integration and comprehensive analysis of data on the state of all components of the environment, both individual regions and the country as a whole with the ability to exchange data with similar international monitoring systems; methods and technologies of analysis of environmental monitoring data and determination of the level of technogenic and ecological safety, etc. Basic aspects of environmental monitoring systems were considered in the following works [3,9,11,13–15,17]. In [15], a computer information system (CBIS) is considered, which provides an opportunity to solve the problem of choosing efficient groundwater using long-term hydrogeological conditions. The model of configuration of the information system of hydrogeological conditions of Vinnytsia region is presented. Geomorphological mapping using the SRTM model along the route of the future gas pipeline is considered. The geoinformation model is based on a number of density grids, which account for the distribution of parameters that characterize the paleoenvironment of the condition in a certain radius. Cross-analysis of these networks has made it possible to identify areas with optimal conditions for Paleolithic humans [17]. In [11], a comparison of the prediction results of three different approaches was performed: decision tree (DT), support machine vector (SVM), and adaptive fuzzy inference system (ANFIS). According to the results of model performance, it was found that the developed models are quite satisfactory, i.e. the zones identified on the map are zones of relative susceptibility. In [9] structuring by means of the theory of geoinformation is considered. Some important structural aspects of geoinformation are considered, such as the relationship between the features of the area, their thematic attributes, and geometry. The structure of feature classification systems and various solutions for linking thematic data to geometric data are presented. In addition, the discussion of the general characteristics of information systems leads to the identification of the information base, the information processor.
2 Problem Statement
Development of scientific bases, creation, and implementation of such systems, methods, and technologies correspond to the European and world approaches to
ecological management. The results of this study will significantly expand the opportunities for international cooperation of Ukraine in the field of environmental protection and will help bring the state of the environment in line with European and world requirements [2,5]. The current level of achievements in the theory of mathematical modeling, mathematical physics, and control theory applied to real natural ecosystems is so significant that it makes it possible to model and predict almost any process in them. To work with such models, special mathematical packages such as Matlab, Maple, Mathcad, Mathematica, Statistica, etc. are used, or researchers develop their own programs. At the same time, environmental monitoring systems and cadastres of natural resources, both in the world and in Ukraine, are being created as geographic information systems (GIS). To work with such systems, special universal GIS packages such as ArcGIS, Mapinfo, GIS “Panorama”, Digitals, GeoDraw, etc. are used, or developers create their own software using the tools of these GIS packages. At the same time, the built-in means of visualization of calculation results in special mathematical packages are considerably inferior to the capabilities of GIS packages, and the built-in toolkit of mathematical calculations in GIS packages is considerably inferior to the capabilities of special mathematical packages. To identify a mathematical model from GIS data, researchers, as a rule, either manually sample the data, which are then fed as input to Matlab, Maple, Mathcad, etc., or develop their own programs in Delphi, Visual C++, VB, etc., based on GIS tools, which implement mathematical algorithms and work with GIS data. Each new model and each new GIS is a separate approach, separate programs, and additional time. There are no programs for automated GIS synthesis from mathematical models at all, nor is there a theoretical basis for this process. There is only the UML language (Unified Modeling Language) for a formalized description of models of complex software and information systems, including GIS, but there are no clear guidelines for writing mathematical models in UML notation [1,2,5,12,16]. Thus, there is a separation of the theoretical basis of mathematical modeling from information sources on the state of the environment and its pollution processes, which significantly complicates and inhibits the use of mathematical models for solving current problems of modeling natural processes, the intelligent processing of geographic information system data, and the use of the advantages of geoinformation technologies for visualization of modeling results. The aim of the work is to develop a method of integrating mathematical and geoinformation models based on the unification and formalization of environmental data.
3 Materials and Methods
3.1 Algorithm for Selecting a Modeling Object
The essence of the proposed method is not simply to derive the results of mathematical models on a GIS map but to automatically generate spatial and attribu-
tive GIS data that correspond to the simulation results, providing the ability to apply to them the capabilities of GIS technologies. It is known that each mathematical model, such as a differential equation, must have its own equation and initial (or boundary) conditions. To apply it in practice, find the solution of the equations of the model, i.e. get a mathematical expression for its calculation, as well as set certain numerical values of the model parameters, steps, and intervals of calculation. To implement the proposed technology, it is also necessary to specify the following characteristics and rules: – the spatial object for which the simulation is performed; – restrictions on the spatial object that will fit this model; – rules for recalculation of coordinates and positioning of the calculation grid on the object taking into account its possible curvature. To ensure the automation of the processing of these characteristics and rules, it is proposed to use a specially created program. First of all, it indicates the GIS map (.map or .sit file) that is being worked on. The program automatically determines the map classifier and imports its parameters and content. The analysis showed that the most universal and most common in the world is the mathematical package MS Excel. The program that implements the proposed algorithms is an add-on to the MS Excel file, and in the same file provides formalization, input and collection of all data [6,7]. The modeling object is selected according to the following algorithm: 1. The MS Excel file displays a list of all layers of the classifier in the form: Also indicate the types of symbols used for this object on the map (P (point) point, L (line) - linear, S (square) - planar). For example, large rivers may be referred to as planar objects and small or medium rivers as linear. 2. According to each object is assigned the designation: River - is A1, Pond A3, etc. 3. In the second sheet of the Excel document, the user specifies which variable applies to which object and which type. Restrictions are set as follows. The formalized description describes all the requirements for the object, the implementation of which ensures the correctness of the model results. The following characteristics should be used: – spatial relationships between objects using typical operations with sets: (∩, ∪, ⊃, ⊇, , ⊂, ⊆, , ∈, ∈, / etc.), meaning the relationship between the sets of coordinates of the corresponding objects, for example “A4 ⊂ A2 ” means that only those monitoring objects whose coordinates coincide with the coordinates of the rivers are considered; – spatial functions (their set is fixed, but it is allowed to enter expressions based on them, such as restrictions on the meandering of the river in the form of Lf /Lp ≤ 2): • for point: height H (above sea level);
• for linear: average height H (above sea level), height of the beginning of the line H1 (above sea level), height of the end of the line H2 (above sea level), length Lf along the line, length along the line Lp between the beginning and end of the line; • for planar: average height H (above sea level), average height of perimeter points Hp (above sea level), length of perimeter L; area S; – values of attribute parameters from the internal database: B1, B2, B3, . . . parameter names for each simulation object in column B; – values of attribute parameters from an external database, for example, MS Access database: C1, C2, C3, . . . - parameter names for each simulation object in column C. The third, fourth, and other sheets of the Excel document are presented as templates, where the user can conveniently set all the necessary restrictions on the characteristics of the simulation object. 3.2
Development of an Algorithm for Selecting the Rules of Coordinate Recalculation
The choice of rules for recalculation of coordinates, and positioning of the calculation grid on the object is made according to the following algorithm: – the correspondence between the spatial units of the variables of the equation, and the real coordinates (in meters) of the GIS map is established - the multiplier Kx, Ky, Kz, and the coordinates of the reference point X0, Y 0, Z0 are set at least on the first two (or all three) spatial axes: X (required), Y (required), Z (optional); – if the modeling object is planar, and curvilinear, and the simulation results will be displayed along one of the axes on it, for example, on a reservoir or along a large river in the direction of its flow, then you should specify the method of superimposing axes on this object (selected from a fixed set); – average line, for example for a river or reservoir, is an average line that is equally distant from the left, and right banks at the same time; – along with a linear object, for example for a river, it is a linear object “fairway”, which is specially applied in advance; – perpendicular to the average line, for example for a river or reservoir, is a line that is perpendicular to the average, and its length is limited by the left, and right banks. If the list of available objects required is missing, there should be a simple interface to create a new object with specified characteristics (semantics) - this is the implementation of the method of automated creation of GIS layers to display arbitrary simulation results according to our previous articles. At the software level, it will be enough to implement the Panorama interface to create new semantics, and objects - just add a little clarity to the fields that need to be filled. The main thing, after that, is to make sure that the user not only sets the
attributive content of the semantics of the objects in this layer but also indicates where exactly they should be located (set the metrics). Visualization of modeling results is an automated generation of spatial, and attributive GIS data based on modeling results, providing the possibility of applying the capabilities of GIS technologies to them. The essence of the method of automation of visualization of results of modeling of natural processes in geoinformation systems consists in the use of mathematical packages of applied programs for the formalization of models, and automation of transfer of information from them to GIS. Spatial data are used to place special symbols on the GIS map, and attributive data - to visualize the values that are the result of the calculation of the mathematical model [4,10]. Thus, the implementation of the method is carried out by the interaction of the following components: – Excel-application designed to formalize the model and perform calculations, as well as to form an exchange file; – exchange file containing the source data obtained as a result of simulation (spatial and attributive); – geoinformation map, which will be used to visualize the simulation results; – a database from which, if necessary, the parameters of the simulation objects are selected for use in the computational process; – shell program that reads data from the exchange file and automatically generates spatial and attributive GIS data based on the simulation results. 3.3
Interaction of Modeling Geoinformation System Components
We will systematize and formalize the main components, quantities, and variables of mathematical models of processes, geoinformation models (GIS-models) of systems, and relational models of databases. – Mathematical model. The following classification of quantities and variables of a typical mathematical model from the point of view of their purpose is offered [8,9]: • input variables U - variables set for calculations; • output variables Y - variables that are the result of calculation; • state variables X - variables that are the result of the calculation at the intermediate stage of calculation; • numerical parameters K - parameters that are set or calculated during the identification of model parameters. Of course, if we consider all the variety of mathematical models and the common names of their quantities and variables, then the classification will be much more complicated. For example, state variables can be source variables, constants can change their values depending on the values of other constants or variables (for example, the value of temperature or atmospheric pressure), parameters can be variable (depending on time t) or distributed in space (depend on spatial coordinates). Sometimes there is a “perturbation”, which has a different nature
than the input variables, and others. However, in the first approximation, such complications will not be taken into account yet. We show how the above types of quantities and variables of a typical mathematical model interact with GIS and DBMS data [6,7]: Input variables U - their values are read from the database (denote the set of database tables that can store data as D) and GIS (denote the set of tables and GIS maps as G): U ←D∪G (1) Output variables Y - their values are displayed on a GIS map or on the screen (denote the set of information on the computer screen as E) in the form of a table, graph, etc.: Y →G∪E (2) State variables X - their values are temporarily stored in DBMS databases without output on the GIS map or on the screen: X→D
(3)
Numerical parameters K - can change when working with the model, and are stored in special tables in the database: K←D
(4)
In general, the model of interaction of GIS data, DBMS and mathematical model can be written in the form: U ← D ∪ G; K ← D; X → D; Y → G ∪ E
(5)
We propose to distinguish between two such options for applying the mathematical model (both settings are written out in the sketch after (6)):
– modeling – when all the parameters and the structure of the model (K) are known and one only needs to perform calculations with it (find X and Y); this corresponds to the interaction model in the form (5);
– identification – when it is known what result Y the model should give for certain inputs U, and one needs to find the values of its parameters and structure (X and K); this corresponds to the interaction model in the form:
U, Y ← D ∪ G; X ← D; K → E.
(6)
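As an illustration of the interaction models (5) and (6) (a hypothetical sketch; the class and attribute names are ours, not part of the described software), the two settings can be written down explicitly:

```python
from dataclasses import dataclass, field

# D - database tables, G - GIS tables and maps, E - information on the screen.
@dataclass
class ExchangeScheme:
    reads:  dict = field(default_factory=dict)   # variable -> sources it is read from
    writes: dict = field(default_factory=dict)   # variable -> sinks it is written to

# Modeling, Eq. (5): U <- D ∪ G; K <- D; X -> D; Y -> G ∪ E.
modeling = ExchangeScheme(
    reads={"U": {"D", "G"}, "K": {"D"}},
    writes={"X": {"D"}, "Y": {"G", "E"}},
)

# Identification, Eq. (6): U, Y <- D ∪ G; X <- D; K -> E.
identification = ExchangeScheme(
    reads={"U": {"D", "G"}, "Y": {"D", "G"}, "X": {"D"}},
    writes={"K": {"E"}},
)
```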
Relational Database Model
A database model is a way of displaying the relationships between its data at a logical level. There are the following types of models: relational, hierarchical, network, and conceptual. The most widespread model is the relational database. Relational databases allow storing information in two-dimensional tables, interconnected by data fields, so-called “keys”. Key (index) - a unique name of the record, which can be an element of any attribute in the record - a simple key,
304
O. Mashkov et al.
and a set of elements of several attributes - a composite key. The key is used to identify each specific record, as well as to organize the records in the file. Links allow you to group records in a set, as well as specify the relationships between these sets. Normalization rules are used to form a relational data model. Normalization is the division of a database table into two or more, which are characterized by better properties when adding, changing, and deleting data. The normalized database ultimately does not contain redundant data. The ultimate goal of normalization is not so much to save memory as to eliminate data inconsistencies. An analogy is made between the data types of mathematical models and database models (Table 1). Table 1. Analogues of data types in mathematical models and databases are offered In mathematical models
In database models
The value of the parameter X
Table with 1 field and 1 record
A vector string of values of m parameters, such as a vector K model parameters
Table with m fields and 1 record
Vector column of values of one parameter, for example discrete values of the indicator X[m], measured with the same interval in moments of time m = 1, 2, . . .
Table with 1 field and m records
Matrix [m x n]; the value of the indicator X(t, z) at points of the river with coordinates z at different points in time t
Table with n fields and m records
Table 1 covers all possible types of relationships between mathematical data models and variants of database table structures. Thus, other types of mathematical concepts must be reduced to them. When designing the information component of the GIS model of objects, you should first determine the type of their display - are they designed in the form of point, linear or planar? To answer this question, you need to know exactly what the mathematical models describe, for example, it may be the concentration of C(t, x, y) chlorides in a well, river, or reservoir, which varies in time t and space along, for example, two coordinates x and y, or another option. We draw an analogy between the ways of presenting data in mathematical and geoinformation models (Table 2). Compliance with the requirements for maintaining the correct topology of objects ensures the proper functioning of GIS. GIS with erroneous topology (river tributaries have no points in common with the main river, water quality monitoring posts are not located on water bodies, etc.) will not work properly, and some software tools for automated geographic information processing will not work at all.
Method of Mathematical and Geoinformation Models Integration
305
Table 2. Analogues of data representation methods in mathematical and geoinformation models In mathematical models In GIS models The characteristics of the object do not change, or change only in Point object: time t, for example, the concentration of chlorides C(t) at the post of water quality monitoring The characteristics of the object change in time and in one spatial coordinate, for example, the concentration of chlorides C(t, x) in the river fairway along its course
Linear object:
The characteristics of the object change in time, and in two spatial coordinates, such as the distribution of the concentration of chlorides C(t, x, y) in the fairway of the river along its course, and depth, or along the course, and width
Plane object:
The topology of objects should be determined already during the formalization of the input data of the mathematical model. It is then that key physical objects and their topological relationships should be identified.
4 4.1
Results and Discussion Generating Queries to the Database
It is known that each mathematical model, such as a differential equation, must have its own equation and initial (or boundary) conditions. To apply it in practice, find the solution of the equations of the model, i.e. get a mathematical expression for its calculation, as well as set certain numerical values of the model parameters, steps, and intervals of calculation. To ensure the possibility of using GIS technologies, it is necessary to specify the following: – the spatial object for which the simulation is performed; – restrictions on the spatial object that will fit this model. A specially created shell program is used to automate the processing of this information. First of all, it indicates the GIS map (.map or .sit file) that is being worked on. The program automatically determines the classifier of the map, and imports its parameters and contents to a specialized MS Excel file, which is an add-on that provides formalization, input and collection of all data. The specialized Excel application has the following sections: – list of all objects imported from the classifier of the map on which the automated visualization of simulation results will be carried out; – the constraints section for the feature; – section of model formalization and direct calculation according to it (Fig. 1).
306
O. Mashkov et al.
Fig. 1. Formalization of the model and direct calculation for it in Excel-application
The section of restrictions in a formalized form describes all the requirements for the object, the implementation of which ensures the correctness of the results of the model. The following characteristics should be used [6,7]: – spatial relationships between objects using typical operations with sets: (∩, ∪, ⊃, ⊇, , ⊂, ⊆, , ∈, ∈, / etc.), meaning the relationship between the sets of coordinates of the corresponding objects, for example that only such monitoring objects are considered, the coordinates of which coincide with the coordinates of the rivers; – spatial functions (their set is fixed, but it is allowed to enter expressions based on them, such as restrictions on the meandering of the river in the form of Lf /Lp ≤ 2): • for point: height H (above sea level); • for linear: average height H (above sea level), height of the beginning of the line H1 (above sea level), height of the end of the line H2 (above sea level), length Lf along the line, length along the line Lp between the beginning and end of the line; • for planar: average height H (above sea level), average height of perimeter points Hp (above sea level), length of perimeter L; area S; – restrictions on the value of attribute parameters from an external database, such as a database MS Access. If the simulation object is not on the geographic map, it can be created using a simple interface to create a new object with the specified characteristics (semantics) and add it to the map classifier. Then it can be applied to the GIS map using tools for editing the shell program.
When applying the theoretical provisions of identification methods, different variants of completeness of the input data for modeling should be taken into account. Table 3 lists all four possible options. We propose to solve the problem of automating data exchange between models of different types in several stages:
1. Draw an analogy between the concepts and data types of mathematics, GIS technologies, and the theory of relational databases.
2. If a database and/or GIS exists, formalize the processes of reading information from the database and GIS to perform calculations on a mathematical model, and of entering the results of calculations into these systems for storage and visualization.
3. If a database and/or GIS does not exist, develop a unified algorithm for sequential transformation of a mathematical model to a certain form, which allows immediate design of the appropriate relational database and geographic information system models.
4. Create software that will automate the process of applying this method.
The main idea on which the algorithm of the proposed identification method is based is that it forms a set of transitional concepts, models, and methods that are common in classical mathematics and for which one can find analogs in the theory of databases and GIS technologies; typical concepts, models, and methods of mathematics, database theory, and GIS technologies are reduced to these transitions. Accordingly, the application of the method is limited only to those mathematical models that can be correctly reduced to transitional models.

Table 3. Options for setting problems of automated synthesis of models of different types (the GIS maps, Databases, and MM columns indicate the identified structure and entered data or identified parameters)

№ | GIS maps | Databases | Mathematical model (MM) | Formulation of the problem
1 | + | + | – | Identification of parameters and structure of the mathematical model according to GIS and DB
2 | – | + | + | Identification of the structure (classifier) of the GIS map, which corresponds to the structure of the database and MM, and the creation of this map
3 | + | – | + | Identification of the database structure that corresponds to the MM data and GIS objects, and its filling with data
4 | – | – | + | Identification of the structure and creation of a database and GIS map that correspond to MM
Let us develop a theoretical apparatus for the formation of transitional models and their correspondence to the typical constructs of mathematics, database theory, and GIS technologies. The main operators that organize the exchange of information with databases are as follows:

– selection of the values of fields (Par1, Par2, ..., Parr) from table T that meet the specified selection criteria Ω:

SELECT Par1, Par2, ..., Parr FROM T WHERE Ω    (7)

– entering the values X1, X2, ..., Xr of the given fields (Par1, Par2, ..., Parr) into the table T:

INSERT INTO T(Par1, Par2, ..., Parr) VALUES X1, X2, ..., Xr    (8)

It is important to note that the result of the SELECT operation is a new table T1 with fields Par1, Par2, ..., Parr, whose names are listed between the words “SELECT” and “FROM” in (7). We propose to formalize these operations by analogy with mathematical functions:

– for (7): T1 = Select(T, Par1, Par2, ..., Parr, Ω)    (9)

– for (8): T1 = Insert(Par1, Par2, ..., Parr, X1, X2, ..., Xr)    (10)

Models (9), (10) can be written as follows:

MT1 = SELECT(T, P, Ω)    (11)

VT = INSERT(P, X)    (12)

P = [Par1, Par2, ..., Parr], X = [X1, X2, ..., Xr]    (13)

where MT1 is a matrix of dimension [n × r] (n is the number of records that satisfy the set of selection criteria Ω); VT is a row vector of values of dimension r; P and X are row vectors of the names and values, respectively, of the parameters (fields of table T) of dimension r. Thus, to read the value of the length L of river №1 from the table T of the passport data of the rivers, it is necessary to use the following function: L = Select(T, L, I = 1), which in SQL looks as follows: SELECT L FROM T WHERE I = 1. The corresponding table T should look, for example, as shown in Fig. 2. To save the value of the length L of river №1 in the table T of the passport data of the rivers, it is necessary to use the following function:
Fig. 2. View of a database table that contains river lengths in km
T = Insert(P, X), P = [I, L], X = [1, 115]    (14)
which in SQL looks as follows: INSERT INTO T(I, L) VALUES 1, 115. Thus, for the synthesis of the system “GIS + DB” according to a given mathematical model, the following should be performed:
1. Transformation of the mathematical model and its values to the forms presented in Table 3.
2. Identification of the key physical objects with which the model works, and definition and formalization of the relations between them.
3. Design of the database according to the formed models.
4. Determination of the dimensionality of the spatial distribution and of the changes in time of the parameters of the key physical objects, and the corresponding adjustment of the GIS classifier in accordance with Table 3.
5. Formalization of the process of data exchange between the mathematical model and the database through the operators Select and Insert (a sketch of this step is given after this list).
6. Setting up the process of visualization of calculation results, i.e., their display and printing by means of GIS technologies.
7. Testing the created system on examples.
In accordance with the above, the process of creating a GIS or DBMS should be automated and programmed so that any user only enters the necessary data from the keyboard and answers specific questions.
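As a minimal illustration of the Select and Insert operators formalized in (9)–(14), the following sketch wraps the corresponding SQL statements as functions. It assumes a local SQLite database and the river passport table T with fields I and L from Fig. 2; it is not the VBA/Delphi implementation described in the paper.

```python
import sqlite3

def select(conn, table, fields, criteria):
    """Formalized Select(T, P, Omega): returns the matrix MT1 of selected values."""
    sql = f"SELECT {', '.join(fields)} FROM {table} WHERE {criteria}"
    return conn.execute(sql).fetchall()

def insert(conn, table, fields, values):
    """Formalized Insert(P, X): writes the row vector of values X into fields P."""
    placeholders = ", ".join("?" for _ in values)
    sql = f"INSERT INTO {table} ({', '.join(fields)}) VALUES ({placeholders})"
    conn.execute(sql, values)
    conn.commit()

# Example with the river passport table T (fields: I - river number, L - length, km).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (I INTEGER, L REAL)")
insert(conn, "T", ["I", "L"], [1, 115])    # INSERT INTO T(I, L) VALUES 1, 115
print(select(conn, "T", ["L"], "I = 1"))   # SELECT L FROM T WHERE I = 1 -> [(115.0,)]
```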
4.2 Stage of Automated Database Creation
The stage of automated database creation consists of the following steps:
– the relational database model is synthesized (which tables are needed, their fields, field types, the fields related to other tables, etc. are defined) [8];
– a database is created in the VBA language according to its model (see the sketch below);
– similarly, forms are created for convenient input by the user of all necessary data.
The stage of automated creation of the GIS involves the following steps:
– the geoinformation model of the system is synthesized in the form of a UML model (which layers and objects there should be, their spatial types, topology, etc.);
– by means of object-oriented programming (Delphi, Visual C++, etc.) the map classifier is formed according to the selected GIS model;
– if there are no data on the spatial location of objects (coordinate sets from GPS receivers, geodetic survey data, etc.), then the user is asked to create a map; to do this, the necessary demonstration videos are run that show what to click to draw the necessary objects; if all the coordinates of the objects are available, then the objects are created automatically by vectorization.
To work with the mathematical model, specialized software and an editor for its introduction have been developed. The created technology was tested on the automation of visualization of modeling results from the mathematical computing package MS Excel in geographic information systems in the “Panorama 9” format [6–8]. A specialized Excel application for formalization and direct implementation of the model has been developed, which also generates an exchange file for transferring the simulation results to the geoinformation program shell. A shell program has also been developed, which provides visualization of simulation results on an arbitrary geoinformation map for the selected object based on the exchange file and has a convenient and clear user interface (Fig. 3).
For approbation of the developed software, the exponential model of the distribution of pollution in space for linear objects (rivers) and of the dynamics of change of pollutant concentration in time according to the same model for point objects (observation objects) was realized, with subsequent visualization of the modeling results on the geoinformation map of Vinnytsia region (Fig. 4).
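A minimal sketch of the automated database-creation step listed above: a synthesized relational model (table names, fields, and types) is turned into CREATE TABLE statements. SQLite, the dictionary-based model description, and the second table are assumptions for illustration only; the paper's own implementation uses VBA and MS Access.

```python
import sqlite3

# Assumed description of a synthesized relational model:
# table name -> list of (field name, SQL type).
db_model = {
    "T": [("I", "INTEGER"), ("L", "REAL")],           # river passports: number, length (km)
    "Monitoring": [("I", "INTEGER"), ("C", "REAL")],  # hypothetical post table: river number, chlorides C(t)
}

def create_database(model, path=":memory:"):
    """Create a database whose schema follows the synthesized relational model."""
    conn = sqlite3.connect(path)
    for table, fields in model.items():
        columns = ", ".join(f"{name} {sql_type}" for name, sql_type in fields)
        conn.execute(f"CREATE TABLE {table} ({columns})")
    conn.commit()
    return conn

conn = create_database(db_model)
print([row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type='table'")])
```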
Fig. 3. The interface of the geographic information program-shell
Fig. 4. Results of automation of visualization of simulation results in GIS for linear and point objects
5 Conclusions
The scientific and methodological basis for the integration of mathematical and geoinformation models based on the unification of the formalization of environmental data has been developed. The ways of integrating the theoretical basis of mathematical modeling with information sources about the state of the environment and the processes of its pollution are determined. An algorithm for selecting a modeling object, an algorithm for selecting rules for recalculating coordinates, and positioning of a calculation grid on an object are proposed. Systematization and formalization of the main components, quantities, and variables of mathematical models of processes, geoinformation models of systems, and relational models of databases are carried out. The model of relational databases and the analogs of methods of data representation in mathematical and geoinformation models are offered. The stages of solving the problem of automating the exchange of data between models of different types are proposed. Variants of setting problems of automated synthesis of models of different types, depending on the identity of the structure and parameters of these models, are considered. The stages of the automated creation of a database management system are determined. The approbation of the developed software is considered on a model example.
References 1. Akinina, N.V., Gusev, S.I., Kolesenkov, A.N., Taganov, A.I.: Construction of basic graphic elements library for geoinformation ecological monitoring system. In: 27th International Conference Radioelektronika (RADIOELEKTRONIKA), pp. 1–5 (2017). https://doi.org/10.1109/radioelek.2017.7937585 2. Belyakov, S., Belyakova, M., Glushkov, A.: Intellectual cartographic visualization procedure for geoinformation system. In: 3rd Russian-Pacific Conference on Computer Technology and Applications (RPC), pp. 1–4 (2018). https://doi.org/10. 1109/rpc.2018.8482160
3. Chernyavsky, G.M.: Space monitoring of the environment and global safety. Math. Comput. Simul. 67(4–5), 291–299 (2004). https://doi.org/10.1016/j.matcom.2004. 06.027 4. Gordienko, L., Ginis, L.: Geoinformation project as complex object forecasting and decision making tool in intelligent information and management systems. In: International Russian Automation Conference (RusAutoCon), pp. 653–657 (2020). https://doi.org/10.1109/rusautocon49822.2020.9208046 5. Kinakh, V., Bun, R., Danylo, O.: Geoinformation technology of analysis and vizualization of spatial data on greenhouse gas emissions using Google Earth Engine. In: 12th International Scientific and Technical Conference on Computer Sciences and Information Technologies (CSIT), pp. 212–215 (2017). https://doi.org/10.1109/ stc-csit.2017.8098771 6. Mokin, V.B.: Development of the geoinformation system of the state ecological monitoring. In: Geographic Uncertainty in Environmental Security. NATO Science for Peace and Security Series C: Environmental Security, pp. 153–165 (2007). https://doi.org/10.1007/978-1-4020-6438-8 9 7. Mokin, V.B., Bocula, M.P., Kryzhanovs’kyj, Y.M.: Informacijna texnolohiya intehruvannya matematychnyx modelej u heoinformacijni systemy monitorynhu poverxnevyx vod. VNTU, Vinnycya (2011) 8. Mokin, V.B., Kryzhanovs’kyj, Y.M.: Heoinformacijni systemy v ekolohiyi. VNTU, Vinnycya (2014) 9. Molenaar, M.: Status and problems of geographical information systems. The necessity of a geoinformation theory. ISPRS J. Photogram. Remote Sens. 46(2), 85–103 (1991). https://doi.org/10.1016/0924-2716(91)90018-q 10. Plakhotnij, S.A., Klyuchko, O.M., Krotinova, M.V.: Information support for automatic industrial environment monitoring systems. Elec. Contr. Sys. 1(47), 29–34 (2016). https://doi.org/10.18372/1990-5548.47.10266 11. Pradhan, B.: A comparative study on the predictive ability of the decision tree, support vector machine and neuro-fuzzy models in landslide susceptibility mapping using GIS. Comput. Geosci. 51, 350–365 (2013). https://doi.org/10.1016/j.cageo. 2012.08.023 12. Shanmugasundaram, R., Santhiyakumari, N.: Urban sprawl classification analysis using image processing technique in geoinformation system. In: Conference on Emerging Devices and Smart Systems (ICEDSS), pp. 192–196 (2018). https://doi. org/10.1109/icedss.2018.8544367 13. Sultangazin, U.: Information systems based on space monitoring for solution of some problems of sustainable development. Math. Comput. Simul. 67(4–5), 279– 290 (2004). https://doi.org/10.1016/j.matcom.2004.06.028 14. Thomas, B.C., Goracke, B.D., Dalton, S.M.: Atmospheric constituents and surfacelevel UVB: implications for a paleoaltimetry proxy and attempts to reconstruct UV exposure during volcanic episodes. Earth Planet. Sci. Lett. 453, 141–151 (2016). https://doi.org/10.1016/j.epsl.2016.08.014 15. Veselov, V.V., Panichkin, V.Y., Zakharova, N.M., Vinnikova, T.N., Trushel, L.Y.: Geoinformation and mathematical model of Eastern Priaralye. Math. Comput. Simul. 67(4–5), 317–325 (2004). https://doi.org/10.1016/j.matcom.2004.06.016
16. Zakharov, S., Taganov, A., Gusev, S., Kolesenkov, A.: The analysis and monitoring of ecological risks on the basis of fuzzy petri nets. In: 3rd Russian-Pacific Conference on Computer Technology and Applications (RPC), pp. 1–5 (2018). https:// doi.org/10.1109/rpc.2018.8482155 17. Zolnikov, I.D., Postnov, A.V., Lyamina, V.A., Slavinski, V.S., Chupina, D.A.: Geoinformation modeling of environments favorable for prehistoric humans of the altai mountains. Archaeol. Ethnol. Anthropol. Eurasia 41(3), 40–43 (2013). https://doi.org/10.1016/j.aeae.2014.03.006
Prediction of Native Protein Conformation by a Hybrid Algorithm of Clonal Selection and Differential Evolution

Iryna Fefelova1, Andrey Fefelov1, Volodymyr Lytvynenko1, Oksana Ohnieva1(B), and Saule Smailova2

1 Kherson National Technical University, Kherson, Ukraine
{fim2019,fao1976}@ukr.net, oksana [email protected]
2 D. Serikbayev East Kazakhstan State Technical University, Ust-Kamenogorsk, Republic of Kazakhstan
[email protected]
Abstract. The methods for protein structure prediction are based on the thermodynamic hypothesis, according to which the free energy of the “protein-solvent” system is minimal in the folded state of protein. By predicting the tertiary protein structure, it is theoretically possible to predict its action. This problem is considered as a global optimization issue. To solve it, a hybrid method based on a combination of clonal selection and differential evolution algorithms is proposed. The variants of protein structures being the subject of the algorithm are represented by a set of torsion angles of the elements of the main and side chains. Evaluation of solution options is carried out using the potential energy function, considering the molecular dynamics of the protein. The effectiveness of the proposed method is confirmed by experimental studies. Keywords: Protein tertiary structure prediction · Conformation energy · Differential evolution · Clonal selection · Bioinformatics
1 Introduction
The methods for predicting protein structure are based on the thermodynamic hypothesis [2], according to which the free energy of the “protein-solvent” system is minimal in the natural folded state of protein. This energy is determined by two components: the energy of intermolecular interaction within the amino acid sequence and the free energy of solvation. According to Anfinsen’s hypothesis, the three-dimensional structure of a protein is completely dependent on these components [1]. The protein function is determined by its three-dimensional conformation, therefore, by knowing or predicting the protein tertiary structure, it is theoretically possible to predict its action. Obtaining the protein structure experimentally is not always possible, and often impractical due to significant time and financial costs. An alternative to
this method is to predict the tertiary structure based on the amino acid sequence. This problem can be viewed as a global optimization issue. It is necessary to find the minimum of energy function corresponding to the native (or close to it) state of the protein. One of the approaches to developing the energy function is based on fundamental laws considering various physical forces of interaction between particles. Energy functions based on real processes occurring in proteins are theoretically capable of incorporating all possible effects that are essential for predicting their tertiary structure. However, such calculations have high computational complexity, and the surface formed by the free energy function contains many local optima, which significantly complicates the search. Therefore, in practice, they use simpler empirical analogs that take into account the molecular mechanics of a protein and its interaction with a solution. These are, as a rule, functions of the potential energy of a molecule or force fields, in which the total energy of the conformation is made up of individual components, such as bonding energy, Van der Waals interactions, electrostatic interactions, etc. [25]. Based on the values of the potential energy function calculated for the given conformations of the protein molecule, the search algorithm concludes on which conformation corresponds better and which one worse to the native state. To set the conformation of the amino acid sequence, a number of parameters are used that explicitly determine the position of each atom in three-dimensional space. The torsion angles of mutual rotation of the molecule parts relative to the general chemical bond are often used as parameters. It is obvious that the number of parameters depends on the length of the amino acid sequence and can reach high values. Thus, the search space for the optimal conformation is large, which makes the choice of heuristic algorithms relevant. In recent decades, a number of methods have been used to solve the problem of predicting the tertiary structure of the protein, such as Monte Carlo methods, molecular dynamics, evolutionary algorithms, etc. [3,7,9,14]. In this work, the authors apply an innovative approach to this problem based on a hybrid algorithm of clonal selection and differential evolution.
2 Problem Statement
Three levels of the structural organization of a protein molecule should be considered. The primary structure is a linear sequence of amino acid residues in a molecule. Secondary structure is a spatial packing of the elements of the main chain with the formation of local structures: α-helices and β-layers. Tertiary structure is a packing of elements of a secondary structure in space, upon which a protein exhibits biological properties. The structure of each level defines the structure of the next level after it. This important property allows computational methods to be applied to predict complex derived structures based on their simpler predecessors. The prediction task is formulated as follows: based on the primary structure of the protein, it is necessary to determine its tertiary structure, i.e. find the three-dimensional coordinates of all the atoms that compose the protein in
its native state. In this work, the torsion angles of the elements of the main (φ, ψ, and ω) and side chains (χ) are used to represent the three-dimensional structure of the protein. The search space is reduced by introducing constraints on the range of values of torsion angles. For the main chain angles, the constraints are determined by the predicted secondary structure of the molecule. In this case, each residue is assigned an identifier according to the DSSP classification [18,24]. For side chain elements, angle constraints are selected from the rotamer library [11]. The conformation energy is calculated on the assumption that the protein molecule can be regarded as a classical mechanical system. Hence, according to the principle of additivity (one of the fundamental principles of molecular mechanics), the total potential energy E is the sum of the energies of covalent Ebnd and non-valent Enbnd bonds between protein atoms [25]:

E(C) = Ebnd(C) + Enbnd(C)    (1)

where C is the representation of the molecule conformation as a set of torsion angle values. Covalent interactions are described by harmonic vibrations of covalent bonds, angles between three atoms, torsion interactions, and Urey-Bradley corrections. The non-valent part consists of the electrostatic potential and the Van der Waals potential. Despite the introduced constraints and a simplified energy function, the problem has many local solutions and requires the development of a new, fast, and accurate search method that takes advantage of the population approach and hybridization technology.
3 Literature Review
The problem of predicting the protein tertiary structure is one of the most complicated problems in bioinformatics today. Nevertheless, the achieved results of using computational methods and their combinations make it possible to assume that this approach has high potential in overcoming the difficulties associated with the experimental determination of the protein structure. Among computational methods, there are comparative approaches based on already accumulated knowledge and ab initio approaches, determining the tertiary structure from scratch. In homology modeling methods [17,28], the original sequence is compared with the template of a known tertiary structure. Threading methods [20,22,23] superimpose the original sequence on the template structure and draw conclusions about the result quality using a special evaluation function. In both cases, a prerequisite for the use of these methods is the presence of a data bank of the studied sequences and their tertiary structures. The need for templates limits the possibilities of comparative methods since their accuracy depends on how similar the target and template sequences are. Moreover, these methods give lower accuracy on long proteins, since the total folding obtained from a large number of matching fragments of a long chain may be far from the actual native conformation due to the accumulation of error introduced by mismatched fragments.
In contrast to comparative methods, ab initio approaches [16,19] imply the prediction of the three-dimensional protein structure solely on the basis of its primary sequence. In this case, the prediction is based on the physicochemical properties of the molecule, such as, for example, the hydrophobicity and hydrophilicity of amino acid residues. Since prediction “from scratch” is usually associated with enumeration and evaluation of a large number of solutions, ab initio approaches are effective only for short proteins and only if simplified forms of the energy function are used. To increase the efficiency of enumeration, heuristic methods [15], in particular, evolutionary algorithms, are used. Despite the high computational complexity, ab initio approaches have an important advantage - they retain accuracy and flexibility when studying new proteins that are designed to improve the properties of existing structures or to obtain structures with new properties. In [6], which summarizes the results of the first decade of the CASP experiment on predicting protein structures, it is argued, among other matters, that the cooperation of the above methods will become the basis for solving the problem of protein folding. In this regard, it should be noted that there are a number of publications where comparative approaches are combined with those based on the calculation of energy. For example, in [3,9], based on templates extracted from the database, the secondary protein structure is determined, which is used to reduce the search space of an evolutionary algorithm predicting the tertiary structure using the free energy function.
4 Materials and Methods

4.1 Input Data and Solutions Presentation
The chosen way of presenting solutions can significantly affect the performance of the optimization algorithm. In this work, the solutions are conformation variants of the amino acid sequence. Conformation, as a rule, is set in one of two ways: by coordinates of atoms in a rectangular coordinate system or a set of dihedral (torsion) angles between bound atoms. In the context of the problem being solved and the applied computational methods, the description of the structure using torsion angles has advantages over the first method [4]. During the generation of solutions, atoms move along arcuate trajectories (Fig. 1), which significantly reduces the number of unacceptable solutions and allows introducing fewer constraints and sanctions when evaluating individuals. Figure 1 shows that the main chain consists of three types of torsion angles: φ (N − Cα bond), ψ (Cα − C bond) and ω (C-N bond). Rotation around the C-N peptide bond is complicated as it is an almost half double bond. The atoms of the peptide group (N -H, C = O) and the Cα atoms associated with them lie in the same plane. Therefore, the angle ω can be considered constant and equal to 180◦ . Torsion angles χi belong to the side chains R. Their number (nχ , nχ ∈ [0, 4]) depends on the type of amino acid residue. For example, arginine has four torsion angles in the side chain (nχ = 4), while glycine has no side
Fig. 1. A section of the amino acid sequence with an indication of the location of the dihedral angles determining its three-dimensional structure
chain at all (nχ = 0). Thus, the conformation of each element of the sequence, depending on its type, is determined by at least two and at most six parameters.

Table 1. Classes of elements of the secondary structure and their corresponding intervals of acceptable values for the angles φ and ψ

Structure class | φ (°) | ψ (°)
H: α-helix | [–67; –47] | [–57; –37]
G: 3₁₀-helix | [–59; –39] | [–36; 16]
I: π-helix | [–67; –47] | [–80; –60]
E: β-strand | [–130; –110] | [110; 130]
B: β-bridge | [–130; –110] | [110; 130]
T: turn | [–180; 180] | [–180; 180]
S: bend | [–180; 180] | [–180; 180]
C: the rest | [–180; 180] | [–180; 180]
Since during the formation of the tertiary structure, the protein retains the structural motifs of its secondary structure, it is possible to impose constraints on the range of acceptable values of the main chain angles. This will reduce the search space and increase the performance of the optimization algorithm. To obtain the upper and lower bounds of the value interval of the angles φ and ψ, the classification algorithm is used, which is part of the SSpro package [24]. As a result of its operation, each element of the amino acid sequence is assigned one of eight classes of structural motifs (Table 1). The intervals are formed based on the boundaries of the clusters corresponding to these classes on the Ramachandran maps [9]. Constraints were also introduced for the torsion angles of the side chains (Table 2). The constraints are calculated based on the statistical information presented in the rotamer library in the form described in [8].
Table 2. Intervals of admissible values of torsion angles χi for different types of amino acid residues

Amino acid | χ1 (°) | χ2 (°) | χ3 (°) | χ4 (°)
ARG | [–177; 62] | [–167; 180] | [–65; 180] | [–175; 180]
LYS | [–177; 62] | [–68; 180] | [–68; 180] | [–65; 180]
MET | [–177; 62] | [–65; 180] | [–75; 180] | –
GLU | [–177; 70] | [–80; 180] | [–60; 60] | –
GLN | [–177; 70] | [–75; 180] | [–100; 100] | –
ASP | [–177; 62] | [–60; 65] | – | –
ASN | [–177; 62] | [–80; 120] | – | –
ILE | [–177; 62] | [–60; 170] | – | –
LEU | [–177; 62] | [65; 175] | – | –
HIS | [–177; 62] | [–165; 165] | – | –
TRP | [–177; 62] | [–105; 95] | – | –
TYR | [–177; 62] | [–85; 90] | – | –
PHE | [–177; 62] | [–85; 90] | – | –
PRO | [–30; 30] | – | – | –
THR | [–177; 62] | – | – | –
VAL | [–60; 175] | – | – | –
SER | [–177; 62] | – | – | –
CYS | [–177; 62] | – | – | –
Based on the above information, it follows that the data input to the tertiary structure prediction algorithm are two character strings (Fig. 2). The first string contains a sequence of amino acid residues in the form of a single-letter code. The second string presents the predicted classes of the secondary structure that correspond to the residues in the first string.
Fig. 2. The representation of the input data of the problem (on the example of 1ZDD protein)
Based on the input data, a decision string is formed (an individual). Figure 3 shows a fragment of an individual’s string for the first three elements of the 1ZDD sequence. The length of an individual depends not only on the number of elements in the sequence, but also on their type (the number of side chain parameters).
Fig. 3. Internal parameterized representation of the amino acid sequence
The elements of an individual’s string are represented by real numbers whose values vary in the range from 0 to 1. To obtain real values of the angles, these numbers are mapped into the ranges of acceptable values in accordance with the constraints introduced above.
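A minimal sketch of this mapping from a normalized gene value to a torsion angle within its admissible interval; the function name is illustrative, and the interval values are taken from Tables 1 and 2.

```python
# Map a normalized gene value g in [0, 1] to an angle inside its admissible interval.
def gene_to_angle(g, low, high):
    """Linear mapping of g in [0, 1] onto the interval [low, high] (degrees)."""
    return low + g * (high - low)

# Excerpt of the constraints: phi/psi for an alpha-helix residue (Table 1)
# and chi1 for ARG (Table 2).
print(gene_to_angle(0.5, -67, -47))    # phi of an H-class residue -> -57.0
print(gene_to_angle(0.5, -57, -37))    # psi of an H-class residue -> -47.0
print(gene_to_angle(0.25, -177, 62))   # chi1 of ARG -> -117.25
```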
4.2 Differential Evolution
Differential evolution (DE) is a multidimensional optimization method based on an evolutionary approach [26,27]. Essentially, DE resembles a genetic algorithm (GA) based on Charles Darwin's principles of evolution. Unlike the classical GA, in the DE algorithm individuals are encoded with real numbers, which is due to the peculiarities of the mutation operator. DE operates with a population of vectors (individuals) x_i, i = 1, ..., N, where N is the population size. Each vector x_i represents a variant of the solution of the optimization problem. The solution is given by m arguments of the objective function f(x):

f(x) → min, x = (x_1, x_2, ..., x_m)    (2)

If there is no prior knowledge of the problem, the initial population of individuals (I = 0) is randomly generated. In the current generation I, each vector x_i^I undergoes mutation, producing a trial vector:

v_i^{I+1} = x_{k3}^I + F (x_{k1}^I − x_{k2}^I)    (3)

where v_i^{I+1} is the trial vector obtained as a result of applying the mutation operator; x_{k1}^I, x_{k2}^I, x_{k3}^I are individuals of the current population, chosen at random so that k1 ≠ k2 ≠ k3 ≠ i and k_i = 1, ..., N; F is a scale factor. The obtained vector v_i^{I+1} jointly with the vector x_i^I forms a candidate vector u_i^{I+1}, whose elements are selected according to the following rule:

u_{ij}^{I+1} = { v_{ij}^{I+1}, at j = 1 + [p]_m, ..., 1 + [p + l]_m ;  x_{ij}^I, otherwise }    (4)

where p and l are random non-negative integers, l ∈ [0; m), and [·]_m is the division operator modulo m. The operation of forming a candidate vector is similar to crossing-over in genetic algorithms. Rule (4) guarantees that, as a result of crossing-over, at least one component of the trial vector is transferred to the candidate vector. The candidate vector is transferred to the new generation only if f(u_i^{I+1}) ≤ f(x_i^I); otherwise, the vector x_i^I is transferred. There are several schemes of differential evolution, which differ in the implementation of the mutation operation. However, they do not change the essence of this method.
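A compact sketch of the DE operators (3) and (4); the helper names are illustrative, and the scheme shown is the classical DE/rand/1 variant described above.

```python
import random

def de_mutation(pop, i, F=0.8):
    """Trial vector v = x_k3 + F * (x_k1 - x_k2), with k1, k2, k3 distinct and != i (Eq. 3)."""
    k1, k2, k3 = random.sample([k for k in range(len(pop)) if k != i], 3)
    return [pop[k3][j] + F * (pop[k1][j] - pop[k2][j]) for j in range(len(pop[i]))]

def de_crossover(x, v):
    """Candidate vector u: a contiguous (modulo m) block of v replaces part of x (Eq. 4)."""
    m = len(x)
    p, l = random.randrange(m), random.randrange(m)   # start and length of the block
    taken = {(p + k) % m for k in range(l + 1)}       # at least one component comes from v
    return [v[j] if j in taken else x[j] for j in range(m)]

# Toy usage on a population of 5 individuals with m = 4 genes in [0, 1].
pop = [[random.random() for _ in range(4)] for _ in range(5)]
v = de_mutation(pop, i=0)
u = de_crossover(pop[0], v)
print(u)
```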
4.3 Clonal Selection
According to the theory of clonal selection, cells circulating in the host's body contain receptors on their surface that can directly recognize antigens (foreign substances and microorganisms that pose a danger to the host's body). These cells are B and T lymphocytes. On impact with an antigen, these cells begin to multiply by cloning. The resulting clones undergo somatic hypermutation, forming a variety of receptors. Antigens with receptors form bonds characterized by the strength of interaction called affinity. The higher the binding affinity of an antigen to a B cell, the more likely it is to be cloned. Repeated actions described above start the evolutionary process of affinity maturation. In combination with immune memory, this process provides long-term and sustainable protection of the body against a specific antigen, demonstrating a high degree of responsiveness to re-infection. This principle underlies one of the types of computational methods, the clonal selection algorithm (CSA) [10]. Generally, the algorithm solves the optimization problem (2). The initial population is randomly generated. In each iteration (generation), individuals are assessed to obtain affinity values:

g(x) = Fg(f(x), θ)    (5)

where g(x) is the affinity function, which is developed on the basis of the objective function of the problem and a set of parameters θ describing some additional features of the solution affecting its quality. Each vector x_i^I undergoes cloning with the formation of multiple copies, the number being directly proportional to the affinity, C_i^I ∼ g(x_i^I). It is likely that some of the individuals from the current population with the lowest affinity will not be cloned at all. Clones mutate by adding a random vector v to each individual:

u_s^I = x_s^I + v, s = 1, ..., C    (6)

where u_s^I is an individual resulting from a mutation and C is the size of the clone population. Normally, the elements of the vector v are formed by generating a random number with a normal distribution and zero mean:

v_j = ξ(0, σ), j = 1, ..., m    (7)

where ξ(0, σ) is a random real number obtained by generation with a normal distribution, zero mean, and standard deviation σ. Expression (7) is valid if individuals are encoded with real numbers, which is the case in the context of this work. The mutant vector u_s^I is transferred to a new generation according to the following rule:

u_s^{I+1} = u_s^I, if ∃ C_n ⊂ {1, ..., C} : C − |C_n| ≤ N ∧ ∀ j ∈ C_n : g(u_s^I) ≥ g(u_j^I)    (8)

In other words, from the population of clones, a number of the best individuals are selected in an amount less than or equal to N. These individuals pass into
the next generation. The rest of the population in the new generation is supplemented with randomly generated vectors. With this approach, the number of the main population remains unchanged throughout the entire cycle of the algorithm.
4.4 Hybrid Method
Hybridization involves the combination of several computational methods aiming at successful solutions of complex problems that are difficult or impossible to solve by each method separately. Hybrid methods can compensate for each other's weaknesses at the expense of their respective strengths, providing a combination of higher productivity and efficiency. In [12], studies were carried out which showed that hybridization of the clonal selection algorithm and differential evolution allows improving the quality of solving a complex optimization problem. In [13], the positive effect of hybridization was confirmed when solving the problem of protein folding using lattice models. In both cases, the maximum productivity was achieved by replacing the amino acid residue mutation operator (6) with the similar operator (3) of the DE algorithm. In this case, mutation (6) is not completely excluded from the algorithm, but becomes an auxiliary operation. Its intensity is significantly reduced, and its role approaches the role of mutation in a conventional genetic algorithm. The pseudocode of the developed hybrid algorithm for clonal selection and differential evolution is shown in Fig. 4. In the zero generation (I = 0), an initial population of solutions Ab^I of size N is created. When creating individuals, the transformation of the input symbolic protein sequence (Fig. 2) into an array of torsion angle values (Fig. 3) is applied. The values of the angles are
Fig. 4. Hybrid algorithm for protein tertiary structure search
represented by real numbers, reflecting the position of the angle value within the range of permissible values (Tables 1, 2). Thus, each element of the array can be specified by any number from 0.0 to 1.0. This unified representation simplifies the software implementation of the algorithm and saves memory resources. The angles in the population of zero-generation solutions are randomly generated. To estimate the population and obtain affinities, the function of total potential energy (1) in the CHARMM implementation (version 22) was used [5]. In the CHARMM22 model, the energy of covalent bonds is determined as follows:

Ebnd = Ebonds + Eangles + EUB + Edihedrals + Eimpropers    (9)

where Ebonds is the bond stretching energy; Eangles is the vibration energy of angles between two bonds having a common atom; EUB is the Urey-Bradley potential; Edihedrals and Eimpropers are the vibration energies of dihedral angles. The energy of non-valent bonds is presented as follows:

Enbnd = EVdW + EES    (10)

where EVdW represents the Van der Waals bonds responsible for the atomic packing, and EES represents the electrostatic bonds described by Coulomb's law. To calculate the potential energy, two utilities from the TINKER application are used [25]. Using the PROTEIN utility, the representation of the protein structure by a set of torsion angles is converted into a representation in a rectangular coordinate system. The ANALYZE utility uses this information to calculate the energy for the specified model (in this case, CHARMM22). The affinity value of the individual corresponds to the value of the total potential energy:

g_i^I = Ebnd(Ab_i^I) + Enbnd(Ab_i^I), i = 1, ..., N    (11)

Since the objective is to find the minimum of the energy function, the conformation with the lower affinity value will be considered the better of those compared, and vice versa. In each generation, the individuals of the main population are copied to form a population of clones AbC. Moreover, the number of copies of each individual depends on its affinity and is determined by the following expression:

C_i^I = C · (g_max^I − g_i^I) / Σ_{j=1}^{N} (g_max^I − g_j^I)    (12)

where g_max^I is the maximum affinity value present in the main population in the current generation I.
The clone population AbDE altered as a result of mutations is evaluated in a manner similar to that discussed earlier. The N best individuals are selected from it and transferred to the main population Ab^{I+1}. The selection operation and expression (12) keep the sizes of the main population and the clone population constant throughout the entire period of the algorithm operation. After the stop condition is met, the individual with the affinity g_i^I = min(g^I) is selected from the current population. The tertiary structure corresponding to this individual is considered the optimal solution.
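A schematic sketch of one generation of the hybrid loop described above, combining cloning proportional to affinity (12) with the DE mutation of clones and selection of the best individuals. The energy function here is a placeholder, since the real affinity (11) is computed with the TINKER/CHARMM22 utilities; clamping genes to [0, 1] is an assumption of this sketch.

```python
import random

def clone_counts(affinities, C):
    """Number of copies of each individual, Eq. (12); lower energy means better affinity."""
    g_max = max(affinities)
    weights = [g_max - g for g in affinities]
    total = sum(weights) or 1.0
    return [round(C * w / total) for w in weights]

def de_mutate(clones, i, F=0.8):
    """DE/rand/1 trial vector built from three distinct clones, as in Eq. (3)."""
    k1, k2, k3 = random.sample([k for k in range(len(clones)) if k != i], 3)
    return [min(1.0, max(0.0, clones[k3][j] + F * (clones[k1][j] - clones[k2][j])))
            for j in range(len(clones[i]))]

def hybrid_generation(pop, energy, C=400, F=0.8):
    """One generation of the hybrid: clone, apply the DE mutation to the clones, keep the N best."""
    N = len(pop)
    affinities = [energy(x) for x in pop]
    clones = [x[:] for x, c in zip(pop, clone_counts(affinities, C)) for _ in range(c)]
    mutated = [de_mutate(clones, i, F) for i in range(len(clones))]
    mutated.sort(key=energy)          # minimum total potential energy first
    return mutated[:N]

# Placeholder energy over normalized genes (the real affinity (11) comes from CHARMM22).
energy = lambda x: sum(g * g for g in x)
population = [[random.random() for _ in range(6)] for _ in range(20)]
population = hybrid_generation(population, energy, C=60)
print(min(energy(x) for x in population))
```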
5 Experiments
For experimental studies, four protein structures were selected whose descriptions are presented in the Protein Data Bank: 1PLW, 1ZDD, 1ROP, and 1CRN. Using the proposed algorithm, the tertiary structure of the selected proteins was predicted and the minimum conformation energy was calculated. In order to calculate the similarity of the predicted structure with the native variant extracted from the protein bank, two metrics were used: the root-mean-square deviation (RMSD) of atomic positions and the distance matrix error (DME):

RMSD(a, b) = sqrt( (1/n) Σ_{i=1}^{n} ||r_i^a − r_i^b||² )    (13)

where r_i^a, r_i^b are the positions of the corresponding atoms in the structures a and b, and n is the number of atoms in the structure. To calculate the root-mean-square deviation, the protein molecules should be aligned with each other. The alignment of structures was carried out using algorithms implemented in the ProFit software [21]. Unlike RMSD, the calculation of the DME does not require structure alignment. The expression for calculating the DME is as follows:

DME(a, b) = (1/n) sqrt( Σ_{i=1}^{n} Σ_{j=1}^{n} ( ||r_i^a − r_j^a|| − ||r_i^b − r_j^b|| )² )    (14)

The values of the parameters of the prediction algorithm for all proteins involved in the experiments remained unchanged (Table 3). Only the maximum number of generations changed, which determined the condition for stopping the algorithm. The proposed prediction algorithm is implemented in the C++ programming language using directives for programming multithreaded OpenMP applications. The procedures for calculating the values of the total potential energy of the conformation were parallelized, since the estimates of the individuals in the population are constructed from these values. Due to this approach, it became possible to achieve a significant increase in the performance of the algorithm on processors with multiple cores.
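A small sketch of metrics (13) and (14) using NumPy; the coordinate arrays are assumed to be already aligned (for RMSD) and consistently ordered, which in the paper is done with the ProFit software.

```python
import numpy as np

def rmsd(a, b):
    """Root-mean-square deviation (13) between two (n, 3) coordinate arrays."""
    return np.sqrt(np.mean(np.sum((a - b) ** 2, axis=1)))

def dme(a, b):
    """Distance matrix error (14): compares all pairwise intra-structure distances."""
    da = np.linalg.norm(a[:, None, :] - a[None, :, :], axis=-1)
    db = np.linalg.norm(b[:, None, :] - b[None, :, :], axis=-1)
    n = a.shape[0]
    return np.sqrt(np.sum((da - db) ** 2)) / n

# Toy check on random "structures" of 10 atoms each.
rng = np.random.default_rng(0)
x, y = rng.normal(size=(10, 3)), rng.normal(size=(10, 3))
print(rmsd(x, y), dme(x, y))
```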
Table 3. Algorithm parameters for all experiments

Parameter name | Parameter value
Main population number (N) | 100
Clone population number (C) | 400
The probability of a simple mutation (pm) | 0.1
Scale factor (F) | 0.8

6 Results and Discussion
Met-enkephalin (1PLW) is a short polypeptide of only five amino acid residues (Y-G-G-F-M). A peculiarity of this protein is the presence of a large number of local conformation optima. To predict the structure of Met-enkephalin, the algorithm produced 120 generations, calculating the conformation energy 47300 times. The prediction results are presented in Table 4. Errors (13) and (14) are calculated for two sets of atoms: the complete structure of the protein (all atoms) and the carbon atoms of the main chain (Cα atoms). The latter make it possible to estimate the proximity of the core of the resulting structure to the native state. The core actually defines the packing of the molecule in three-dimensional space. Figure 5a shows a graph of the convergence of the prediction algorithm. The graph shows that the energy practically reaches the found minimum value long before the stop condition is met. It should be noted that this value was found in the 50th generation after performing 19300 calculations of the energy function.

Table 4. Results of the experiments with 1PLW protein

Atom subset | RMSD | DME | Minimum energy
All atoms | 3.743 | 2.863 | –27.1
Cα | 1.649 | 1.277 |
1ZDD protein consists of two helices, which, in the folded protein state, are located almost parallel to each other (Fig. 6b). The sequence contains 34 amino acid residues. To predict the tertiary structure (with a minimum potential energy of –1268.23 kcal/mol), the algorithm took 220 generations. In this case, the energy function was calculated 87700 times.

Table 5. Results of the experiments with 1ZDD protein

Atom subset | RMSD | DME | Minimum energy
All atoms | 7.672 | 4.974 | –1268.23
Cα | 5.140 | 3.497 |
Fig. 5. Algorithm convergence graphs for predicting the tertiary structure of 1PLW (a) and 1ZDD (b) sequences
The convergence graph (Fig. 5b) shows that the algorithm quite successfully solves the set task of minimizing the potential energy function. The curve descends smoothly, forming a hyperbola characteristic of this kind of process. The prediction results are presented in Table 5 and Fig. 6a. It should be noted that the deviations of the obtained results from the native form create the basis for further research, primarily related to the improvement of the affinity function.
1ROP protein is a four-helical bundle consisting of two identical monomers. Each monomer contains 56 amino acid residues (Fig. 6d). In this experiment, the operation of the algorithm was limited to 430 generations; as a result, the potential energy function was calculated 173300 times. The results of predicting the tertiary structure for this protein are presented in Table 6 and Fig. 6c. The algorithm convergence graph in this experiment is shown in Fig. 7a.
1CRN protein consists of two α-helices and two β-layers (Fig. 8b). In this case, the algorithm worked through 400 generations and calculated the energy function 159700 times. Table 7 shows the comparative results obtained in two experiments. It is indicative that, although the potential energy in the first experiment appeared to be higher than in the second (560.6 kcal/mol versus 448.28 kcal/mol), the results of the first experiment (in terms of the RMSD and DME values) are generally better than those of the second. The reason may be a conflict between the two types of energy, namely Ebnd and Enbnd: decreasing one leads to an increase in the other and vice versa. Hence, in further studies it is advisable to revise the method of forming the total energy of the system, which is the basis of the objective function of the problem. Figure 7b shows the algorithm convergence graph in this experiment. Figure 8a shows the structure obtained as a result of prediction.
Fig. 6. Predicted (a, c) and native (b, d) conformations of 1ZDD and 1ROP proteins, respectively
Table 6. Results of the experiments with 1ROP protein

Atom subset | RMSD | DME | Minimum energy
All atoms | 7.825 | 4.286 | –539.9
Cα atoms | 6.555 | 3.551 |
Fig. 7. Algorithm convergence graphs for predicting the tertiary structure of 1ROP (a) and 1CRN (b) proteins
Table 7. Results of the experiments with 1CRN protein

Experiment | Atom subset | RMSD | DME | Minimum energy
Experiment 1 | All atoms | 10.051 | 8.642 | 560.6
Experiment 1 | Cα atoms | 9.045 | 8.17 |
Experiment 2 | All atoms | 11.514 | 11.242 | 448.28
Experiment 2 | Cα atoms | 11.116 | 11.27 |
Fig. 8. Predicted (a) and native (b) conformation of 1CRN protein
7 Conclusion
The paper proposes a hybrid method for predicting the tertiary protein structure based on a combination of clonal selection and differential evolution algorithms. To link the method to the task, a coding method and a function for evaluating individuals in the population are determined. Individuals are represented by real numbers that are associated with the degree measure of the torsion angles of the elements of the protein main and side chains. To reduce the search space, constraints are introduced on the values of torsion angles, for which the secondary structure of the input amino acid sequence is preliminarily determined. The evaluation of individuals is based on the calculation of the total potential energy function of the CHARMM22 structure. The proposed method is implemented in the form of an algorithm that was the subject of experimental studies. The purpose of the experiments is to test the operation of the algorithm using the example of predicting the tertiary structure of four real proteins taken from an online protein databank. The experiments have shown that, in dynamics, the algorithm successfully solves the task of searching for a structure with minimum energy. However, there remain problems that, according to the authors, are associated with the chosen method for calculating the affinity of individuals; these problems negatively affect the prediction results and are confirmed by the values of the RMSD and DME characteristics. In this regard, it is planned to conduct further research and improve the mechanism for evaluating individuals.
References 1. Anfinsen, C.: Principles that govern the folding of protein chains. Science 181, 223–230 (1973) 2. Anfinsen, C., Haber, E., Sela, M., White, J.F.H.: The kinetics of formation of native ribonuclease during oxidation of the reduced polypeptide chain. Proc. Natl. Acad. Sci. USA 47, 1309–1314 (1961) 3. Anile, A.M., Cutello, V., Narzisi, G.: Determination of protein structure and dynamics combining immune algorithms and pattern search methods. Nat. Comput. 6, 55–72 (2007) 4. Bray, J.K., Weiss, D.R., Levitt, M.: Optimized torsion-angle normal modes reproduce conformational changes more accurately than cartesian modes. Biophys. J. 101(12), 2966–2969 (2011). https://doi.org/10.1016/j.bpj.2011.10.054
5. Brooks, B.R., et al.: CHARMM: the biomolecular simulation program. J. Comput. Chem. 30(10), 1545–1614 (2009). https://doi.org/10.1002/jcc.21287 6. Cozzetto, D., Di Matteo, A., Tramontano, A.: Ten years of predictions ... and counting. FEBS 272(4), 881–882 (2005). https://doi.org/10.1111/j.1742-4658. 2005.04549.x 7. Cui, Y., Chen, R.S., Wong, W.H.: Protein folding simulation using genetic algorithm and supersecondary structure constraints. Proteins: Struct. Funct. Genet. 31(3), 247–257 (1998) 8. Cutello, V., Narzisi, G., Nicosia, G.: A multi-objective evolutionary approach to the protein structure prediction problem. J. R. Soc. Interface 3, 139–151 (2006). https://doi.org/10.1098/rsif.2005.0083 9. Cutello, V., Narzisi, G., Nicosia, G.: Computational studies of peptide and protein structure prediction problems via multiobjective evolutionary algorithms. In: Multiobjective Problem Solving from Nature. Natural Computing Series. vol. 14, pp. 93–114 (2008) 10. De Castro, L.N., Von Zuben, F.J.: Learning and optimization using the clonal selection principle. IEEE Trans. Evol. Comput. 6(3), 239–251 (2002) 11. Dunbrack, J.R.L., Cohen, F.E.: Bayesian statistical analysis of protein sidechain rotamer preferences. Protein Sci. 6, 1661–1681 (1997) 12. Fefelov, A.A., Lytvynenko, V.I., Taif, M.A., Voronenko, M.A.: Reconstruction of the s-system by a hybrid algorithm for clonal selection and differential evolution. Control Syst. Comput. 6, 41–51 (2017) 13. Fefelova, I., et al.: Protein Tertiary Structure Prediction with Hybrid Clonal Selection and Differential Evolution Algorithms. Lecture Notes in Computational Intelligence and Decision Making. ISDMCI 2019. Advances in Intelligent Systems and Computing, vol. 1020, pp. 673–688 (2020). https://doi.org/10.1007/978-3-03026474-1 47 14. Hansmann, U.H., Okamoto, Y.: Numerical comparisons of three recently proposed algorithms in the protein folding problem. J. Comput. Chem. 18, 920–933 (1997) 15. Hansmann, U.H.E., Okamoto, Y.: Numerical comparisons of three recently proposed algorithms in the protein folding problem. Comput. Chem. 18, 920–933 (1997) 16. Hoque, M., Chetty, M., Sattar, A.: Genetic algorithm in Ab initio protein structure prediction using low resolution model: a review. Stud. Comput. Intell. 224, 317–342 (2009). https://doi.org/10.1007/978-3-642-02193-0 14 17. Horia, J.H., Khaled, B.: Homology modeling: an overview of fundamentals and tools. Int. Rev. Model. Simul. 10, 129 (2017). https://doi.org/10.15866/iremos. v10i2.11412 18. Kabsch, W., Sander, C.: Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22(12), 2577–2637 (1983). https://doi.org/10.1002/bip.360221211 19. Klepeis, J.L., Wei, Y., Hecht, M.H., Floudas, C.A.: Ab initio prediction of the three-dimensional structure of a de novo designed protein: a double-blind case study. Proteins 58(3), 560–570 (2005). https://doi.org/10.1002/prot.20338 20. Ma, J., Peng, J., Wang, S., Xu, J.: A conditional neural fields model for protein threading. Bioinformatics 28(12), 59–66 (2012). https://doi.org/10.1093/ bioinformatics/bts213 21. McLachlan, A.D.: Rapid Comparison of Protein Structres, Acta Cryst A38, 871– 873) as implemented in the program ProFit (Martin, A.C.R. and Porter, C.T. (1982). http://www.bioinf.org.uk/software/profit/
22. Mirny, L.A., Finkelstein, A.V., Shakhnovich, E.I.: Statistical significance of protein structure prediction by threading. Proc. Natl. Acad. Sci. USA 97(18), 9978–9983 (2000). https://doi.org/10.1073/pnas.160271197 23. Peng, J., Xu, J.: RaptorX: exploiting structure information for protein alignment by statistical inference. Proteins 79(10), 161–171 (2011). https://doi.org/10.1002/ prot.23175 24. Pollastri, G., Przybylski, D., Rost, B., Baldi, P.: Improving the prediction of protein secondary structure in three and eight classes using recurrent neural networks and profiles. Proteins 47(2), 228–235 (2002) 25. Rackers, J.A., et al.: Tinker 8: software tools for molecular design. J. Chem. Theory Comput. 14(10), 5273–5289 (2018) 26. Storn, R., Price, K.: Minimizing the real function of the ICEC’96 contest by differential evolution. In: IEEE International Conference on Evolutionary Computation, pp. 842–844 (1996) 27. Storn, R., Price, K.: Differential evolution - a simple and efficient heuristic for global optimization over continuous spaces. J. Global Optim. 11(4), 341–359 (1997) 28. Tramontano, A., Morea, V.: Assessment of homology-based predictions in CASP5. Proteins 53(6), 352–368 (2003). https://doi.org/10.1002/prot.10543
Reduction of Training Samples in Solar Insolation Prediction Under Weather and Climatic Changes

Yakiv Povod, Volodymyr Sherstjuk(B), and Maryna Zharikova

Kherson National Technical University, Kherson, Ukraine
Abstract. The paper considers the problem of forecasting solar insolation. Due to a large number of factors that are difficult to predict, this problem is complex, like other problems whose parameters depend on weather or climate. Although such factors do not significantly affect the parameters under study, they create an essential bias. The problem is also characterized by a significant amount of information, which should be processed to obtain a reliable forecast. The paper discusses relevant machine learning methods and analyzes methods for reducing the training sample to make high-quality and reliable predictions of solar insolation under weather and climatic changes. Reducing the size of the training sample allowed testing a significant number of models and optimizing the hyperparameters of these models, which made it possible to identify the most accurate model, whose graph is close to ideal. The effect of reducing the size of the training sample has been measured with respect to the speed and accuracy of machine learning algorithms. It is shown that the use of clustered data sampling for models that are prone to overfitting can improve the accuracy of these models. It is also shown that using clustering to reduce data samples can reduce variance in the training dataset. The reduced data sample helps to perform optimization of hyperparameters because the accuracy of different models is preserved over a wide range of input data. The results obtained can be successfully applied in control systems for large objects of solar energy generation under conditions of fast and frequent weather changes as well as slow climatic changes.

Keywords: Solar insolation · Machine learning · Reduction of training samples · Optimizing the hyperparameters

1 Introduction
Solar insolation prediction is an important and complex task, as much as other weather or climate-dependent tasks [5]. Typically, most of the energy is used to create comfortable living conditions, including the microclimate. If solar insolation can be predicted over a certain time interval, taking into account meteorological data, it is possible to save energy by optimally controlling the solar energy
generation system. Thus, solar insolation is an important parameter investigated in many modern studies. The number of studies related to model predictive control is growing year by year, which makes it possible to use the optimal system control strategy. At the same time, the accuracy of hourly load prediction significantly affects the efficiency of model predictive control, but the load is directly influenced by the weather information for the next day; therefore, most models require weather forecast information [15]. Outside temperature and solar radiation are typical factors affecting the load. It is quite easy to forecast the outside air temperature due to small hourly changes during the day, but forecasting the actual hourly values of solar insolation is a much more difficult task [14]. A large number of hardly predictable factors introduces the essential problem of complexity, even though most of them do not have a significant influence on the investigated parameter. However, they create a perceptible bias when data from a certain geolocation are used to obtain predictions for different geolocations [18]. Clearly, the best results for solar insolation prediction can be obtained by utilizing data from the corresponding geolocations collected for at least one year [13]. Usually, this leads to a considerable amount of information, which must be processed to provide forecasts [7].
2 Recent Works
There are models for predicting solar insolation based on physics or on data [3]. Physical models build a correlation between solar insolation data and meteorological parameters measured in the past based on solar geometry, for example, based on the correlation between the sky cover and solar insolation data measured in the region for three years or, more accurately, based on weather data for 17 years [26]. There is also a physics-based model for predicting solar insolation using weather parameters such as humidity, wind, and precipitation, as well as data accumulated over the years [20]. However, physics-based weather forecasting models require long-term measurement data that are difficult to obtain from general weather forecast information to describe solar insolation in a certain location [25]. Therefore, such models cannot be applied to predict solar insolation for the next day. Physical model-based solar insolation models are useful for calculating total solar insolation monthly or annually rather than in real time [6]. Another approach to solar insolation prediction is machine learning, which excludes physical models, for example, deep learning, artificial neural networks, and other machine learning methods with various properties. Most of them provide relatively good results, but their accuracy, as well as the speed of the machine learning algorithms, differs significantly for different data sets. On the other hand, a neural prediction model usually exhibits higher accuracy than an empirical prediction model for physical solar insolation. Thus, a promising approach to predict solar insolation is to use machine learning on pre-measured statistical data [11]. The solar insolation prediction model developed from the training data includes various training methods at different time intervals. A model trained at
15-minute intervals has been proposed in [22], as well as a next-day model based on the examined solar insolation data over the past six years using a feedforward neural network [23]. However, solar insolation prediction models can be effective only on large-scale data sets, because historical meteorological and solar insolation data, which are the input of the learning model, require too expensive equipment to measure for a sufficient period in a certain area and to collect continuously. Thus, to enable model predictive control, the existing prediction models for solar insolation should be improved. Therefore, a solar insolation prediction model has been proposed that uses only easily available weather forecast information via the LSTM deep learning algorithm [19]. Thus, relatively small amounts of local data were used; however, it is still difficult to use the model without data measured in the corresponding region because it is essential to collect solar insolation data in the local region. Besides, there are several machine learning methods based on Linear Regressions [11], Decision Trees [8], Random Forests [16], and Multilayer Perceptrons [2,12]. A model [4] has been developed to determine the optimal combination of predictive input parameters through experiments involving combinations of weather parameters. The obtained results indicate that the accuracy and the speed of prediction can vary depending on the hyperparameters of machine learning algorithms.
3 Problem Statement
Previous solar insolation prediction models are quite difficult to use directly because their inputs are detailed data from a weather measurement center, since they require data measured over a long period in a specific region. Clearly, in most cases, such local data cannot be accumulated. In order to be sure that the chosen machine learning algorithm and its hyperparameters are optimal, or at least close to optimal, we need to study several machine learning models in detail by training them on a certain data sample [9]. Therefore, in this study, we aimed to compare solar insolation prediction models based on different machine learning methods and to find their optimal hyperparameters, which can provide good predictive performance. In this work, we are especially interested in the comparison of the machine learning methods based on Linear Regression, Decision Trees, Random Forests, and the Multilayer Perceptron. However, this could lead to excessive consumption of computational resources. Therefore, we use a sample of data points for 219 weather stations, which has been obtained from NSRDB TMY3 data [1] and contains almost 2 million samples. Thus, our topic of interest is to examine how the mutual accuracy of the models changes when the size of the training data sample is reduced.
4 Data Model
In this work, we use typical meteorological year data obtained from the NSRDB archive as input data points. The collection of data has been recorded to a set of CSV files. Each CSV file contains a header with information about the meteorological station and a sequence of comma-separated weather data. The header contains records about the site identifier code, station name, station state, site time zone, site latitude, site longitude, and site elevation. The weather data contains 68 measurements. We use only 7 of them in the actual study, namely:

– date – the date when the data has been recorded;
– time in the local standard format;
– hourly extraterrestrial radiation normal to the sun (the amount of solar radiation received on a surface normal to the sun at the top of the atmosphere during the previous 60 min);
– direct normal irradiance (the amount of solar radiation received in a collimated beam on a surface normal to the sun during the previous 60 min);
– diffuse horizontal irradiance (the amount of solar radiation received from the sky (excluding the solar disk) on a horizontal surface during the previous 60 min);
– total sky cover (the amount of sky dome covered by clouds or obscuring phenomena at the indicated time);
– opaque sky cover (the amount of sky dome covered by clouds or obscuring phenomena that prevent observing the sky or higher cloud layers at the indicated time).

The sun elevation and azimuth have been calculated using date, time, and site coordinates. These data can also be used to calculate the air mass, which highly correlates with direct and indirect solar insolation. The extraterrestrial radiation indicates the total amount of energy that reaches the earth's atmosphere, so it is a valuable input parameter and can be easily calculated using the distance to the sun. The direct and diffuse irradiance are also considered as input parameters, which can possibly be used to calculate the amount of solar insolation on a certain tilted surface. The amount of cloud cover is also considered as an input parameter because it has the most influence on solar insolation compared to other hardly predictable values. Also, cloud cover information can be obtained from some weather prediction resources. Each CSV file is classified by precision. According to the manual [27], the collected data has three precision classes. We use for our calculations only data samples having the first uncertainty class, which means that less than 25% of the data for the 15 years of record exceeds an uncertainty of 11%. In total, we obtain a data set that contains 1 918 438 samples after preprocessing. Each sample has the following 6 input parameters: sun elevation, area of
opaque cloud cover, the total area of cloud cover, altitude, azimuth to the sun, solar radiation over the atmosphere, and the following 2 output parameters: direct solar radiation and diffuse solar radiation.
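The assembly of this data set can be illustrated with a short Python sketch. It is a hypothetical reconstruction: the folder layout, header positions and column names are assumptions rather than the authors' code, and the sun elevation and azimuth are assumed to be computed separately from date, time and site coordinates before this step.

```python
# Hypothetical sketch of assembling the training table from TMY3-style CSV files.
import glob
import pandas as pd

frames = []
for path in glob.glob("tmy3/*.csv"):
    # First line: site id, name, state, time zone, latitude, longitude, elevation (assumed layout).
    meta = pd.read_csv(path, nrows=1, header=None)
    df = pd.read_csv(path, skiprows=1)            # hourly weather records
    df["site_elevation"] = float(meta.iloc[0, 6])
    frames.append(df)

raw = pd.concat(frames, ignore_index=True)

# Assumed column names for the fields used in the study; "sun_elevation" and
# "sun_azimuth" would have to be added by a solar-position calculation.
X_cols = ["sun_elevation", "opaque_sky_cover", "total_sky_cover",
          "site_elevation", "sun_azimuth", "etr_normal"]
y_cols = ["direct_normal_irradiance", "diffuse_horizontal_irradiance"]
data = raw[X_cols + y_cols].dropna()
```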
5 Examining the Models
In order to examine the models, the data has been divided into test and training sets. The size of the test sample is about 50% of the training sample. The well-known Sci-Kit-Learn Python module [24] has been used to train machine learning models. The different machine learning algorithms have been compared according to the following metrics:

– explained variance (EV),
– root mean squared error (RMSE),
– coefficient of determination (R2),
– training time (TL),
– testing time at 525.6 thousand points (TT).
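A minimal helper in the spirit of this comparison could look as follows; it uses the scikit-learn metric functions named above, while the timing approach and the helper itself are our own sketch rather than the authors' implementation.

```python
# Sketch of collecting EV, RMSE, R2 and the train/test timings for one model.
import time
import numpy as np
from sklearn.metrics import explained_variance_score, mean_squared_error, r2_score

def evaluate(model, X_train, y_train, X_test, y_test):
    t0 = time.perf_counter()
    model.fit(X_train, y_train)
    train_time = time.perf_counter() - t0          # TL

    t0 = time.perf_counter()
    y_pred = model.predict(X_test)
    test_time = time.perf_counter() - t0           # TT

    return {
        "EV": explained_variance_score(y_test, y_pred),
        "RMSE": np.sqrt(mean_squared_error(y_test, y_pred)),
        "R2": r2_score(y_test, y_pred),
        "TL": train_time,
        "TT": test_time,
    }
```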
The obtained results of the calculation are represented in Table 1.

Table 1. Metrics of models trained on the full data set

Model                    EV        RMSE        R2        TL        TT
Linear regression        0.604855  122.723390  0.602216  0.2913    0.0255
Decision trees           0.715373  103.546219  0.715345  7.3831    0.0470
Random forest            0.850848  74.659053   0.850831  517.3851  3.4582
Multi-layer perceptron   0.862870  71.302290   0.862862  580.2552  0.3989
Clearly, there is a robust reference model that forecasts solar insolation for a clear sky. In order to perform visual overfitting detection, we build graphs that represent the dependence of the solar insolation on the elevation angle separately for the reference model (Fig. 1) and the obtained Linear Regression model (Fig. 2), Decision Tree model (Fig. 3), Random Forest model (Fig. 4), and Multilayer Perceptron model (Fig. 5). Despite the good test results, decision tree-based algorithms suffer from overfitting and require significantly more computational resources to predict values compared to a multilayer perceptron, as shown in Fig. 3. Moreover, the more accurate the algorithms are, the more computational resources they require for training. Obviously, the quality of the models could be improved by finding optimal hyperparameters, but this would be unjustified due to the extensive training times.
Fig. 1. Dependence of the solar insolation on elevation angle for the reference model
Fig. 2. Dependence of the solar insolation on elevation angle for the linear regression model
To overcome this problem, we propose to train the model using only a small random data sample [10]. Thereby, the learning process for the model can be carried out at a considerably higher speed on a much smaller number of samples. This makes it feasible to search hyperparameter values to optimize the algorithms used. Based on the above-mentioned assumptions, we built the models using random data samples of 46656, 3125, and 256 points.
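A possible sketch of this sampling experiment is shown below. It reuses the evaluate helper and the X_train, y_train, X_test, y_test arrays assumed in the earlier sketches; the estimator settings are illustrative defaults rather than the exact configuration used in the study.

```python
# Sketch of drawing reduced random training samples and training the four model families.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
models = {
    "Linear regression": LinearRegression(),
    "Decision trees": DecisionTreeRegressor(),
    "Random forest": RandomForestRegressor(),
    "Multi-layer perceptron": MLPRegressor(max_iter=1000),
}

results = {}
for size in (46656, 3125, 256):
    idx = rng.choice(len(X_train), size=size, replace=False)   # random subsample
    for name, model in models.items():
        results[(size, name)] = evaluate(model, X_train[idx], y_train[idx],
                                         X_test, y_test)
```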
Fig. 3. Dependence of the solar insolation on elevation angle for the decision tree model
Fig. 4. Dependence of the solar insolation on elevation angle for the random forest model
Fig. 5. Dependence of the solar insolation on elevation angle for the multilayer perceptron model
Thus, we evaluated the prediction models trained on these reduced training data sets and tested them on the full test data set. The results of the evaluation are shown in Tables 2, 3 and 4.

Table 2. Metrics of models built on a random data sample for 46656 points

Model                    EV        RMSE        R2        TL       TT
Linear regression        0.613236  129.251138  0.606083  0.0160   0.0330
Decision trees           0.744179  102.223407  0.742459  0.2182   0.0620
Random forest            0.863903  74.258195   0.862114  12.6658  2.4443
Multi-layer perceptron   0.889021  64.410215   0.888368  13.4437  0.5400
For greater clarity, we have built graphs for 3125 samples using the optimized Random Forest (Fig. 6) and the optimized Multilayer Perceptron (Fig. 7) models, as well as graphs for 256 samples using the same optimized Random Forest (Fig. 8) and optimized Multilayer Perceptron (Fig. 9) models. The graphs show that the use of random sampling with 3125 data points can significantly increase the speed of learning, while at the same time it does not cause a significant loss of accuracy. In that way, the Random Forest algorithm trained on 256 selected samples (Fig. 8) produced a much better result than the linear regression trained on the complete data set (Fig. 2), while the training time of the random sample-based model is even shorter.
Table 3. Metrics of models built on a random data sample for 3125 points

Model                             EV        RMSE        R2        TL        TT
Linear regression                 0.506531  160.810834  0.428530  0.0020    0.0290
Decision trees                    0.541105  146.565093  0.513256  0.0090    0.0390
Random forest                     0.765249  105.027812  0.742700  0.5485    1.7627
Multi-layer perceptron            0.618361  144.094617  0.545933  1.8347    0.4434
Optimized random forest           0.765608  100.954488  0.742476  666.2534  0.8853
Optimized multi-layer perceptron  0.753188  108.411273  0.723433  279.2962  0.5335
Table 4. Metrics of models built on a random data sample for 256 points

Model                             EV        RMSE        R2        TL       TT
Linear regression                 0.556933  163.800949  0.397879  0.0020   0.0290
Decision trees                    0.460273  167.406410  0.359679  0.0020   0.0320
Random forest                     0.635069  134.960461  0.567484  0.1211   0.8928
Multi-layer perceptron            0.064936  199.446766  0.015383  0.2112   0.4724
Optimized random forest           0.479590  155.780530  0.396233  33.0102  0.0681
Optimized multi-layer perceptron  0.168251  306.627783  0         89.2804  4.5692
The optimization of hyperparameters for the Random Forest algorithm in this specific case does not have a significant effect on accuracy, while optimizing the hyperparameters of the Multilayer Perceptron delivered a small but noticeable gain in accuracy (Figs. 7 and 9). Since the data sample is noisy, the gain in accuracy obtained by increasing the size of the training sample can be caused by the fact that the model acquires more information for noise reduction. Further, we can split the data points into clusters using the k-means method [21]. Thus, we can significantly reduce the number of points and smooth out the noise of the data sample by searching for the center points of the clusters. However, the use of the conventional k-means method for a considerable amount of data requires excessive computational resources. We suppose that, in this case, using batch k-means cluster centers [17] offers a significant increase in performance and should not significantly reduce the accuracy of the result. Therefore, the data samples containing 46656, 3125, and 256 points have been additionally clustered. The time consumed on clustering depending on the number of points is shown in Table 5. According to Table 5, we can conclude that there is a nonlinear dependence of time on the number of clusters. This implies that there is an upper limit on the number of clusters beyond which it is no longer appropriate to reduce the data in this way, since all necessary investigations can be carried out on the complete data set in less time. The results of the models that have been built using the clustered data samples are shown in Tables 6, 7 and 8.
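The clustering-based reduction could be sketched as follows with scikit-learn's MiniBatchKMeans; clustering jointly over inputs and outputs and then splitting the cluster centers back into training pairs is our assumption about how the reduced sample is formed, not a description taken from the paper.

```python
# Sketch of replacing a large training set by mini-batch k-means cluster centres.
import numpy as np
from sklearn.cluster import MiniBatchKMeans

def cluster_reduce(X, y, n_samples, batch_size=4096, seed=0):
    joint = np.hstack([X, y])                       # cluster in the joint (X, y) space
    km = MiniBatchKMeans(n_clusters=n_samples, batch_size=batch_size,
                         random_state=seed)
    km.fit(joint)
    centres = km.cluster_centers_
    # split the centres back into inputs and outputs
    return centres[:, :X.shape[1]], centres[:, X.shape[1]:]

X_small, y_small = cluster_reduce(X_train, y_train, n_samples=3125)
```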
Fig. 6. Random forest regression at 3125 points
Fig. 7. Multilayer perceptron at 3125 points

Table 5. Clustering time against the number of samples

Number of samples  Clustering time, s
46656              3302.5406
3125               36.1358
256                68.4481
27                 4.5222
Fig. 8. Random forest regression by 256 points
Fig. 9. Multilayer perceptron at 256 points

Table 6. Metrics of models built using clustered data sample for 46656 points

Model                    EV        RMSE        R2        TL       TT
Linear regression        0.473668  143.139884  0.468079  0.1431   0.0110
Decision trees           0.795105  84.530208   0.794874  0.3583   0.0240
Random forest            0.888400  62.387041   0.888188  20.8867  2.3597
Multi-layer perceptron   0.882012  64.045922   0.881921  14.9860  0.5085
Table 7. Metrics of models built using clustered data sample for 3125 points

Model                             EV        RMSE        R2        TL        TT
Linear regression                 0.438037  147.990279  0.430587  0.0010    0.0110
Decision trees                    0.784035  86.221525   0.783414  0.0160    0.0180
Random forest                     0.875747  65.731223   0.875076  1.0379    1.3988
Multi-layer perceptron            0.714686  106.273439  0.712371  1.7472    0.4824
Optimized random forest           0.883826  66.707122   0.880922  792.0569  18.9931
Optimized multi-layer perceptron  0.869060  70.376879   0.864272  355.7497  0.0711
Table 8. Metrics of models built using clustered data sample for 256 points

Model                             EV        RMSE        R2        TL       TT
Linear regression                 −0.07168  204.925593  −0.07915  0.0010   0.0110
Decision trees                    0.659837  121.595789  0.616992  0.0020   0.0140
Random forest                     0.634256  130.912531  0.576954  0.1612   0.9789
Multi-layer perceptron            0.554766  129.653964  0.549569  0.1912   0.5215
Optimized random forest           0.590088  148.553129  0.507372  27.4366  3.2731
Optimized multi-layer perceptron  0.815641  84.311298   0.813439  96.4049  0.1992
For greater clarity, we have built graphs for 3125 data points using the Random Forest (Fig. 10) and the Multilayer Perceptron (Fig. 11) models, as well as for 256 data points using the same Random Forest (Fig. 12) and Multilayer Perceptron (Fig. 13) models. Since the data distribution becomes uneven [28], the linear models become more biased. At the same time, other models show a certain increase in accuracy, and their graphs exhibit much less noise. Clearly, the Decision Tree and Random Forest models are prone to overfitting, but due to the lower amount of noise in the clustered data, these algorithms can achieve better generalization than the algorithms trained on the full dataset. Additionally, when the data sample size is significantly reduced, the accuracy of all algorithms decreases. The possible reason is the "Curse of Dimensionality" [10]. According to the data presented in Tables 9 and 10, the accuracy of the models based on small data samples increases when the dimensionality of the input data is decreased. We can reasonably interpret this through the fact that models based on the Multilayer Perceptron suffer from the "Curse of Dimensionality" [10]. In order to verify this assumption, we perform the dimensionality reduction of the input data using the optimal combination of input parameters for a given number of features. The dependence of the coefficient of determination on the number of features and the number of points is shown in Tables 9 and 10.
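One possible way to produce the kind of feature-count comparison reported in Tables 9 and 10 is an exhaustive scan over feature subsets, as sketched below. The helper and the exhaustive search are our reading of "the optimal combination of input parameters for a given number of features", and X_small, y_small, X_test, y_test are the arrays assumed in the earlier sketches.

```python
# Sketch of scanning feature subsets and keeping the best R2 for each subset size.
from itertools import combinations
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import r2_score

def best_r2_for_k_features(X_tr, y_tr, X_te, y_te, k):
    best = -float("inf")
    for cols in combinations(range(X_tr.shape[1]), k):
        model = MLPRegressor(max_iter=1000)
        model.fit(X_tr[:, cols], y_tr)
        best = max(best, r2_score(y_te, model.predict(X_te[:, cols])))
    return best

scores = {k: best_r2_for_k_features(X_small, y_small, X_test, y_test, k)
          for k in range(6, 0, -1)}
```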
Fig. 10. Random forest regression at 3125 points
Fig. 11. Multilayer perceptron at 3125 points

Table 9. The accuracy of the multilayer perceptron on a random data sample

Points \ Features  6          5          4          3           2         1
46656              0.810624   0.887369   0.882440   0.886156    0.883539  0.642373
3125               0.811762   0.815706   0.834550   0.828208    0.834565  0.630194
256                0.568048   0.523668   0.594506   0.640980    0.635581  0.612262
27                 0.067735   0.018385   0.021285   0.243897    0.543966  0.518505
4                  −2.751225  −1.518638  −2.472216  −11.229447  0.540164  0.531449
Fig. 12. Random forest regression by 256 points
Fig. 13. Multilayer perceptron at 256 points
In addition to the choice of a model, the hyperparameters of the model are an essential factor: although the optimization of hyperparameters for the Random Forest model did not produce a reliable improvement, for most of the chosen samples the best accuracy of the Multilayer Perceptron model was achieved with the same hyperparameters. Therefore, we decided to test these hyperparameters when training the model on the complete data set (Table 11) and built a corresponding graph of the optimized model.
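A sketch of this two-stage procedure, tuning on the reduced sample and refitting once on the full set, is given below; the parameter grid is an illustrative assumption, not the grid used in the study.

```python
# Sketch of hyperparameter search on the reduced sample, then a single full-data fit.
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPRegressor

param_grid = {
    "hidden_layer_sizes": [(50,), (100,), (100, 50)],
    "activation": ["relu", "tanh"],
    "alpha": [1e-4, 1e-3, 1e-2],
}
search = GridSearchCV(MLPRegressor(max_iter=1000), param_grid, cv=3, scoring="r2")
search.fit(X_small, y_small)                      # cheap: reduced / clustered sample

final_model = MLPRegressor(max_iter=1000, **search.best_params_)
final_model.fit(X_train, y_train)                 # expensive fit done only once
```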
Table 10. The accuracy of the multilayer perceptron on the clustered data

Points \ Features  6         5          4         3         2         1
46656              0.885234  0.883998   0.884393  0.880177  0.880290  0.639043
3125               0.850628  0.857905   0.844665  0.837346  0.873672  0.632749
256                0.341854  0.538805   0.772292  0.795018  0.741028  0.606975
27                 0.203687  0.189355   0.592604  0.663000  0.439170  0.532177
4                  0.009273  −0.146042  0.201778  0.389213  0.303835  −0.277406

Table 11. Comparison of regression with a multilayer perceptron using optimized parameters

Model                EV        RMSE       R2        TL        TT
Not optimized model  0.862870  71.302290  0.862862  580.2552  0.3989
Optimized model      0.885884  62.420806  0.885669  25.8614   0.1306
A graph representing the dependence of the solar insolation on the elevation angle of the sun for the optimized Multilayer Perceptron is also shown for verification in Fig. 14. Based on the obtained results, we can conclude that the use of the optimized hyperparameters has slightly increased the accuracy of the prediction model, and the prediction model based on the optimized hyperparameters most closely resembles the reference model.
Fig. 14. Dependence of the solar insolation on the sun elevation angle for the Multilayer Perceptron with optimized hyperparameters
6 Conclusions
Reducing the size of the training sample makes it possible to test a significant number of prediction models and to optimize the hyperparameters of these models, which in turn enables a more accurate prediction model. The graph of the optimized prediction model demonstrates the most remarkable similarity to the ideal, as represented by the reference model. If a minor loss in accuracy is acceptable, it is possible to use a reduced data sample for training the model, which shortens the training time by tens or even hundreds of times. The paper shows that the use of clustered data sampling for prediction models which are prone to overfitting leads to better accuracy of these models. If the amount of data is too large to train complex prediction models on the whole set, the use of complex prediction models trained on random data samples allows obtaining a better result than training simpler models on the complete data set. The minimum sample size is limited by the size of the input data and the minimum number of points required by the algorithm. The additional use of the clustering technique to reduce the data samples typically generates better results than random sampling due to the reduction of variance in the training data set. We confirm that the hyperparameters' optimization and the prediction model selection can be performed using a reduced data sample, since the accuracy of various prediction models is maintained over a wide range of input data sizes. However, the accuracy of prediction can degrade due to the "Curse of Dimensionality" effect caused by a significant reduction in the size of the input data or by a significant number of features of the input data. Hence, the reduced data sample can help to perform optimization of hyperparameters because the accuracy of different models is preserved over a wide range of input data sizes. The results obtained in the paper can be successfully applied in control systems for large objects of solar energy generation under conditions of fast and frequent weather changes as well as slow climatic changes.
References 1. Archives: NSRDB (n.d.). https://nsrdb.nrel.gov/data-sets/archives.html 2. Abirami, S., Chitra, P.: Energy-efficient edge based real-time healthcare support system. Adv. Comput. 117(1), 339–368 (2020). https://doi.org/10.1016/bs.adcom. 2019.09.007 3. Aggarwal, S.K., Saini, L.M.: Solar energy prediction using linear and non-linear regularization models: a study on AMS (American Meteorological Society) 2013– 14 Solar Energy Prediction Contest. Energy 78, 247–256 (2014). https://doi.org/ 10.1016/j.energy.2014.10.012 4. Ahmad, A., Anderson, T.N., Lie, T.T.: Hourly global solar irradiation forecasting for New Zealand. Solar Energy 122, 1398–1408 (2015). https://doi.org/10.1016/j. solener.2015.10.055
5. Chung, M.H.: Estimating solar insolation and power generation of photovoltaic systems using previous day weather data. Adv. Civil Eng. 2020, 1–13 (2020). https://doi.org/10.1155/2020/8701368 6. de Araujo, J.M.S.: Performance comparison of solar radiation forecasting between WRF and LSTM in Gifu. Japan. Environ. Res. Commun. 2(4), 045002 (2020). https://doi.org/10.1088/2515-7620/ab7366 7. Diez, F.J., Navas-Gracia, L.M., Chico-Santamarta, L., Correa-Guimaraes, A., Mart´ınez-Rodr´ıguez, A.: Prediction of horizontal daily global solar irradiation using artificial neural networks (ANNs) in the castile and Le´ on region, Spain. Agronomy 10, 96 (2020). https://doi.org/10.3390/agronomy10010096 8. F¨ urnkranz, J.: Decision tree. In: Sammut C., Webb G.I. (eds) Encyclopedia of Machine Learning. Springer, Boston (2011). https://doi.org/10.1007/978-0-38730164-8 204 9. Ghojogh, B., Crowley, M.: Principal sample analysis for data reduction. In: IEEE International Conference on Big Knowledge (ICBK). pp. 350–357 (2018). https:// doi.org/10.1109/icbk.2018.00054 10. Gorban, A.N., Tyukin, I.Y.: Blessing of dimensionality: mathematical foundations of the statistical physics of data. Philos. Trans. R. Soc. Ser. A Math. Phys. Eng. Sci. 376(2118), 20170237 (2018). https://doi.org/10.1098/rsta.2017.0237 11. Harrell, F.E., Jr.: Regression Modeling Strategies. SSS, Springer, Cham (2015). https://doi.org/10.1007/978-3-319-19425-7 12. Ingrassia, S., Morlini, I.: Neural network modeling for small datasets. Technometrics 47(3), 297–311 (2005). https://doi.org/10.1198/004017005000000058 13. Jeon, B.K., Kim, E.J.: Next-day prediction of hourly solar irradiance using local weather forecasts and LSTM trained with non-local data. Energies 13, 5258 (2020). https://doi.org/10.3390/en13205258 14. Jeon, B.K., Kim, E.J., Shin, Y., Lee, K.H.: Learning-based predictive building energy model using weather forecasts for optimal control of domestic energy systems. Sustainability 11, 147 (2019). https://doi.org/10.3390/su11010147 15. Khanmirza, E., Esmaeilzadeh, A., Markazi, A.H.D.: Predictive control of a building hybrid heating system for energy cost reduction. Appl. Soft Comput. 46, 407–423 (2016). https://doi.org/10.1016/j.asoc.2016.05.005 16. Pal, R.: Overview of predictive modeling based on genomic characterizations. In: Predictive Modeling of Drug Sensitivity, pp. 121–148 (2017). https://doi.org/10. 1016/B978-0-12-805274-7.00006-3 17. Pestov, V.: Is the k-NN classifier in high dimensions affected by the curse of dimensionality? Comput. Math. App. 65(10), 1427–1737 (2013). https://doi.org/ 10.1016/j.camwa.2012.09.011 18. Premalatha, N., Valan Arasu, A.: Prediction of solar radiation for solar systems by using ANN models with different back propagation algorithms. J. Appl. Res. Technol. 14(3), 206–214 (2016). https://doi.org/10.1016/j.jart.2016.05.001 19. Qing, X., Niu, Y.: Hourly day-ahead solar irradiance prediction using weather forecasts by LSTM. Energy 148, 461–468 (2018). https://doi.org/10.1016/j.energy. 2018.01.177 20. Samimi, J.: Estimation of height-dependent solar irradiation and application to the solar climate of Iran. Solar Energy 52, 401–409 (1994). https://doi.org/10.1016/ 0038-092X(94)90117-K 21. Sculley, D.: Web-scale k-means clustering. In: Proceedings of the 19th International Conference on World Wide Web, pp. 1177–1178 (2010). https://doi.org/10.1145/ 1772690.1772862
22. Sharma, V., Yang, D., Walsh, W., Reindl, T.: Short term solar irradiance forecasting using a mixed wavelet neural network. Renew. Energy 90, 481–492 (2016). https://doi.org/10.1016/j.renene.2016.01.020 23. Srivastava, S., Lessmann, S.: A comparative study of LSTM neural networks in forecasting day-ahead global horizontal irradiance with satellite data. Solar Energy 162, 232–247 (2018). https://doi.org/10.1016/j.solener.2018.01.005 24. Varoquaux, G., Buitinck, L., Louppe, G., Grisel, O., Pedregosa, F., Mueller, A.: Scikit-learn. GetMobile. Mobile Comput. Commun. 19(1), 29–33 (2015). https:// doi.org/10.1145/2786984.2786995 25. Vindel, J.M., Polo, J., Zarzalejo, L.F.: Modeling monthly mean variation of the solar global irradiation. J. Atmos. Solar-Terr. Phys. 122, 108–118 (2015). https:// doi.org/10.1016/j.jastp.2014.11.008 26. Wang, F., Mi, Z., Su, S., Zhao, H.: Short-term solar irradiance forecasting model based on artificial neural network using statistical feature parameters. Energies 5, 1355–1370 (2012). https://doi.org/10.3390/en5051355 27. Wilcox, S., Marion, W.: User’s manual for TMY3 data sets (revised) (2008). https://doi.org/10.2172/928611 28. Zollanvari, A., James, A.P., Sameni, R.: A theoretical analysis of the peaking phenomenon in classification. J. Classif. 37(2), 421–434 (2019). https://doi.org/10. 1007/s00357-019-09327-3
Research of Acoustic Signals Digital Processing Methods Application Efficiency for the Electromechanical System Functional Diagnostics

Hanna Rudakova1, Oksana Polyvoda1(B), Inna Kondratieva1, Vladyslav Polyvoda2, Antonina Rudakova3, and Yuriy Rozov1

1 Kherson National Technical University, Kherson, Ukraine {pov81,inna2017ukr}@ukr.net, [email protected]
2 Kherson State Maritime Academy, Kherson, Ukraine
3 National Technical University of Ukraine Igor Sikorsky Kyiv Polytechnic Institute, Kyiv, Ukraine

Abstract. The article investigates the efficiency of applying digital processing methods to acoustic signals for real-time functional diagnostics of electromechanical systems (EMS). The steps according to which the acoustic signals generated during operation of an EMS should be analyzed for the construction and adjustment of functional diagnostics systems are proposed. The primary task is to study the statistical properties of acoustic signals. The next step is spectral analysis of the acoustic signals. In the process of acoustic signal conversion after its hardware processing, the limits of the informative part of the signal are determined on the basis of spectral analysis based on the Fourier transform of the signal. It is proposed to use logic-time processing based on the estimation of the normalized energy spectrum and the spectral entropy to determine the frequency range of the informative part of the signal. The expediency of using the autoregressive moving-average model as an acoustic signal model is substantiated. The problem of building mathematical models on the basis of which it is possible to adequately identify the intensity of work and the state of the equipment is solved. The procedure for identifying the parameters of the signal model by the recurrent least squares method is given, which allows analyzing the state of the equipment in real time. It is proposed to use multiple-scale analysis to speed up the process of signal analysis and reduce the amount of calculations.

Keywords: Functional diagnostics · Acoustic signals · Time series · Auto regression model of moving average · Recurrent least squares method · Multi-scale analysis

1 Introduction
Modern complex EMS consist of subsystems that perform certain technological functions and are interconnected by processes of intensive dynamic interaction,
exchange of energy and information. These systems are nonlinear, multidimensional, and multiconnected, with complex transients including critical and chaotic modes. When developing control systems for modern industrial equipment, it is advisable to introduce automated systems of functional diagnostics into them [18]. Improving the productivity and accuracy of modern electrical equipment requires ensuring the reliability and safety of its operation and, as a consequence, improving the accuracy and speed of technical diagnostics in real time [6]. Modern computer technology allows improving the technology of checking the EMS parameters by automating the measurement processes and using diagnostic software. Measurement and analysis of signals in systems of vibroacoustic diagnostics of electromechanical systems are most often performed by means of devices adapted for work in industrial conditions. At the same time, these operations can be performed using a computer whose inputs are measurements from sensors that convert signals into digital form. An acoustic signal generated by electromechanical components during operation can be used to obtain operational information about the object under study. Developing a procedure for the analysis of acoustic signals generated by the operating equipment of an EMS, based on modern methods of real-time digital processing of time series, expands the possibilities of monitoring and controlling the operation of complex technical systems [8].
2 Problem Statement
An actual problem is to ensure the normal operation of technological equipment under different conditions. Existing systems are in many cases functionally and informationally redundant, so one of the tasks is the rational use of modern EMS and software packages for design and prognosis [4]. We need to move from the technology of operational express analysis of the current situation to fundamental research methods based on a powerful apparatus for digital processing of acoustic signals. Timely detection of equipment defects and prognosis of their development will not only reduce the number of failures, but also eliminate existing faults during scheduled maintenance and reduce the volume and duration of repairs due to their proper planning and organization. In this regard, the actual scientific and technical task is to develop methods for creating automated control systems for functional diagnostics of technological equipment based on an integrated approach, which will significantly reduce the number of accidents at work.
3 Literature Review
Functional diagnostics is a branch of scientific and technical knowledge, the essence of which consists of the theory, methods and tools for finding and detecting defects of technical objects in real time [3].
Functional diagnostic systems have the following features [13]:

– in most cases, diagnostic signals are received using stationary installed systems. Assessment of the state of the equipment at the time of receiving the signal is performed on the value of the measured parameter without determining the reasons for its change;
– the systems do not take into account the duration of malfunctions; the measurement period is set by the user or developer, which does not allow assessing the occurrence of malfunctions in time;
– the creation of diagnostic systems is carried out on the basis of the list of parameters which are already controlled, or for which there is a possibility of control. Redundancy leads to the creation of large and inoperable systems.

In current technical-control practice, essential importance is acquired by the diagnostics of the quality of EMS functioning, which is developed on the basis of diagnostic models and of diagnostic criteria chosen for the assessment of the technical condition of the system. The disadvantage of the existing methods for diagnosing the technical condition of EMS (organoleptic, radiation, thermographic, infrared, electrical, magnetic) is that they recognize existing defects without taking into account the analysis of the conditions of their occurrence [1]. Recently, there has been a tendency to create methods and techniques for diagnosing the technical condition of EMS based on the study of oscillatory (vibration) and acoustic processes. The parameters of oscillatory processes are most sensitive to various deviations of the parameters of the technical condition of mechanical components from the norm. The effectiveness of vibration diagnostics is due not only to the organic connection of the information contained in the vibration signal with the dynamic processes of excitation and propagation of oscillations in the structure, but also to the ability to automate the process of obtaining and processing information using modern technology and diagnostic procedures based on the mathematical theory of pattern recognition. The aim of the research is to analyze the effectiveness of models and methods of digital processing of acoustic signals generated by operating technological equipment in EMS functional diagnostic systems. Research objectives:

1. Development of methods for identification of critical modes of operation of technological systems by processing acoustic signals generated by operating equipment.
2. Approbation of the developed methods for the automated control system of functional diagnostics of technological equipment by computer modeling.
4 Materials and Methods
Characteristic noise signals of electromechanical equipment have periodic and non-periodic components. The parameters of noise signals change over time -
in non-defective equipment slowly, and in equipment approaching the state of destruction, very quickly. It is considered that during the life cycle of the equipment its characteristic noise signal is an interval-stationary process, provided that the observation intervals are selected for each type of equipment, and the signals are considered as realizations of a random process with normal distribution. To increase the efficiency of EMS functional diagnostic systems, it is possible to apply digital signal processing methods in the following sequence.

1. Analysis of statistical properties of acoustic signals. In the statistical analysis of diagnostic parameters in practical tasks it is necessary to fulfill a set of conditions. The first condition is the optimal choice of intervals and the minimum number of measurements. As a rule, a reliable estimate of the vibration parameters of machines and equipment can be obtained only over a long time of defect-free operation, with changes in operating modes within the permitted limits and under different external conditions. The second condition is related to the possible occurrence of random incorrect measurements in the sample, which is used for statistical analysis. The third condition is the need to check the data of periodic measurements of diagnostic parameters to identify their monotonous changes (trends) that may occur after repair (maintenance) of equipment during wear or development of possible defects. The fourth condition is the need to narrow the zone of natural fluctuations of diagnostic parameters in the absence of defects in the equipment under control. The normal distribution law of the load statistics data allows using them for the construction of models for predicting the condition of the electromechanical equipment. To determine the type of the distribution law of a random variable, we can use the Kolmogorov statistical test. To test the hypothesis about the distribution density of the normal type, it is necessary to use Pearson's χ² test [17]:

χ² = Σ_{i=1}^{n} (m_i − N·p_i)² / (N·p_i)   (1)
where N is the total number of observations, m_i is the number of observations of type i; p_i is the expected (theoretical) count of type i, or the probability that the value of the statistics data is within the i-th category; n is the number of categories (types) of statistics data. The density of the data in any interval has the form p*_i = m_i / N. The χ²-distribution depends on the number of degrees of freedom r, which is equal to the number of data categories n minus the reduction in degrees of freedom, i.e. the number of independent conditions (connections) that are superimposed on the frequencies p*_i.

2. Spectral analysis of acoustic signals. Noises of devices and machines characterize both the general properties of the system and the properties of their
parts. Experience with the use of acoustic methods shows that in the state of normal operation the energy of noise is mainly concentrated in the low-frequency range, and the energy of defects is located at higher frequencies [7]. In the process of acoustic signal transformation after its hardware processing, it is expedient to determine the boundaries of the informative part of the signal on the basis of spectral analysis, the essence of which is based on the primary Fourier transform of the signal.

3. Logic-time signal processing (LTSP). An important task is to form a procedure for determining the frequency range of the informative part of the signal based on the estimation of the normalized energy spectrum and the spectral entropy. The sequence of stages of determining the boundaries of the informative part of the signal based on spectral entropy [5] is shown in Fig. 1.
Fig. 1. Stages of determining the boundaries of the informative part of the signal.
For each frame (segment s of length ΔN), the spectrum is calculated:

Y_s(k) = Σ_{n=0}^{ΔN−1} y_s(n)·e^(−j(2π/ΔN)·n·k),   k = 0, ..., ΔN−1,   s = 1, ..., N/ΔN   (2)
The normalized energy spectrum is calculated by the formula

W̃_s(k) = W_s(k) / Σ_{m=0}^{ΔN−1} W_s(m),   where W_s(k) = |Y_s(k)|²   (3)
For suppressing narrow-band noise and wide-band white noise, the following rule is used:

W̃_s(k) = 0,  if δ_1 < W̃_s(k) < δ_2   (4)
The spectral entropy is calculated:

H_S = Σ_{k=0}^{ΔN−1} W̃_s(k)·lg W̃_s(k)   (5)
Filtering is performed according to the algorithm of median smoothing of the sequence H_1, ..., H_L, L = N/ΔN, obtaining the sequence of entropy estimates H̃_1, ..., H̃_L. The adaptive threshold is calculated in the following form:

γ = 0.5·μ·(max H̃_S + min H̃_S)   (6)

where μ is a parameter determined experimentally. If γ > H̃_S, then the signal segment is considered informative.

4. Acoustic signal filtering. Filtering procedures are traditionally used to obtain a specific frequency range. Channel frequency bands are selected so that the number of "noise" components (signals that do not carry useful information) is as small as possible. For filtering an acoustic signal in the low-frequency range, the Butterworth low-pass filter (LPF) can be used, which has a frequency response as flat as possible in the passband [11], with the transfer function

W_B(p) = 1/B_n(p)   (7)

where B_n(p) is the n-th order Butterworth polynomial. Increasing the filtration range f_c of the normalized LPF is achieved by the substitution in W(p)

p → p / f_c   (8)

To convert a normalized LPF to a band-pass filter (BPF) [12] with lower f_1 and upper f_2 frequency limits, bandwidth Δf = f_2 − f_1 and average frequency f_0 = √(f_1·f_2), it is necessary to make the following substitution in the normalized LPF transfer function W(p):

p → (p² + f_0²) / (p·Δf)   (9)

5. Analysis of the acoustic signal in phase space. An additional method of analysis of acoustic signals to implement a comprehensive approach to solving problems of functional diagnostics is the construction of phase portraits that characterize the dynamics of systems directly from the amplitude-time characteristics of the acoustic signal. Each instant state of the system corresponds to a phase point in the phase space (x, ẋ, ẍ, ..., x^(n)), and each point of the space corresponds to a definite and unique state of the system. The dynamics of the system can be represented as a sequential change in the position of phase points, i.e. the trajectory of these points in the phase space [16].

6. Modeling of acoustic signals of electrical equipment by the autoregressive moving-average model (ARMA). As a rule, stationary random processes and stationary changes of time series parameters correspond to the normal state or mode of operation of controlled objects. In case of violation of the modes of operation of the controlled equipment, manifestations of non-stationarity appear in the recorded acoustic signals. For the analysis of non-stationary discrete signals, an ARMA model within a moving window is often chosen, of the form [10]

y[k] = a_0 + Σ_{i=1}^{n} a_i·y[k − i]   (10)
the coefficients of which are determined from measurements by the recurrent least squares method according to the following dependence:

A[k+1] = A[k] + γ[k]·f(Γ[k], y[k+1], ε[k+1])   (11)

where the new value of the estimate A[k+1] is determined by the current value of the parameters A[k], which is adjusted by a certain function f(Γ[k], y[k+1], ε[k+1]) of the input signals y[k+1] using the new value of the gain γ[k], which is calculated at each step as

γ[k] = (1 + y_w^T[k]·Γ[k−1]·y_w[k])^(−1)   (12)

The vector deviation of the object and model states is

ε[k+1] = y[k+1] − y_M[k+1],   y_M[k+1] = y_w^T[k]·A[k]   (13)

where y_M[k+1] are the values calculated based on the estimate of the parameters A[k] computed at the previous step and the input signals within the sliding window y_w^T[k] = [y[k]  y[k−1]  ...  y[k−i+1]]. The matrix Γ[k+1] is defined as

Γ[k+1] = Γ[k] − Γ[k]·y_w[k+1]·γ[k]·y_w^T[k+1]·Γ[k]   (14)
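A compact sketch of this recurrent update, written in the standard recursive least squares form of (11)–(14), is given below; the variable names mirror the text (A – parameter vector, G – matrix Γ, yw – sliding window of past samples), and the initial values in the usage comment are assumptions.

```python
# Minimal sketch of one recurrent least squares update step.
import numpy as np

def rls_step(A, G, yw, y_next):
    """Update the AR model parameters A and matrix G from a new measurement y_next."""
    y_model = yw @ A                              # model output y_M[k+1], eq. (13)
    eps = y_next - y_model                        # deviation of object and model, eq. (13)
    gain = 1.0 / (1.0 + yw @ G @ yw)              # scalar gain gamma[k], eq. (12)
    A_new = A + gain * (G @ yw) * eps             # parameter correction, eq. (11)
    G_new = G - gain * np.outer(G @ yw, yw @ G)   # matrix update, eq. (14)
    return A_new, G_new

# Usage (assumed initialisation): A = np.zeros(4); G = 1e3 * np.eye(4);
# for each new sample: yw = np.array([y[k], y[k-1], y[k-2], y[k-3]]);
#                      A, G = rls_step(A, G, yw, y[k+1])
```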
To solve the problem of identification of a dynamic object in real time when finding the model parameters, the recursive least squares method should be used among all common methods [9,14]. To implement the method, it is first necessary to determine the initial estimate of the vector of model parameters A[m] based on a sample of observations of length m, which can be found from the equation

Y_m = Z_m·A[m]   (15)

where Y_m = [y_{i+1}  y_{i+2}  ...  y_m]^T, and the matrix Z_m is formed from the elements of the sample using the sliding window technology as follows [2]:

Z_m = [ y_i      ...  y_2        y_1
        y_{i+1}  ...  y_3        y_2
        ...      ...  ...        ...
        y_{m−1}  ...  y_{m−i+1}  y_{m−i} ]   (16)

To find the initial estimate A[m], the traditional least squares expression A[m] = (Z_m^T·Z_m)^(−1)·Z_m^T·Y_m can be used. The condition for completion of the identification step is |a_i[k] − a_i[k−1]| < δ for all values of i.

7. Multiple-scale analysis. To speed up the signal analysis process, it is necessary to eliminate redundancy, which can be achieved by aggregating data [15]. Aggregation, which allows compressing the time scale (reducing the amount of data), is carried out by averaging the nearest values as follows:

y_k^(g) = (1/g)·Σ_{i=k·g−(g−1)}^{k·g} y_i   (17)
where y is the input signal and y^(g) is the aggregated signal with the degree of aggregation g. To determine the acceptable degree of aggregation, it is advisable to use multiple-scale analysis. The measure of the duration of the long-term dependence of a stochastic process is characterized by the Hurst parameter Hr. The value Hr = 0.5 indicates the absence of long-term dependence. The closer the value of Hr to 1, the higher the degree of stability of the long-term dependence, i.e. it is necessary that the condition 0.5 ≤ Hr ≤ 1 is met. The Hurst parameter is defined as Hr = 1 − (β/2), where β = log[Var(y)/Var(y^(g))]/log(g), 0 < β < 1, and Var(y) and Var(y^(g)) are the variances of the original and aggregated processes, respectively. The effectiveness of this procedure can be confirmed by processing real signals.
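A sketch of the aggregation (17) and of the Hurst-parameter check is given below; x is assumed to be the recorded series as a NumPy array, and the list of tested aggregation degrees is an illustrative choice matching the experiments reported later.

```python
# Sketch of block aggregation and Hurst-parameter estimation.
import numpy as np

def aggregate(y, g):
    n = (len(y) // g) * g
    return y[:n].reshape(-1, g).mean(axis=1)      # block averages, eq. (17)

def hurst_parameter(y, g):
    beta = np.log(np.var(y) / np.var(aggregate(y, g))) / np.log(g)
    return 1.0 - beta / 2.0

# the largest g that still satisfies 0.5 <= Hr <= 1 is taken as the admissible degree
admissible = [g for g in (5, 10, 15, 20, 25, 30, 35, 40, 45, 50)
              if 0.5 <= hurst_parameter(x, g) <= 1.0]
```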
5 Experiment, Results and Discussion
The analysis of the acoustic signal was carried out according to the procedure proposed above. The experiments were performed with working electromechanical equipment in different modes of operation (with different rotation speeds ω) and with varying load degrees q. Acoustic signals were saved using sound recording equipment in the *.wav format, with a sampling frequency of 44 kHz and 16-bit depth.

1. Analysis of statistical properties of acoustic signals. Statistical analysis revealed a normal distribution law for all recorded time series according to the Kolmogorov and Pearson consistency criteria. The numerical characteristics of the obtained random processes with a normal distribution law are completely specified by the mathematical expectation m_y and the standard deviation σ_y. As a result of the calculations, the value of the mathematical expectation m_y = 0 was obtained, which indicates the unbiasedness of the series, and the standard deviation σ_y characterizes the shape of the distribution curve. The calculated values of the standard deviation for various modes and operating conditions for all signals are shown in Table 1. Figure 2 shows the dependence of the standard deviation on the load degree in different modes of operation. Figure 2 shows that with increasing load, the standard deviation decreases. However, at a load of q = 40%, a jump is observed, which corresponds to the violation of the normal mode of operation of the electromechanical equipment, which can even be observed visually in the graphical representation of the recorded signal. This acoustic signal, which corresponds to the mode of operation of the equipment with the violation, recorded at a load of q = 40% and a speed of 1000 rpm, is shown in Fig. 3. It changes in frequency over time, so over time we can identify a number of characteristic zones: 1 – the initial zone, which corresponds to the acceleration of the engine to the rated speed; 2 – the zone corresponding to the normal mode of operation of the engine; 3 – the zone with additional disturbances that characterize the presence of violations in the engine.
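The statistical step can be sketched as follows; the SciPy routines are used here as a convenient stand-in, which is our choice rather than the authors' implementation, and the number of histogram bins is an assumption.

```python
# Sketch of per-record statistics and normality checks for one acoustic record x.
import numpy as np
from scipy import stats

def describe(x, bins=20):
    m, s = np.mean(x), np.std(x)
    ks = stats.kstest((x - m) / s, "norm")                # Kolmogorov test against N(0, 1)
    counts, edges = np.histogram(x, bins=bins)
    cdf = stats.norm(m, s).cdf(edges)
    expected = len(x) * np.diff(cdf)                      # N * p_i
    chi2 = np.sum((counts - expected) ** 2 / expected)    # Pearson chi-square, eq. (1)
    return m, s, ks.pvalue, chi2
```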
Table 1. Standard deviation of acoustic signals

Load degree q, %  Equipment operation mode ω, rpm
                  500    1000   1500
No-load           0.025  0.022  0.023
+10%              0.017  0.019  0.018
+20%              0.016  0.016  0.016
+30%              0.015  0.013  0.014
+40%              0.022  0.024  0.016
+50%              0.013  0.014  0.015
Fig. 2. Dependence of standard deviation on the load degree in different modes of operation: 1 – for ω = 500 rpm; 2 – for ω = 1000 rpm; 3 – for ω = 1500 rpm.
2. Spectral analysis of acoustic signals. To analyze the change in the frequency properties of the signal in the normal and faulty operation modes of the acoustic signal shown in Fig. 3, fragments from zone 2 (Fig. 4a) and zone 3 (Fig. 4b) are cut out, respectively. The result of applying the Fourier transform to the signal fragments from zones 2 and 3 is shown in Fig. 5. The spectral diagram of the signal from zone 3 (in the mode of operation of the equipment with violations) shows the presence of powerful high-frequency oscillations. It is advisable to analyze the signal properties for certain frequency bands separately. Detection of the frequency intervals is possible through the use of logic-time processing.

3. Logic-time signal processing. The purpose of logic-time processing is to estimate the informativeness of the signal in certain frequency bands. When determining the frequency bands of informativeness by conducting a series of calculations, it was found appropriate to use a frame length of ΔN = 1024. A separate task is to adjust the μ parameter. For both cases, the effective value of the parameter μ = 0.95 and the threshold values γ_2 = 1.748 and γ_3 = 1.845 are obtained. The calculation results of the spectral entropy H_i(f) and the informativeness indicator P_i(f) by (2)–(6) for the acoustic signals from
Fig. 3. Acoustic signal obtained as a result of the experiment.
Fig. 4. Fragments of the acoustic signal.
Fig. 5. Acoustic signal spectra.
zones 2 and 3 are shown in Figs. 6 and 7. Analysis of the results of the experimental acoustic signal processing showed the existence of informative features for the signal from zone 2 in the frequency band 0...3 kHz, and for the signal from zone 3 in two frequency bands: 0...3 kHz and 6...12 kHz. In the other bands, according to the analysis of the entropy spectrum, the components of the random signal are closer in properties to noise, which indicates the inexpediency of their further use for analysis.
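A sketch of the logic-time processing steps (2)–(6) is given below; the frame length and μ follow the text, while the suppression thresholds δ1, δ2 and the median window length are illustrative assumptions.

```python
# Sketch of per-frame spectral entropy and the adaptive informativeness threshold.
import numpy as np
from scipy.signal import medfilt

def spectral_entropy_track(y, frame=1024, d1=1e-8, d2=0.1, mu=0.95):
    n_frames = len(y) // frame
    H = np.empty(n_frames)
    for s in range(n_frames):
        Y = np.fft.fft(y[s * frame:(s + 1) * frame])        # frame spectrum, eq. (2)
        W = np.abs(Y) ** 2
        W = W / W.sum()                                      # normalised spectrum, eq. (3)
        W[(W > d1) & (W < d2)] = 0.0                         # noise suppression, eq. (4)
        nz = W[W > 0]                                        # 0 * lg 0 treated as 0
        H[s] = np.sum(nz * np.log10(nz))                     # spectral entropy, eq. (5)
    H_s = medfilt(H, kernel_size=5)                          # median smoothing
    gamma = 0.5 * mu * (H_s.max() + H_s.min())               # adaptive threshold, eq. (6)
    informative = gamma > H_s                                # frames flagged as informative
    return H_s, informative
```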
Fig. 6. Results of LTSP calculations for the signal from zone 2.
4. Acoustic signal filtering. According to the results of the logic-time processing, the need to analyze the signals in two frequency bands, which can be obtained by filtering, was revealed. For this purpose it is necessary to use an LPF and a BF, i.e. the signals must be passed through an LPF with cutoff frequency f_c = 3 kHz and a bandpass filter with a frequency range from f_1 = 6 kHz to f_2 = 12 kHz. To implement the filters according to (7)–(9), a 5th order Butterworth filter is used:

B_5(p) = p^5 + 3.236068·p^4 + 5.236068·p^3 + 5.236068·p^2 + 3.236068·p + 1   (18)

The transfer function of the 5th order LPF is obtained in the form

W_l(p) = 1 / (4.115·10^−18·p^5 + 3.995·10^−14·p^4 + 1.94·10^−10·p^3 + 5.818·10^−7·p^2 + 1.079·10^−3·p + 1)   (19)
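For reference, the same two channels can be obtained with standard SciPy Butterworth designs; this digital re-implementation is our sketch, whereas the text derives the analogue transfer functions, and x is assumed to be the recorded signal sampled at 44 kHz.

```python
# Sketch of designing and applying the 5th-order low-pass and band-pass filters.
from scipy.signal import butter, filtfilt

fs = 44000                                            # sampling frequency, Hz
b_lp, a_lp = butter(5, 3000, btype="lowpass", fs=fs)
b_bp, a_bp = butter(5, [6000, 12000], btype="bandpass", fs=fs)

low_band = filtfilt(b_lp, a_lp, x)                    # 0-3 kHz component
mid_band = filtfilt(b_bp, a_bp, x)                    # 6-12 kHz component
```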
Fig. 7. Results of LTSP calculations for the signal from zone 3.
For the BF, the bandwidth Δf = 6 kHz and the average frequency f_0 = 8.5 kHz were found. As a result of substituting these values into (7) and (9), the transfer function of the filter is obtained in the form

W_b(p) = p^5 / (1.286·10^−19·p^10 + 2.497·10^−15·p^9 + 7.054·10^−11·p^8 + 8.646·10^−7·p^7 + 0.0124·p^6 + 99.61·p^5 + 8.958·10^5·p^4 + 4.482·10^9·p^3 + 2.633·10^13·p^2 + 6.71·10^16·p + 2.49·10^20)   (20)

The spectra of the signal amplitudes after LPF and BF processing are shown in Table 2.

5. Analysis of the acoustic signal in phase space. The fragments of the input signals and the signals obtained by filtering in the 0–3 kHz and 6–12 kHz bands for the recorded signals from zone 2 and zone 3 are shown in Table 3. When analyzing the behavior of the signals in phase space, we can identify certain properties. Table 4 shows the dynamics of the signals in the phase space. The coordinates x_i correspond to the values of the registered acoustic signal; y_i and z_i are the first- and second-order increments of the signal, respectively, which correspond to the first and second derivatives of the signal. Even visually, there is an elimination of chaos in the behavior of the signal as a result of its decomposition into frequency bands, which simplifies the analysis and leads to the construction of more adequate models.

6. Modeling of acoustic signals of electrical equipment by the autoregressive moving-average method. When constructing models of type (10), an analysis was performed to determine the order (depth) of the window. Previous studies have found that we can limit ourselves to the order m = 4, so the width of the sliding window was taken as i = 4.
Table 2. Signal spectra (spectrum plots of the signals from zone 2 and zone 3 in the 0–3 kHz and 6–12 kHz bands).
Table 3. The results of acoustic signal filtering (filtered signal fragments of zone 2 and zone 3 after the LPF and the BF).

Table 4. Dynamics of signals in phase space (phase portraits of the signals from zone 2 and zone 3: no filter, 0–3 kHz, 6–12 kHz).
For zone 2, the equations of the model are obtained in the form

y_2l[k] = 4.85·y[k−1] − 9.49·y[k−2] + 9.39·y[k−3] − 4.69·y[k−4],
y_2b[k] = 4.61·y[k−1] − 8.78·y[k−2] + 8.63·y[k−3] − 4.38·y[k−4],   (21)

and for zone 3 in the following form:

y_3l[k] = 4.74·y[k−1] − 9.07·y[k−2] + 8.77·y[k−3] − 4.28·y[k−4],
y_3b[k] = 4.6·y[k−1] − 8.74·y[k−2] + 8.57·y[k−3] − 4.33·y[k−4].   (22)
Similarly, we can obtain models of the form (10) for the other signals. Construction of models for the acoustic signals recorded during operation of the electromechanical equipment at different speeds and loads, presented in Table 1, showed a significant dependence of the model coefficients on changes in operating conditions, which opens up prospects for identifying equipment operating modes and loads in real time.

7. Multiple-scale analysis. Multi-scale analysis was performed to evaluate the experimentally obtained signal. The results of applying the multi-scale analysis (calculation of the variance Var, the self-similarity parameter β, and the Hurst parameter Hr) depending on the degree of aggregation are shown in Table 5. A graphical representation of the dependence of the self-similarity and Hurst parameters on the degree of aggregation is shown in Fig. 8. Taking into account the conditions of self-similarity (0.5 ≤ Hr ≤ 1, 0 < β < 1), it can be noted that the limiting degree of aggregation for the studied signal is m = 30. Thus, it is possible to reduce the size of the initial sample, i.e. to compress the size of the input data for calculations by 30 times, which will reduce the time for processing and analysis of the signals.

Table 5. The results of multiple-scale analysis against the degree of aggregation

Degree of aggregation  5     10    15    20    25    30    35    40    45    50
Var · 10^5             82.9  30.4  16.2  9.91  6.5   4.85  2.56  1.68  1.07  0.63
β                      0.23  0.59  0.74  0.83  0.91  0.94  1.08  1.16  1.24  1.34
Hr                     0.88  0.7   0.63  0.58  0.55  0.53  0.46  0.42  0.38  0.33
Fig. 8. Dependence of multi-scale analysis parameters on the degree of aggregation.
6 Conclusions
In this study, computer simulations of acoustic signal processing methods were performed. A series of experiments was carried out with electromechanical equipment working in different modes of operation (with different speeds) and with different load degrees. The analysis of the statistical properties of the acoustic signals was carried out, which revealed the normal distribution law of the registered time series and the dependence of the numerical characteristics of the process on the operating modes and the equipment load degree. The analysis of the change in the frequency properties of the acoustic signal in normal and faulty operating modes was performed. In case of violation of the normal mode of operation of the electromechanical equipment, the dispersion of the acoustic signal increases. The spectral diagram of the signal in the mode of operation of the equipment with violations shows the presence of powerful high-frequency oscillations. By means of logic-time processing, the informativeness of the signal in certain frequency ranges was estimated. When determining the frequency ranges of informativeness by performing a series of calculations, the effective value of the frame length and of the threshold setting parameter was revealed. The signal components in certain frequency ranges were obtained by filtering with a low-pass filter and a bandpass filter, for the implementation of which a 5th order Butterworth filter is used. The analysis of the original acoustic signal and its components in the phase space was carried out. When creating the ARMA models, an analysis was performed to determine the order (depth) of the window. Research has shown that it is possible to limit the models to the 4th order and, accordingly, the width of the sliding window. The analysis of the obtained models for acoustic signals recorded during operation of electromechanical equipment with different speeds and load degrees showed a significant dependence of the model coefficients on changes in operating conditions, which opens up prospects for the identification of stress modes in equipment. The results of the application of multiple-scale analysis (calculation of the variance, the self-similarity parameter, and the Hurst parameter) depending on the degree of aggregation allowed determining the limiting degree of aggregation for the studied signal. The integration of functional diagnostic systems based on the analysis of acoustic signals into the control systems of complex multi-drive systems opens the possibility of real-time assessment and detection of critical modes of electromechanical equipment, and of timely forming control actions to stabilize the operation of production facilities.
Computer Simulation of Physical Processes Using Euler-Cromer Method Tatiana Goncharenko , Yuri Ivashina , and Nataliya Golovko(B) Kherson State University, Kherson, Ukraine {TGoncharenko,YuIvashyna,NGolovko}@ksu.ks.ua
Abstract. The paper presents the results of the research concerning the application of computer simulation techniques to analyze physical processes based on the use of Euler-Cromer method. The theoretical part of the paper contains the stepwise procedure of the Euler-Cromer algorithm application. In the experimental part, we have presented the results of the proposed technique implementation for both solving and obtained results analysis using various types of charts. The simulation process was performed based on the use of R software. To our best mind, the implementation of the proposed technique in the learning process can allow the students to better understand the studied physical process on the one hand and obtain skills concerning the application of computer simulation techniques to complex processes analysis on the other one.
Keywords: Computer simulation · Euler-Cromer method · Physical processes analysis

1

Introduction
The application of computer simulation techniques for the analysis of complex processes in various fields of scientific research is one of the current directions of modern science. Understanding the studied process in most cases depends on the quality and depth of analysis of both the studied phenomenon and the obtained results. Physical tasks, in this case, are a perfect simulator for acquiring the appropriate analytical skills in the fields of complex system analysis using computer simulation techniques. The physical model, in this case, is presented as the combination of input variables and an output parameter. The input variables in most cases can be divided into static ones and variables that can be changed during the simulation process. In the easiest case, the model can be presented analytically using appropriate equations. In this case, we can investigate the studied phenomena using visualization techniques, changing the input parameter values within the framework of the available range. For more complex tasks, the studied phenomena can be presented as a combination of differential equations which determine the appropriate state of the studied system. Creating these equations for various states of the system with a prior defined step
of the input variable change, we can both simulate and investigate the studied phenomena. The Euler-Cromer iteration technique [5] is one of the effective methods to implement the herein-before described procedure. Within the framework of our research, we have applied this technique to analyze and simulate physical phenomena and to visualize the obtained results. However, we would like to note that this technique is not limited to physical models. The proposed technique can be applied to other phenomena whose state can be defined by a system of differential equations. This fact indicates the actuality of the research.
2
Problem Statement
Let the physical model be presented as a combination of input (static and non-static) and output parameters (Fig. 1).
Fig. 1. Graphical presentation of the physical models
Analytically, this model can be presented as follows:

y = F(x1, x2, ..., xn, c1, c2, ..., cm)    (1)
where n and m are the number of static and changeable input parameters respectively. Implementation of the simulation procedure assumes the following steps:

1. Analyze the studied phenomena, define the static and variable input parameters and output variables.
2. Determine the state of the investigated task based on the use of appropriate physical equations.
3. Determine both the range and step of the changeable input parameters variation.
4. Perform the simulation process within the range of the input parameters variation.
5. Visualize the received dependencies using appropriate graphical software tools.
6. Analyze the obtained results.
3
Literature Review
A lot of works are devoted to solving the problem of complex system analysis using various simulation techniques. In [8], the authors applied the well-known Fast Fourier Transform, Finite Difference, and Euler-Cromer simulation methods to analyze an electrostatic plasma that appears as a result of an ablation laser process with an isothermic expansion. The results of the simulation concerning the application of the Euler-Cromer method for the synchronization between a circuit with uncertain parameters, where the parameters were obtained from the synchronized computer, were presented in [10]. The authors have shown that the obtained results allow using adaptive generalized synchronization for the parameter identification of real systems. In [6], the authors investigated various types of oscillations; the Spring-Mass technique was used to generate the vibrations. The system of linear differential equations of the second degree with constant coefficients, considering the forces applied to the masses as linear-elastic restitution forces with small displacements, was created during the simulation process. Then, this system was discretized and solved using the Euler-Cromer integration method. The paper [15] presents the research results concerning the use of the finite element method together with a semi-implicit Euler-Cromer time-stepping scheme, which renders a discrete equation that can be solved by recursion. The authors considered different options to discretize the damping term and its effect on the stability criterion. Questions concerning the application of the simulation techniques to both the analysis and processing of the acoustic emission signal are described in [11–13]. In [2,3] the authors presented the results of the research concerning complex use of both data mining (criterial analysis and clustering) and machine learning (classification and fuzzy logic) techniques for high-dimension gene expression data analysis. In [1,4,7,14,16,18], the authors presented the results of the research focused on the application of current computational methods to investigate various physical processes. In [17], the authors investigated the possibilities of application of computer simulation techniques to optimize the management of tourists' enterprises. This paper is a continuation of the research reviewed above. Our simulation is devoted to the application of the iterative Euler-Cromer method to the analysis of physical processes of various nature during the teaching of physics at both high schools and universities. The aim of the paper is the investigation of physical processes of various nature based on the use of the Euler-Cromer method.
4
The Euler-Cromer Method Description
The Euler-Cromer method is a typical technique for solving systems of differential equations. Let y(x) be a function characterizing the appropriate process. Then, the rate of this function change can be defined by the equation:

f = dy/dx    (2)
At the same time, the acceleration of this function change can be defined as follows:

g = df/dx    (3)

In accordance with the Euler-Cromer method, Eqs. (2) and (3) can be presented in the following way:

yn+1 = yn + fn+1 · dx    (4)

fn+1 = fn + gn · dx    (5)
So, in the general case, the Euler-Cromer method can be implemented as follows:

Algorithm 1: Step-wise procedure of the Euler-Cromer method implementation
Initialization:
  set: initial conditions (initial values of y, f and g: y0, f0, g0);
       range and step of the x value variation (xmin, xmax, dx);
       iteration counter t = 1, tmax = count(x);
  create the empty vectors y and f;
while t ≤ tmax do
  calculate the f function value ft by Eq. (5);
  calculate the y function value yt by Eq. (4);
  calculate the g function value gt by Eq. (3);
  t = t + 1;
end
Return the vectors x, y, f and g.
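A compact R sketch of this step-wise procedure is given below; the test acceleration function and the initial conditions are hypothetical and used only for illustration.

```r
# Generic Euler-Cromer integrator sketch: g_fun(x, y, f) returns the
# "acceleration" df/dx; y0 and f0 are the initial conditions.
euler_cromer <- function(g_fun, y0, f0, x_min, x_max, dx) {
  x <- seq(x_min, x_max, by = dx)
  y <- f <- numeric(length(x))
  y[1] <- y0; f[1] <- f0
  for (t in 2:length(x)) {
    g    <- g_fun(x[t - 1], y[t - 1], f[t - 1])
    f[t] <- f[t - 1] + g * dx        # equation (5)
    y[t] <- y[t - 1] + f[t] * dx     # equation (4): uses the already updated f
  }
  data.frame(x = x, y = y, f = f)
}

# Assumed test case: harmonic oscillator d2y/dx2 = -y
res <- euler_cromer(function(x, y, f) -y, y0 = 1, f0 = 0,
                    x_min = 0, x_max = 10, dx = 0.01)
head(res)
```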
5
The Euler-Cromer Method Application to Solve Physical Tasks
The simulation process concerning the application of the Euler-Cromer method for both solving the physical tasks and the obtained results visualization was performed based on the use of R software [9] using tools of the ggplot2 package.

5.1
Modeling the Motion of a Spherical Body in the Earth’s Gravitational Field
Let’s consider a case when a spherical body is thrown from the surface of the earth at an angle α relative to the horizon at the speed v and then moves in the earth’s gravitational field. Figure 2 illustrates this process. The mathematical model, in this case, can be presented by the following equations: 1 1 dvx = − (6πμr · vx + cSρ · vx2 ) (6) dt m 2
Fig. 2. An illustration of the motion of a body thrown at an angle to the horizon
dx/dt = vx    (7)

dvy/dt = −(1/m)(g + 6πμr · vy + (1/2)cSρ · vy² · sign(vy))    (8)

dy/dt = vy    (9)

where vx and vy are the speed projections on the OX and OY axes respectively; m and r are the body mass and radius; μ, ρ and c are the dynamic viscosity, the density of air and the non-dimensional shape coefficient (for a spherical body μ = 0.0182 N·s/m², c = 0.4, ρ = 1.29 kg/m³); S = πr² is the cross-sectional area to the flow; g = 9.8 m/s² is the gravity acceleration; sign(vy) = 1 if vy ≥ 0 and −1 if vy < 0.

Problem statement: for a spherical body with m = 2 kg and r = 0.2 m, create the trajectory graphs for angles of 15, 30, 45, 60 and 75 degrees and the charts of vx, vy and v versus time for the angle of 60 degrees. The initial speed is 30 m/s. As it can be seen from Eqs. (6), (7), (8) and (9), the algebraic solution of this task is very problematic. Below, we present the algorithm of this task solving based on the use of the Euler-Cromer method. The results of this algorithm operation are presented in Fig. 3 and Fig. 4 respectively. The analysis of the obtained results allows concluding that all charts reflect adequately the particularities of the motion of the spherical body. Moreover, we can investigate the changing of the output parameters in the case of the input parameters variation (for example, shape and geometric parameters of the body, state of the environment, etc.). The obtained model can allow us also to calculate the system output parameters for any moment in time. The hereinbefore facts indicate the
reasonableness of the proposed approach use for both the analysis and study of physical phenomena at schools and universities.

Algorithm 2: The use of the Euler-Cromer method for modeling the motion of a spherical body in the Earth's gravitational field
Initialization:
  set: the static parameters: m = 2 kg; r = 0.2 m; v0 = 30 m/s; ρ = 1.29 kg/m³; μ = 0.0182 N·s/m²; c = 0.4; g = 9.8 m/s²;
       range and step of the α value variation (αmin = 15, αmax = 75, dα = 15);
       range and step of the t value variation (tmin = 0, tmax = 10, dt = 0.01);
       iteration counter k = 1;
  create the empty list of the results res;
while c1 ≤ count(α) do
  create an empty data frame;
  set x0, y0; calculate the initial speeds v0x, v0y, v0;
  while c2 ≤ count(t) do
    calculate the dvx and dvy values by Eqs. (6) and (8);
    calculate vx and vy: vx = vx + dvx; vy = vy + dvy;
    calculate the dx and dy values by Eqs. (7) and (9);
    calculate x and y: x = x + dx; y = y + dy;
    calculate the v value: v = √(vx² + vy²);
    add the calculated values to the data frame;
    if (y ≤ 0) break;
  end
  add the data frame to the res list;
end
Return the list of results res.
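A minimal R sketch of Algorithm 2 for a single launch angle is shown below; it follows Eqs. (6)–(9) as written above, and the stopping rule and plotting call are illustrative assumptions.

```r
# Motion of a spherical body with drag, Euler-Cromer scheme, one launch angle.
m <- 2; r <- 0.2; v0 <- 30                    # body mass, radius, initial speed
mu <- 0.0182; cc <- 0.4; rho <- 1.29; g <- 9.8
S <- pi * r^2
alpha <- 60 * pi / 180                        # launch angle of 60 degrees
dt <- 0.01
x <- 0; y <- 0
vx <- v0 * cos(alpha); vy <- v0 * sin(alpha)
traj <- data.frame(t = 0, x = x, y = y, vx = vx, vy = vy, v = v0)

t <- 0
while (y >= 0 && t < 10) {
  sgn <- if (vy >= 0) 1 else -1
  dvx <- -(1 / m) * (6 * pi * mu * r * vx + 0.5 * cc * S * rho * vx^2) * dt            # Eq. (6)
  dvy <- -(1 / m) * (g + 6 * pi * mu * r * vy + 0.5 * cc * S * rho * vy^2 * sgn) * dt  # Eq. (8)
  vx <- vx + dvx; vy <- vy + dvy              # Euler-Cromer: speeds are updated first
  x <- x + vx * dt; y <- y + vy * dt          # then coordinates use the new speeds
  t <- t + dt
  traj <- rbind(traj, data.frame(t = t, x = x, y = y, vx = vx, vy = vy,
                                 v = sqrt(vx^2 + vy^2)))
}
plot(traj$x, traj$y, type = "l", xlab = "x, m", ylab = "y, m")
```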
Fig. 3. The trajectory graphs for angles of 15, 30, 45, 60 and 75 degrees
Fig. 4. The charts of vx, vy and v versus time for the angle of 60 degrees
5.2
Modeling the Motion of Electric Charges in Electromagnetic Fields
The next task is devoted to the simulation of electric charge motion in electromagnetic fields for different parameters of the fields. In the general case, if v

(α_t > 0):

Σ_{t=0}^{∞} α_t = ∞,    Σ_{t=0}^{∞} α_t² < ∞.
Methods (12)–(16) are foundational for building other learning methods in decision making systems.
4 4.1
Experiment Calculation of Agent Strategies
In deterministic environments, agent strategies are determined by the maximum value of the Q-function:

π(a | s) = arg max_a Q(s, a)    (17)
Under conditions of uncertainty, the exploration of the state-action space in the decision-making system can be performed on the basis of a random distribution in proportion to the values of the function Q(s, a).
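As a hedged illustration of both selection options (the states, actions and Q-values below are hypothetical), the choice can be sketched in R as follows:

```r
# Action selection from a Q-table: greedy choice by Eq. (17) and Q-proportional choice.
Q <- matrix(c(0.2, 0.5, 0.3,
              0.1, 0.7, 0.9), nrow = 2, byrow = TRUE,
            dimnames = list(c("s1", "s2"), c("a1", "a2", "a3")))

greedy_action <- function(Q, s) names(which.max(Q[s, ]))

proportional_action <- function(Q, s) {
  p <- Q[s, ] / sum(Q[s, ])              # proportional to Q(s, a); Q assumed positive
  sample(colnames(Q), 1, prob = p)
}

greedy_action(Q, "s1")        # "a2"
proportional_action(Q, "s2")  # random draw, most often "a3"
```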
One of the options for probabilistic calculation of strategies is to use an "ε-greedy" random selection algorithm:

∀a: π(a | s) = ε

        ... > currentSolution then
          currentSolution = solution
          newArg1 = changedArg1
          newArg2 = changedArg2
          betterSolutionFound = true
        end
      end
    end
    if betterSolutionFound then
      arg1 = newArg1
      arg2 = newArg2
    else
      stepArg1 /= 2
      stepArg2 /= 2
      noImprove++
    end
  end
Return values of arg1 and arg2 which provide the optimal solution.
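The visible fragment suggests a pairwise search that keeps the best perturbation of two parameters and halves the steps when no improvement is found; a minimal R sketch under that assumption (the objective function and starting values are placeholders, not the authors' implementation) is given below.

```r
# Step-halving search over a pair of parameters (sketch of the visible fragment).
optimize_pair <- function(f, arg1, arg2, step1, step2, max_no_improve = 5) {
  current <- f(arg1, arg2)
  no_improve <- 0
  while (no_improve < max_no_improve) {
    better <- FALSE
    for (d1 in c(-step1, 0, step1)) for (d2 in c(-step2, 0, step2)) {
      val <- f(arg1 + d1, arg2 + d2)
      if (val > current) {              # maximization, as in "solution > currentSolution"
        current <- val; new1 <- arg1 + d1; new2 <- arg2 + d2; better <- TRUE
      }
    }
    if (better) { arg1 <- new1; arg2 <- new2 }
    else { step1 <- step1 / 2; step2 <- step2 / 2; no_improve <- no_improve + 1 }
  }
  c(arg1 = arg1, arg2 = arg2, value = current)
}

# Hypothetical objective with maximum at (4, 0.3)
optimize_pair(function(a, b) -(a - 4)^2 - (b - 0.3)^2,
              arg1 = 6, arg2 = 0.6, step1 = 1, step2 = 0.05)
```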
We have used the principal component analysis according to example [13] in order to visualize all 6-dimensional start candidate points and function values in 2D (Fig. 1). The size of the points corresponds to its SROCC value, the selected start point is highlighted in red color. As one can see there are 4 groups of peaks in the ranges we mentioned above. This plot emphasizes the importance of the proper selection of the start optimization point.
Fig. 1. Map of the initial optimization point selection (TID2013)
The further optimization of the pair Fa, Ta in bounds Fa ∈ [4, 8], ΔFa = 1, Ta ∈ [0.3, 0.9], ΔTa = 0.05 with Algorithm 1 allows to achieve the following optimal values: Fa = 4.125, Ta = 0.33125, Fs = 200, Ts = −0.1, Fr = 200, Tr = 0.1 with SROCC = 0.7416. The following optimization of the pair Fs, Fr in bounds Fs ∈ [0, 400], ΔFs = 40, Fr ∈ [0, 400], ΔFr = 40 allows to achieve such optimal values: Fa = 4.125, Ta = 0.33125, Fs = 237.5, Ts = −0.1, Fr = 62.5, Tr = 0.1 with SROCC = 0.7489. The optimization of the last pair of thresholds Tr, Ts in ranges Tr ∈ [−0.5, 0.5], ΔTr = 0.05, Ts ∈ [−0.5, 0.5], ΔTs = 0.05 allows to get the final result: Fa = 4.125, Ta = 0.33125, Fs = 237.5, Ts = −0.1, Fr = 62.5, Tr = 0 with SROCC = 0.7606. The SROCC values obtained for the first 8 images (which were used for the calculation of factors and thresholds) as well as for all images in the TID2013 dataset for all distortions separately are shown in Table 1 (distortion names are according to [11]). As one can see, distortions 14–18 are the most problematic. It is also clear that using only the first 8 images for the estimation of the Fa, Fs, Fr factors and thresholds Ta, Ts, Tr is enough to get the same SROCC value for the entire dataset. So, we obtained SROCC = 0.7526 for the TID2013 dataset after the tuning of the required parameters using the first 8 images. Comparing this value to the results of other approaches gathered in [2,3,18], it is clear that this result is neither good nor bad but average.
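Since SROCC is simply the Spearman rank correlation between the predicted quality values and the mean opinion scores, it can be computed in R as shown below; the two vectors are placeholders.

```r
# Spearman rank-order correlation coefficient (SROCC) between predicted scores and MOS.
predicted <- c(32.1, 28.4, 35.0, 30.2, 26.7)   # hypothetical quality measure values
mos       <- c(5.1,  4.2,  5.6,  4.9,  3.8)    # hypothetical mean opinion scores
cor(predicted, mos, method = "spearman")
```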
Table 1. Comparison of SROCC values for the TID2013 dataset per each distortion

No  Distortion                          First 8 images  All images
1   Additive noise                      0.9664          0.9481
2   Color noise                         0.9261          0.9044
3   Spatially correlated noise          0.9772          0.9413
4   Masked noise                        0.8915          0.8266
5   High frequency noise                0.9631          0.9453
6   Impulse noise                       0.8628          0.8979
7   Quantization noise                  0.8809          0.7757
8   Gaussian blur                       0.9492          0.9154
9   Denoising                           0.9400          0.9448
10  JPEG compression                    0.9482          0.9299
11  JPEG2000 compression                0.8467          0.8873
12  JPEG transmission errors            0.7943          0.7657
13  JPEG2000 transmission errors        0.9129          0.9000
14  Non eccentricity pattern noise      0.5921          0.6859
15  Local block-wise noise              0.4661          0.5342
16  Mean shift                          0.2063          0.2556
17  Contrast change                     0.7470          0.7395
18  Change of saturation                0.5500          0.6314
19  Multiplicative Gaussian noise       0.9435          0.9028
20  Comfort noise                       0.7917          0.8419
21  Lossy compression of noise images   0.9204          0.9170
22  Color quantization with dither      0.9439          0.9057
23  Chromatic aberrations               0.9285          0.8868
24  Sparse sampling and reconstruction  0.8251          0.9063
-   All                                 0.7606          0.7526
We started the processing of CSIQ dataset [1] with the application of the same factor and threshold values which we found earlier for the TID2013 dataset. This allows to get SROCC = 0.7631 which is not sufficient. The full training from the same start point as for TID2013 is not effective too. So we started the procedure from the determining of the effective start optimization point for this dataset using only 10 first images. We interrupted the process after 1hr, 327 functions were evaluated. The corresponding 2D map after dimensionality reduction is shown in Fig. 2. The point with the maximum SROCC = 0.8274 value is shown in red, its values are: Fa = 2, Ta = 0.2, Fs = 0, Ts = −0.5, Fr = −300, Tr = 0.1.
Fig. 2. Map of the initial optimization point selection (CSIQ)
The following three optimization stages were applied:

– optimizing the pair Fa, Ta in bounds Fa ∈ [0, 2], ΔFa = 1, Ta ∈ [0, 0.4], ΔTa = 0.05 allows to achieve the following optimal values: Fa = 1, Ta = 0.15, Fs = 0, Ts = −0.5, Fr = −300, Tr = 0.1 with SROCC = 0.8465;
– optimizing the pair Fs, Fr in bounds Fs ∈ [−200, 200], ΔFs = 40, Fr ∈ [−500, 500], ΔFr = 40 allows to achieve such optimal values: Fa = 1, Ta = 0.15, Fs = −5, Ts = −0.1, Fr = −15, Tr = 0.1 with SROCC = 0.8505;
– optimizing the last pair of thresholds Tr, Ts in ranges Tr ∈ [−0.5, 0.5], ΔTr = 0.05, Ts ∈ [−0.5, 0.5], ΔTs = 0.05 allows to get the final result: Fa = 1, Ta = 0.15, Fs = −5, Ts = −0.025, Fr = −15, Tr = 0 with SROCC = 0.8562.

The application of these coefficients (calculated on the first 10 images) to the entire CSIQ dataset allowed to get SROCC = 0.8499. The comparison of this value with the other published results shows that this result is worse than a lot of more complex methods but not the worst one.

5.3

Computational Complexity
There is almost no information in papers about the computational complexity of the different approaches, and direct comparison requires all approaches to be implemented in the same environment. It is obvious that more complex (and at the same time better in the accuracy) approaches require more computations. Our approach requires 2 iterations over each pixel in test and reference images, first is about the calculation of average brightness over the image for the estimation of RM S, second one is for all other calculations. Additionally, third iteration with sliding windows is required for Eq. (2). This allows us to calculate quality score for the pair of images from TID2013 dataset (512 × 384 pixels) for 0.07 s. Processing of the pair of images from CSIQ (512 × 512 pixels) requires 0.12 s. Estimation of the quality for custom images 1280 × 960 requires 0.5 s, and the processing of 4032 × 3024 image is finished in 4.5 s.
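A minimal R sketch of the PSNR and RMS-contrast building blocks mentioned above is given below; the image matrices are placeholders, and the way these terms are weighted into the final measure is the authors' procedure, not reproduced here.

```r
# PSNR and RMS-contrast building blocks for a pair of grayscale images
# (matrices with values in 0..255).
psnr <- function(ref, test) {
  mse <- mean((ref - test)^2)
  10 * log10(255^2 / mse)
}
rms_contrast <- function(img) {
  sqrt(mean((img - mean(img))^2))        # one pass for the average brightness
}

ref  <- matrix(runif(64 * 64, 0, 255), 64, 64)              # placeholder images
test <- pmin(pmax(ref + rnorm(64 * 64, sd = 5), 0), 255)
psnr(ref, test)
rms_contrast(ref) - rms_contrast(test)   # contrast difference between the images
```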
The main computational complexity of the proposed approach relates to the estimation of Fa , Fs , Fr factors and thresholds Ta , Ts , Tr .
6
Conclusions
In this paper, we proposed the image quality measure between test and reference images. The first step of its computation is the known PSNR ratio. It is complemented with the value A that estimates the local block distortion ratio. It is calculated by comparing patches in the test and reference images, searching for those which have a degenerated variety of colors. After that, the saturation index is calculated, followed by the RMS contrast value. Both saturation and contrast values depend on the difference between images. As a result, a common quality measure is a combination of these values. There are six parameters to be set up, a factor and a threshold for each part of the measure: Fa, Fs, Fr, Ta, Ts, and Tr. The fine-tuning of these values allows adapting the measure to the specifics of a particular dataset. The partial numerical optimization algorithm has been proposed to find proper values of each of these parameters for both the TID2013 and CSIQ datasets. The investigation of the extremum map confirmed the importance of the start optimization point selection. Testing of the quality of the proposed measure and the comparison to existing methods showed that the accuracy is average in terms of the correlation with the mean opinion scores provided by humans. The computational complexity of the measure allows to apply it in less than a second for images of medium size. The application of the measure is possible in tasks where the accuracy is not so important but the speed of image processing matters. The possible improvements of the proposed measure should primarily relate to the accuracy, to be more competitive with other approaches.
References 1. CSIQ image quality database. http://vision.eng.shizuoka.ac.jp/mod/page/view. php?id=23 2. Tampere image database 2013 TID2013, version 1.0. http://www.ponomarenko. info/tid2013.htm 3. Ding, K., Ma, K., Wang, S., Simoncelli, E.P.: Comparison of full-reference image quality models for optimization of image processing systems. Int. J. Comput. Vis. 129, 1–24 (2021) 4. Kamble, V., Bhurchandi, K.: No-reference image quality assessment algorithms: a survey. Optik 126(11), 1090–1097 (2015). https://doi.org/10.1016/j.ijleo.2015.02. 093. https://www.sciencedirect.com/science/article/pii/S003040261500145X 5. Larson, E.C., Chandler, D.M.: Most apparent distortion: full-reference image quality assessment and the role of strategy. J. Electron. Imaging 19(1), 1–21 (2010). https://doi.org/10.1117/1.3267105 6. Liu, T., Liu, K., Lin, J.Y., Lin, W., Kuo, C.J.: A paraboost method to image quality assessment. IEEE Trans. Neural Netw. Learn. Syst. 28(1), 107–121 (2017). https://doi.org/10.1109/TNNLS.2015.2500268
7. Liu, X., Van De Weijer, J., Bagdanov, A.D.: RankIQA: learning from rankings for no-reference image quality assessment. In: 2017 IEEE International Conference on Computer Vision (ICCV), pp. 1040–1049 (2017). https://doi.org/10.1109/ICCV. 2017.118 8. Madeed, N.A., Awan, Z., Madeed, S.A.: Image quality assessment - a survey of recent approaches. In: Computer Science and Information Technology (2018) 9. Pedersen, M., Hardeberg, J.Y.: Full-reference image quality metrics: classification R Comput. Graph. Vis. 7(1), 1–80 (2012). https:// and evaluation. Found. Trends doi.org/10.1561/0600000037 10. Peng, P., Li, Z.N.: General-purpose image quality assessment based on distortionaware decision fusion. Neurocomputing 134, 117–121 (2014). https://doi.org/10. 1016/j.neucom.2013.08.046 11. Ponomarenko, N., et al.: Image database TID2013: peculiarities, results and perspectives. Signal Process.: Image Commun. 30, 57–77 (2015). https://doi. org/10.1016/j.image.2014.10.009. http://www.sciencedirect.com/science/article/ pii/S0923596514001490 12. Saha, A., Wu, Q.J.: Full-reference image quality assessment by combining global and local distortion measures. Signal Process. 128, 186–197 (2016). https://doi. org/10.1016/j.sigpro.2016.03.026 13. Sharma, A.: Principal component analysis (PCA) in Python (2020). https://www. datacamp.com/community/tutorials/principal-component-analysis-in-python 14. Sheikh, H.R., Bovik, A.C.: Image information and visual quality. IEEE Trans. Image Process. 15(2), 430–444 (2006). https://doi.org/10.1109/TIP.2005.859378 15. Wang, Z., Bovik, A.: Modern image quality assessment. In: Modern Image Quality Assessment (2006) 16. Wang, Z., Bovik, A.C., Lu, L.: Why is image quality assessment so difficult? In: 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 4, pp. IV-3313–IV-3316 (2002). https://doi.org/10.1109/ICASSP.2002.5745362 17. Wang, Z., Simoncelli, E.P., Bovik, A.C.: Multiscale structural similarity for image quality assessment. In: The Thrity-Seventh Asilomar Conference on Signals, Systems Computers, vol. 2, pp. 1398–1402 (2003). https://doi.org/10.1109/ACSSC. 2003.1292216 18. Zhai, G., Min, X.: Perceptual image quality assessment: a survey. Sci. China Inf. Sci. 63, 1–52 (2020) 19. Zhan, Y., Zhang, R., Wu, Q.: A structural variation classification model for image quality assessment. IEEE Trans. Multimedia 19(8), 1837–1847 (2017). https://doi. org/10.1109/TMM.2017.2689923 20. Zhang, L., Zhang, L., Mou, X., Zhang, D.: FSIM: a feature similarity index for image quality assessment. IEEE Trans. Image Process. 20(8), 2378–2386 (2011). https://doi.org/10.1109/TIP.2011.2109730 21. Zhang, M., Mou, X., Zhang, L.: Non-shift edge based ratio (NSER): an image quality assessment metric based on early vision features. IEEE Signal Process. Lett. 18(5), 315–318 (2011). https://doi.org/10.1109/LSP.2011.2127473 22. Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Process. 13(4), 600–612 (2004). https://doi.org/10.1109/TIP.2003.819861
A Model for Assessing the Rating of Higher Education School Academic Staff Members Based on the Fuzzy Inference System Sergii Babichev1,2(B) , Aleksander Spivakovsky2 , Serhii Omelchuk2 , and Vitaliy Kobets2 1
´ ı nad Labem, Ust´ ´ ı nad Labem, Jan Evangelista Purkynˇe University in Ust´ Czech Republic [email protected] 2 Kherson State University, Kherson, Ukraine {sbabichev,omegas}@ksu.ks.ua, [email protected], [email protected] Abstract. In this paper, we present the model of assessment of higher education school academic staff members rating based on the complex use of Harrington’s desirability function and a fuzzy inference system. Four main directions of activity were evaluated within the framework of the proposed model: pedagogical activity; science and research; organization activity and other activity types. Each of the proposed types of activities was assessed separately using the Harrington desirability function based on prior determined scores, where the general activity index of the appropriate score was calculated as the weighted average of the private desirability values. The Mamdani inference algorithm with triangular and trapezoidal membership functions was used within the framework of the fuzzy inference process implementation to evaluate the general rating score of the appropriate academic staff member. The base of rules was proposed to form the output for general rating scores calculated based on the individual rating scores. The output of the system was presented for each of the appropriate activity types separately and based on the assessment of the general rating score that contained the separate rating scores as the components. To our best mind, the proposed model can help us to optimize the functional possibilities of higher school system operation. Keywords: Fuzzy inference · Higher education school · Academic staff members’ rating score · Mamdani inference algorithm · Decision making · Membership functions · Harrington desirability function
1
Introduction
The current state of higher education schools in the context of reforms and limited funding necessitates the development of automated and information systems to
support decision-making, allowing to optimize the functioning of all structural elements and, as a result, the system as a whole. Increasing effectiveness of higher school education institutions can be achieved by the informatization of the management process, which, as one of the elements, includes the development of a decision-making system that reflects all directions of the institution’s operation. Obviously, this system should be based on modern computational and information technologies which should be able to analyze and process the current information in real-time with needed accuracy. The system of staff members, departments, and faculties activity of the University rating assessment is an integral part of both the information decisionmaking system and the internal system of quality assurance of the University’s higher education. Implementation of this system stimulates staff members’ professional development, the productivity of scientific and educational activity, development of their creativity and initiative. It is obvious that in the general case this system should be multi-criterial and cover various aspects of academic staff members’ activity. Techniques to evaluate their rating score based on the use of both the average or sum of the used criteria values are not effective since these methods do not allow considering the weights of appropriate scores. More objective is the techniques based on the division of the evaluated directions into groups with the following evaluation of the appropriate type of the activity. At final step, it is necessary to assess the general score based on the previously evaluated partial scores. Implementation of this procedure request development of the base of rules according to which form both the partial and general score. Moreover, it is necessary to form the partial scores to evaluate the staff members’ activity within the framework of appropriate direction. In this research, we present the solution of this problem based on the complex use of Harrington desirability function [6] and a fuzzy inference system, the basic concepts of which was proposed by Zadeh [18,19]. To our best mind, the proposed technique can improve the objectivity for assessment the rating score of higher education staff members’ activity due to parallelizing the processing information procedure related to various types of the staff members’ activities on the one hand, and the use of modern methods of computer data processing on the other hand. This fact indicates the actuality of the research in this subject area.
2
Problem Statement
Figure 1 presents the step-by-step procedure of higher academic school members' rating assessment implemented within the framework of our research. As it can be seen from Fig. 1, the implementation of this procedure assumes solving the following tasks:

– define the number of assessment directions of the higher school staff members' activity, which should be evaluated within the framework of this procedure implementation;
Fig. 1. A step-by-step procedure of higher academic school members’ rating assessment
– determination of both the rating score indexes and the ranges of their variation for each of the activity directions;
– application of the Harrington desirability function to each of the evaluated rating scores in order to transform the score indexes into equal ranges;
– calculation of the weighted averages of the various evaluated parameters for each of the activity directions;
– creation and setup of the fuzzy inference system for assessing the general rating score, and implementation of the fuzzy inference system in order to evaluate the general rating score value;
– analysis of the obtained results.

These tasks are solved within the framework of our research.
3
Literature Review
The development of the control and decision-making systems based on both the data mining and machine learning techniques is currently one of the topical areas of scientific research [7,11–16]. The development of fuzzy inference systems, in this context, is one of the modern directions of data science. A lot of scientific works are devoted to this direction. So, in [1], the authors present the research results regarding the application of fuzzy inference system (ANFIS) to control
the heat transfer augmentation of the nanofluids. In this paper, the authors have demonstrated the performance of the artificial intelligence algorithm as an auxiliary method for cooperation with the computational field dynamic. In this research, the turbulent flow of Cu/water nanofluid warming up in a pipe is considered as a sample of a physical phenomenon. The analysis of the simulation procedure indicated that the optimal results are met by employing gauss2mf in the model as the membership function and x, y, and z coordinates, the nanoparticle volume fraction, and the temperature as the inputs. Moreover, the developed fuzzy inference model allows us to find the relation of the nanofluid pressure to the nanoparticle fraction and the temperature. In [2,3], the authors have applied the fuzzy inference procedure in the decision-making system to choose the most informative gene expression profiles which allow distinguishing the investigated samples with higher resolution. The results of the various binary classifiers operation were used as the input data. The triangular and trapezoidal membership functions were applied within the framework of the proposed model. The Mamdani inference algorithm was used by the authors for inference procedure implementation. The model of the weighted fuzzy neural network to solve the mixed-process signal classification problem is proposed in [17]. The Takagi-Sugeno fuzzy classifier was applied to implement the fuzzy inference procedure in the proposed model. It has allowed integrating the learning properties and classification mechanisms of process neural networks for time-vary signals with the logical inference abilities of fuzzy systems. The proposed model combines a fuzzy decision theory with the mixed-process signal processing theory. It is able from a set of training examples automatically deduce both the membership function and the fuzzy predicate logic rules and quickly build a fuzzy expert system prototype. In [4,5,10], the authors presented the results of the research regarding the development of a novel self-organizing fuzzy inference ensemble framework. The self-organizing fuzzy inference system is capable of self-learning due to a highly transparent predictive procedure from streaming data on a chunk-by-chunk basis through a human-interpretable process. It is very important that the base learner can continuously self-adjust its decision-making boundaries based on the interclass and intraclass distances between the prototypes identified from successive data chunks for higher classification precision. The authors have shown also that due to its parallel distributed computing architecture, the proposed ensemble framework can achieve high classification precision while maintaining high computational efficiency on large-scale problems. The questions concerning the development of the system for assessing academic performance are considered in [9]. The authors proposed to assess the academic performance based on the use of the decision-making support model applying the fuzzy inference system. The Sugeno inference algorithm was used in the proposed model. The authors have shown the advantages of the proposed technique in comparison with other methods applied in this subject area. Increasing in the rating of staff will increase rating of disciplines. It means that
rating of staff is important to increase quality of corresponding disciplines for students [8]. However, we would like to note that despite the achievements in this subject area and the widespread application of the fuzzy inference system in various areas of human activity, the problem of increasing the efficiency of higher school education functioning through using modern computer and information technologies remains relevant nowadays. The goal of this research is the development of the technique to assess the higher school academic members’ rating score based on the complex use of the Harrington desirability function and fuzzy inference system considering various directions of the academic staff members’ activity.
4
Materials and Methods
As it can be seen from Fig. 1, the first stage of the proposed technique creation is the definition of a set of quantitative score indexes that characterize the appropriate direction of staff members’ activity. Within the framework of our research, we divide all directions into four groups considering the activity type: science and research; pedagogical activity; organization activity, and another activity types (reviews, editors, co-editors, conferences program committee members, etc.). The ranges of appropriate private score indexes value variation were transformed into equal intervals from 0 to 1 by application of the Harrington desirability function [6], the equation and chart of which are presented below: d = exp(−exp(−Y )) where Y is the non-dimensional parameter, the value of which are changed within the range from −2 to 5 and d is the private desirability value (Fig. 2).
Fig. 2. Harrington desirability function
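A minimal R sketch of this transform, together with the weighted aggregation described in the step-wise procedure below, is given here; the score bounds, partial scores and weights are hypothetical.

```r
# Harrington desirability transform of a partial rating score (sketch).
# A score range [sc_min, sc_max] is mapped linearly onto Y in [-2, 5],
# then d = exp(-exp(-Y)); the bounds and weights below are hypothetical.
harrington <- function(score, sc_min, sc_max, y_min = -2, y_max = 5) {
  b <- (y_max - y_min) / (sc_max - sc_min)
  a <- y_min - b * sc_min
  exp(-exp(-(a + b * score)))
}

scores  <- c(SR1 = 478, SR2 = 215, SR3 = 1800)          # hypothetical partial scores
bounds  <- rbind(SR1 = c(0, 600), SR2 = c(0, 400), SR3 = c(0, 2000))
weights <- c(SR1 = 1, SR2 = 0.8, SR3 = 1)

d   <- mapply(function(s, lo, hi) harrington(s, lo, hi),
              scores, bounds[, 1], bounds[, 2])
HDI <- sum(weights * d) / sum(weights)                  # weighted average desirability
HDI
```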
The reason to use the Harrington desirability function in context of our research is the following. Firstly, this is one of the effective methods of reducing
a multi-criterial tasks to a one-criterion ones. In our case, we should calculate the partial general rating score indexes based on the various individual score values within the framework of appropriate activity direction. Secondly, standard marks on the desirability scale allow us to objectively form linguistic terms considering the level of the academic staff members’ activity in appropriate activity direction. So, the desirability index from 0 to 0.2 correspond to very low rating score (VL), from 0.2 to 0.37 – low value of rating score (L), from 0.37 to 0.63 – medium rating score (Md), from 0.63 to 0.8 and from 0.8 to 1 - high (H) and very high (VH) values of rating score respectively. Evaluation of the general Harrington desirability indexes HDI for each of the evaluated activity direction based on the partial scores P Sc was performed in accordance with the following step-wise procedure: – Transformation of scales of the partial indexes scores into scale of nondimensional parameter Y by the following linear equation: Y = a + b · P Sc
(1)
where a and b are the coefficients which are determined empirically considering the boundary values of the appropriate partial score indexes:

Ymin = a + b · PScmin,    Ymax = a + b · PScmax    (2)
(3)
– Calculation of the private desirability values for each of the partial scores: di = exp(−exp(−Yi ))
(4)
– Calculation of the general Harrington desirability index for evaluated activity direction as a weighted average of the obtained private desirability values: n wi di HDI = i=1 (5) n i=1 wi where n is the number of parameters evaluated within the framework of the appropriate activity direction; wi is the weighted coefficient indicated the level of the i-th partial score importance, the value of which is determined by the experts in this subject area. The following individual evaluation criteria (partial scores) for each of the activity directions were used within the framework of our research: 1. Science and Research (SR): – SR1 – scientific publications in periodicals (with ISSN) that are indexed in the numeric databases Scopus or/and Web of Science Core Collection;
A Model for Assessing the Rating of Academic Staff Members
455
– SR2 – proceedings of the international conferences that are indexed in the numeric databases Scopus or/and Web of Science Core Collection; – SR3 – total h-index in WoS, Scopus, and Google Scholar which calculated as follows: h = 100 · (hsc + hwos ) + 10 · hGSc – SR4 – citation of the scientist in WoS, Scopus, and Google Scholar which calculated through citations number without self-citations in the following way: ct = 2.5 · (Nsc + Nwos ) + 0.5 · NGSc – SR5 – Scientific publications which included in the list of scientific professional publications of Ukraine of category “B”; – SR6 – monographs or parts of the collective monographs; – SR7 – defense of the dissertation for obtaining a scientific degree, awarding a scientific title. 2. Pedagogical Activity (PA): – P A1 – textbooks and/or teaching and methodical textbooks with ISBN; – P A2 – scientific guidance (advising) of the aspirant who has received the document regarding awarding the scientific degree; – P A3 – management of students who have received a prize at both the II stage of the Ukrainian Student Olympiad or/and -Ukrainian competition of student research works; – P A4 – pedagogical activity in foreign institutions of higher education (mobility programs, contracts, grants, double degree programs); – P A5 – teaching the special disciplines in a foreign language for fullcomplete groups for the academic year (except for specialists who teach specialized philological disciplines in a foreign language); – P A6 – distance courses with electronic educational resources prepared on the university website (submit using moodle software); – P A7 – accredited educational programs according to the criteria of foreign accreditation agencies and the National Agency for Higher Education Quality Assurance. 3. Organization Activity (OA): – OA1 – performing the functions of editor-in-chief/vice editor-inchief/executive secretary of a scientific publication/member of the editorial board included in the list of scientific professional publications of Ukraine or a foreign peer-reviewed scientific publication; – OA2 – membership in specialized scientific councils; – OA3 – organization and holding of international conferences, workshops, and other competitions; – OA4 – rector, vice-rector, dean, head of the department, or member of the scientific council of the university/faculty; – OA5 – coordinator or researcher of an international/national scientific project; – OA6 – submitted applications for international projects/grants. 4. Another Activities (AA):
456
S. Babichev et al.
– AA1 – staff member participation in academic mobility programs (research, research internship, advanced training); – AA2 – review of scientific articles submitted to periodicals of category “A”; – AA3 – international students studying in the relevant educational program (for guarantors of educational programs); – AA4 – employed graduates in accordance with the obtained degree of higher education (for guarantors of educational programs). Algorithm 1 presents step-by-step procedure of the herein-before proposed technique of academic staff members rating score assessment taking into account the position held. Algorithm 1: A step-by-step procedure of the academic staff members rating score assessment Initialization: set: list of the academic staff members’ positions held to be assessment (ph list); formation: dataframe of weight coefficients for both each of the assessed parameters and each of the activity directions considering the position held (weight coef f ); setup: the membership functions and fuzzy rules considering the position held; loading: the database of individual score values evaluated for the academic staff members; iteration counter c = 1; while c ≤ count(ph list) do choose the weight coefficient vector from weight coef f dataframe for current staff member position held; allocate the data subset from the general database taking into account the current position held; calculate the private evaluations and general Harrington desirability values for each of the allocated staff members for each of the activity directions by the formulas (1)–(5); fuzzy inference procedure implementation, formation of the general rating score values for the allocated staff members; c = c + 1; end Return the vector of general rating score values. The fuzzy inference system based on the Mamdani inference algorithm was used to calculate the academic staff members’ general rating scores. The trapezoidal and triangular membership functions were used for boundary and intermediate linguistic terms of the input parameters respectively. The triangular membership functions for all linguistic terms were used for output parameter (general rating score). The terms Very Low (VL – 0 < x ≤ 0.37), Satisfactory
A Model for Assessing the Rating of Academic Staff Members
457
(SF – 0.37 < x ≤ 0.63), High (H – 0.63 < x ≤ 0.8), and Very High (VH – 0.8 < x ≤ 1) were used for activity directions SR, PA, and OA. For Another activities (AA) we used the terms Low (L – 0 < x ≤ 0.37), Satisfactory (SF – 0.37 < x ≤ 0.63), and High (H – 0.63 < x ≤ 1). The values of the output parameter - general rating score (GRS) was varied within the range from 0 to 100 and this range was divided into the following sub-ranges: Very Low (VL – 0 < x ≤ 20), Low (L – 20 < x ≤ 40), Satisfactory (SF – 40 < x ≤ 60), High (H – 60 < x ≤ 80), and Very High (VH – 80 < x ≤ 100). The triangular membership functions were used for each of the terms in this case. As it can be seen, implementation of the Algorithm 1 assumes a prior formation of the weight coefficients for each of the assessed parameters considering the position held. Moreover, it is necessary to create the rules base to implement the fuzzy inference procedure considering the position held. These rules can be differed for various position held. These doing can be carried out by the experts in this subject area taking into account the specificity of appropriate academic higher school.
5
Experiment, Results and Discussion
As an example, let’s consider the professor’s position. Table 1 presents the individual parameters rating scores of ten professors’ position for various activity directions. Figure 3 shows the charts of the sum of rating score values calculated for various activity directions separately. Figure 4 shows the same charts in the case of the use of weighted Harrington desirability general index calculated using the formulas (1)–(5). As it can be seen from Figs. 3 and 4, the sum of rating score absolute values and weighted Harrington desirability index for each of the activity directions are changed according to each other. However, to our mind introduction of the weight coefficient allow us to some increase the sensitivity of the model to the input parameters. Moreover, the proposed technique allows us also to transform the input parameters values into the equal ranges taking into account the importance of appropriate parameter. Figure 5 shows the membership functions for both the input and output variables that were used for the fuzzy inference procedure implementation. The results of the simulation regarding the evaluation of both the rating score for each of the activity types and the general rating score evaluated using the fuzzy inference system are presented in Fig. 6. An analysis of the obtained results allows concluding that the applied fuzzy inference system has divided the academic persons into groups considering the level of their activity adequately. The person P4 has a very high activity level initially. The persons P2 and P3 in total have high activity levels with some differences between each other. The fuzzy inference system has recognized them as persons with high academic activity. Person P3 has a low academic activity level. The fuzzy inference system confirms this fact. Other persons have satisfactory activity in total with some differences between each other. The fuzzy model has identified them as persons
458
S. Babichev et al.
Table 1. The professor position individual parameters rating scores for various activity directions Individual rating parameters
Staff members (professor) 5
6
SR1
1 478
2 145
3 36
4 510
145
250
7 365
8 124
9 87
10 348
Weight 1
SR2
215
100
10
350
54
80
120
60
50
250
0.8
SR3
1800
650
320
1650
950
500
1200
600
1600
1200
1
SR4
900
450
120
1450
487
600
950
450
850
950
1
SR5
125
150
10
130
30
10
90
20
60
10
0.5
SR6
50
20
0
70
20
30
40
10
20
20
0.8
SR7
50
25
0
70
0
0
0
0
0
50
0.9
PA1
55
48
20
80
70
65
25
90
42
20
1
PA2
50
100
0
25
25
50
0
0
50
25
1
PA3
40
30
20
60
20
40
60
30
20
40
0.6
PA4
80
40
0
60
30
40
30
0
10
20
0.5
PA5
20
40
20
30
0
0
10
30
20
30
0.5
PA6
30
20
30
40
40
40
0
10
40
30
0.6
PA7
50
60
20
60
20
40
10
40
30
50
0.8
OA1
50
30
40
40
10
20
40
50
30
10
0.8
OA2
20
30
10
30
40
30
20
10
50
20
0.5
OA3
50
30
20
60
50
10
30
20
40
10
0.6
OA4
80
30
10
60
20
40
10
60
50
30
1
OA5
40
100
20
50
30
20
40
60
80
10
0.9
OA6
20
40
10
30
50
10
20
30
50
20
0.5
AA1
40
20
10
40
30
20
10
30
10
10
0.8
AA2
10
40
20
50
20
30
60
50
30
50
1
AA3
15
12
10
30
10
15
10
10
0
0
0.5
AA4
0
15
0
10
5
20
0
15
10
20
0.5
with satisfactory activity levels. Of course, it is possible to improve the fuzzy inference model by improving the base of fuzzy rules, more careful selection, and setting the membership functions considering the requests of appropriate higher education schools. However, to our mind, the main goal of this procedure implementation is the grouping of the academic staff members into groups taking into account the level of their activities in various directions considering the weight of the appropriate activity. We think that we have solved this task in accordance with the problem statement.
Fig. 3. The charts of the sum of rating score values calculated for various activity directions
Fig. 4. The charts of the weighted Harrington desirability general index calculated for various activity directions
Fig. 5. The membership functions for both the input and output variables that were used for the fuzzy inference procedure implementation
Fig. 6. The results of the simulation regarding the evaluation of both the rating score for each of the activity types and the general rating score evaluated using the fuzzy inference system
6
Conclusions
In this manuscript, we have presented the model of higher school academic staff members rating score evaluation based on the complex use of Harrington’s desirability function and fuzzy inference system. A general step-by-step procedure of the statement problem solving is presented as a structural block-chart contained all necessity stage of this problem-solving. Four types of activity directions have been considered within the framework of our research: science and research; pedagogical activity; organization activity, and other activity types (reviews, editors, co-editors, conferences program committee members, etc.). Each of the assessment activities included various evaluations of the academic staff members’ activities within the appropriate direction. In beginning, the ranges of appropriate private score indexes value variation were transformed into equal intervals from 0 to 1 by application of the Harrington desirability function. Evaluation of the general Harrington desirability indexes HDI for each of the evaluated activity directions has been performed based on the use of a weighted average of the obtained partial scores PSc. The fuzzy inference system based on the Mamdani inference algorithm was used to calculate the academic staff members’ general rating scores. The trapezoidal and triangular membership functions were used for boundary and intermediate linguistic terms of the input parameters respectively. The triangular membership functions for all linguistic terms were used for output parameter (general rating score). The individual parameters rating scores of ten professors’ position for various activity directions has been evaluated during the simulation procedure. The analysis of the simulation results has shown high effectiveness of the proposed hybrid model. The fuzzy inference system has divided the academic persons into groups considering the level of their activity adequately. Improving this model can be achieved by both more careful formation the base of fuzzy rules and setting the membership functions considering the requests of appropriate higher education schools. However, to our mind, the main goal of this procedure implementation is the grouping of the academic staff members into groups taking into account the level of their activities in various directions considering the weight of the appropriate activity. We hope that we have solved this task in accordance with the problem statement.
Early Revealing of Professional Burnout Predictors in Emergency Care Workers

Igor Zavgorodnii1, Olha Lalymenko1, Iryna Perova2(B), Polina Zhernova2, Anastasiia Kiriak2, and Oleksandr Novytskyy2

1 Kharkiv National Medical University, Kharkiv, Ukraine
2 Kharkiv National University of Radio Electronics, Kharkiv, Ukraine
{anastasiia.kiriak,oleksandr.novytskyi}@nure.ua
Abstract. The article discusses the issues of determining the prepathology group for the early diagnosis of professional burnout in emergency medical workers, who are subject to an extreme influence of psychological factors of the working environment and a high intensity of the work process, using the questionnaire "Work-related behavior" (Arbeitsbezogenes Verhaltens- und Erlebensmuster, AVEM). This approach allows us to identify a group of people who are prone to the development of professional burnout. In the future, it will allow us to improve the calculations of the AVEM questionnaire and to establish professionally forming criteria for professional selection when hiring, taking into account the individual-typological characteristics of the person and their behavior and experiences associated with professional activities. Keywords: Professional burnout · Work-related behavior · Professional claim · Subjective significance of professional activity
1 Introduction
In the last decade, the problem of chronic fatigue (occupational stress, burnout) of workers under conditions of high labor intensity and tension, in particular emergency care workers, has become especially relevant for world medical science [19]. The growth of nervous and emotional stress during work and constant communication with a large number of people gradually lead to emotional exhaustion and form a state of stress; regular excessive stress can lead to overstrain of the body's functional state, which results in both physical and mental health problems, including the formation of signs of burnout [9]. Despite the sufficient number of published works on the structure, nature, frequency and features of burnout in different professional groups of specialists, there is a lack of common understanding of the early criteria of this phenomenon and of a theoretically defined and methodologically substantiated model of its occurrence and of the development of professional deformities. In addition, the study of the early diagnosis of burnout is extremely limited, especially with regard to the contribution of psychological determinants of personality to its development
based on the study of behaviors and experiences related to work, in particular in emergency care workers.
2 Problem Statement
The study is devoted to the development of a prognostic model for detecting prepathological manifestations of occupational burnout in health care workers (emergency care workers), taking into account the contribution of certain work-related individual-typological determinants of behavior and experiences to its formation.
3 Literature Review
The concept of occupational health integrates the complex relationships of a human with the professional environment and is a measure of the coherence between the social needs of society and human capabilities under conditions of professional activity. It should be noted that only if the professional characteristics of the specialist meet the professional requirements, together with proper physical and mental health and social well-being, is it possible to ensure high work efficiency, optimal social and industrial adaptation, and a reduction of the costs of medical care and treatment of the working contingent [10,11].

In the last decade, the problem of chronic fatigue of workers under conditions of high labor intensity has become especially relevant for world medical science. The growth of neuro-emotional loads in the process of work forms a state of stress and often an overexertion of the body's functional state, which can be regarded as the formation of occupational stress and chronic fatigue syndrome [3]. In the modern world, under conditions of rapid economic and technological progress, the level of professional competition increases and, at the same time, so do the problems of the psychological formation and development of personal characteristics. The personality undergoes an occupational transformation due to the presence of a number of destructive innovations in the structure of its activities. Workers whose professional activities are directly related to people are particularly prone to such deformation. Occupational stress negatively affects the workers' capacity for work, reducing their productivity and impairing interpersonal interactions [6].

Occupational stress is a strained state of the employee which occurs under the influence of emotionally negative and extreme factors associated with professional activities. The development of stress in the workplace is highlighted as an important scientific problem in connection with its impact on the capacity for work, the productivity and quality of work, and the health of employees. Literature reviews on the problem show that a number of work stressors, such as role uncertainty, conflict, lack of control and work overload, are usually closely related to mental stress, psychosomatic disorders, as well as adverse behavioral consequences [7,16].
Occupational stress is defined as a multidimensional phenomenon that is expressed in physiological and psychological reactions to a difficult work situation, as well as a specific psychophysiological reaction associated with the performance of professional duties [12]. Part of the research focuses on the role of the work environment as a source of stress, regardless of individual characteristics and life circumstances. Other works on occupational stress have focused on the interaction between the individual and the work environment, for example the works by S. Carrere et al. [31]. The most famous model of occupational stress is the so-called Michigan model (and its variants), created at the Institute for Social Research of the University of Michigan (USA) by D. Katz and R.S. Kahn (Fig. 1) [17,18]. This model considers the social environment, the nature of the human personality in terms of the perception of the factors of this environment and its reactions to them as factors of industrial stress, as well as the possible consequences of this condition for the health of the individual.
Fig. 1. Occupational stress model (according to D. Katz and R.S. Kahn)
Another known model of occupational stress has been proposed by J.E. McGrath [21]. According to this model, the source of stress, as in the model of the social environment, is an objective (real) situation, which is perceived as a subjectively reflected one through the mechanism of its evaluation. This assessment is followed by the phase of deciding on the manifestation of a specific reaction, which is mainly conscious behavior rather than a combination of behavioral, physiological and psychological reactions, as in the model of the social environment [26]. Although these two models are consistent with each other, they differ in the type of final results.
A generalized model of occupational stress is presented in the works by M. J. Smith and described in more detail earlier by M.J. Smith and P. Carayon (Fig. 2). In general, factors of the labor system can cause direct stress reactions, which are regulated by personal and cognitive characteristics. If these short-term stress reactions become chronic, they can lead to significant negative consequences for health and work.
Fig. 2. General model of occupational stress according to M.J. Smith (1997)
According to a generalized model of occupational stress by Smith M.J. (1997), the main components of occupational stress development are the system of labor organization, short-term and long-term reactions to stress, individual and personal characteristics [5].
Organizational factors that cause occupational stress include: poor organization and working conditions, work overload, shortage of time, duration of the working day (overtime hours), labor content, low social status, low wages and poor career prospects, professional responsibility, unnecessary rituals and procedures, monotony, a large number of people with many problems, the depth of contact with another person, participation in decision-making, and feedback. That is, organizational factors are considered the most important characteristic of the labor system as a whole, which includes the working conditions, the content of work, etc. In addition, the information workload is a serious stressor [30]. Factors of work can cause direct stress reactions, which can, in turn, lead to serious physical and mental illness, industrial conflicts and violations of work efficiency.

However, according to Smith M.J., both the stress produced by the factors of the labor system and the stress responses, as well as the effects of stress, can be regulated by the individual characteristics of the subject. Smith M.J. attributes personal and cognitive characteristics, health status and available knowledge, skills and experience to these characteristics. The personal characteristics of the subject include such resources as self-realization, which implies personal openness, interpersonal communication, trust and willingness to be close to others; self-centeredness, i.e. the degree of individual respect for themselves and their judgments, and self-confidence; and acceptance of themselves and others, that is, the ability to accept the originality and imperfection of other people and of themselves. Resources that reflect structuring skills (the ability to plan, set goals and priorities) and problem-solving skills can be referred to cognitive characteristics. Characteristics of health status include the resource of physical health, which comprises the absence of chronic diseases and incapacity; according to Smith M.J., such resources as physical endurance, stress control and exertion control belong to the category of knowledge, skills and experience, which allow the subject to carry out conscious activity to reduce stress reactions.

In addition, within the situational approach, overcoming occupational stress will be more successful if the individual's reactions meet the requirements of the situation [22]. The perception of control over the situation is one of the important parameters of the potential correspondence between overcoming and situational assessments of occupational stress. When exercising control over the situation, problem-oriented strategies for overcoming stress are the most effective, as well as those with greater use of active work strategies aimed at a positive solution. The locus of control also refers to the individual characteristics of the subject within this model of occupational stress by Smith M.J. [15]. Nowadays, according to most scientists, chronic occupational stress is an important factor in shaping the phenomenon of burnout (the term was introduced by H. Freudenberger, 1974); the concept of "occupational exhaustion" is understood as a result of work of high intensity and tension, which can result in psychosomatic pathology, neurological and borderline mental disorders, arterial hypertension, etc. [13].
A large number of studies have shown that the occurrence of burnout depends on the interaction of four basic groups of characteristics, namely personal characteristics, job characteristics, organizational communication, and engagement with the organization (Fig. 3).
Fig. 3. Four clusters of antecedents of job burnout
The first group includes the personal characteristics of employees, demographic indices (such as age and gender), as well as more specific life events. It is known that personality traits such as authoritarianism, low empathy, neuroticism, repression, a weak tendency to self-actualization and self-realization, emotional rigidity, weak motivation and disorientation of personality, anxiety and aggression can contribute to burnout symptoms, although their contribution to the development of burnout is not as large as that of the professional factors, which indicates that burnout is a social rather than an individual phenomenon (Maslach et al., 2001, p. 409). At the same time, the reasons for the burnout of a professional are most likely a combination of specific factors of the organizational environment with the personal characteristics of the employee, which, in turn, can smooth or enhance the impact of professional stressors [20]. The group of professional and organizational factors of work may include unsatisfactory working conditions, high workload, uncertainty in the assessment of the work performed, increased responsibility, a psychologically difficult contingent, and lack of remuneration. An improperly organized mechanism of communication between employees can also contribute to burnout, as information overload in the absence of feedback from the administration, poor interpersonal communication and an unsatisfactory
working psychological climate, which are elements of the work environment, also potentiate psychological work distress. This category also includes such indices as professional attitudes and commitment to work as predictors of professional burnout [23,25].

In 2010, at its 307th session, the Governing Body of the International Labor Organization (ILO) approved a new list of occupational diseases. For the first time, the ILO list included psychoemotional and behavioral disorders, with evidence of a direct link between the action of this factor and the psychoemotional or behavioral disorders that have developed in the employee [2]. Occupational burnout syndrome is included in the 11th revision of the International Classification of Diseases (ICD-11) dated 28.05.2019. This syndrome is classified under "Factors affecting the health of the population and appeals to health care facilities", which covers reasons for the population's appeal to health care institutions that are not classified as diseases or medical conditions [1]. The Centers for Disease Control and Prevention (CDC, Atlanta, United States) and the National Institute for Occupational Safety and Health (NIOSH) annually revise and supplement the list of professional specialties in order to control the occurrence of chronic fatigue syndrome ("burnout") and the measures to prevent it (CDC, NIOSH, 2019) [4].

It is known that the frequency of burnout among workers of "helping", socially significant occupations, in particular emergency care workers, is high due to the significant involvement of the specialist in interpersonal communication and the constant need to sympathize with and understand the problems of others. In the genesis of such a pathological condition, the ratio between the intensity of the labor process factors and the individual personality characteristics is important, as is the presence of the professionally significant characteristics and abilities that ensure the stability of the individual under the constant influence of production factors [8]. Health care workers in general, and especially primary emergency medical care workers, taking into account the specifics of their daily practice, are the most exposed to occupational stressors. The characteristic features of the professional activity of doctors of the emergency medical center are the intense rhythm of work due to the large number of patients, high intellectual and emotional loads, inadequate management methods and constant structural reorganizations, a heavy paperwork load with subsequent duplication of data in electronic programs, lack of state support with adequate logistics and medicines, and an insufficient ability to influence working conditions [28].

A more differentiated analysis of the burnout process reveals its various components, which remain hidden when the condition is diagnosed on the basis of symptoms alone. Therefore, it is appropriate to understand burnout in terms of the compliance of coping mechanisms with the requirements of work. This can be reliably done with the help of the questionnaire Arbeitsbezogenes Verhaltens- und Erlebensmuster (AVEM) developed by U. Schaarschmidt and A. Fischer, which conveys
the human reaction to the requirements of the professional environment and the behaviors based on these reactions, as well as the severity of burnout [24]. An accurate characterization of the strategies, techniques and organizational processes used to solve a problem situation and, above all, of the involvement of individual personal resources in solving problem situations forms a crucial basis for the effective prevention of professional distress, since it makes it possible to show not only the weaknesses but also the still existing resources of the personality that can be used. To counteract the development of burnout, it is especially important to reveal personal resources of productivity and to strengthen the resources of resistance. If "burnout" is already present, help and protection are useful in order to be able to build up resources again. Restoration of health requires the development of active behavior in accordance with changing professional requirements [27]. Diagnosis and detection of work-related experiences in emergency care workers is the key not only to the effective work of emergency physicians, but also to the harmony of the specialist's personality.

The Arbeitsbezogenes Verhaltens- und Erlebensmuster (AVEM) questionnaire is a multifactor diagnostic tool that allows one to determine the types of human behavior in situations of professional demands [29]. According to the authors' concept, professional behavior is determined by three main factors (Table 1):

1. Professional activity - a person's readiness for energy expenditure in professional activities and the factors that determine it;
2. Strategies for overcoming problem situations - active problem solving or avoidance;
3. Emotional mood for professional activity - a person's attitude to professional activity, based on a sense of professional success and life satisfaction.

The questionnaire consists of 66 statements combined into 11 scales. Each scale consists of 6 statements, the degree of agreement with which the subject evaluates on a five-point scale from "Completely agree" (5 points) to "Absolutely disagree" (1 point). Depending on the ratio of the indices on the different scales, the type of behavior in the professional environment is determined, which allows, along with other characteristics, a conclusion to be made about the presence or absence of the syndrome of professional "burnout". The authors of the technique distinguish the following types of professional behavior:

1. Healthy type "G" - characterized by high, but not extreme, professional activity; tends toward the constructive solving of problem situations in the professional sphere and considers them not as a source of stress, but as an incentive to actively overcome obstacles. A positive attitude to professional activity is enhanced by the mobilizing effect of positive emotions.
2. Economical type "S" - has an average level of professional motivation and activity and high satisfaction with professional results, which are usually not very high, and, especially, with life in general. In relation to work, this type tends to keep a distance. In the long run, the economical type is likely to develop increasing professional dissatisfaction against the background of other people's success.
Table 1. 11 scales and 3 main factors of the AVEM questionnaire

Factor "Professional activity":
1. BA - Subjective significance of activity
2. BE - Professional claims: striving for professional growth
3. VB - Energy readiness: willingness to devote all their strength to the performance of professional tasks
4. PS - The pursuit of excellence: focus on the quality of the performed duties
5. DF - Ability to maintain distance from work: the ability to relax and rest after work

Factor "Mental stability and problem overcoming strategies":
6. RT - The tendency to give up in a situation of failure: a tendency to come to terms with the situation of failure and easily refuse to overcome it
7. OP - Proactive problem solving strategy: active and optimistic attitude to emerging problems and tasks
8. IR - Inner calmness and balance: feeling of mental stability and balance

Factor "Emotional attitude to work":
9. EE - Feeling successful in professional activities: satisfaction with one's professional achievements
10. LZ - Life satisfaction: general life satisfaction based on professional success
11. SU - Feeling socially supported: trust and support from loved ones, a sense of social well-being
3. Type of risk "A" - characterized by an extremely high subjective significance of professional activity and a high degree of readiness for energy expenditure. This leads to easily arising dissatisfaction with professional results, low tolerance to stressors and a high probability of mental overload and of the development of the syndrome of professional "burnout". In addition, this type is characterized by the lack or insufficiency of social support, which, according to the authors, in combination with high professional activity, is one of the causes of psychosomatic diseases.
4. Type of "burnout" "B" - has a low level of significance of professional activity in combination with low stress resistance and ability to relax, and a tendency to move away from problematic situations. The inability to maintain distance from work is a characteristic feature of this type. These properties lead to emotional exhaustion and characterize professional "burnout".

Score interpretation of the results of the AVEM method: 6-10 - particularly low values; 11-15 - low values; 16-20 - average values; 21-25 - high values; 26-30 - extremely high values.
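The score interpretation above maps directly onto a small helper function; the sketch below is only a restatement of the published bands, not part of the authors' software.

```python
def interpret_avem_scale(total_points):
    """Qualitative band for a raw AVEM scale score (6 items, 1-5 points each)."""
    if not 6 <= total_points <= 30:
        raise ValueError("An AVEM scale score must lie between 6 and 30")
    if total_points <= 10:
        return "particularly low"
    if total_points <= 15:
        return "low"
    if total_points <= 20:
        return "average"
    if total_points <= 25:
        return "high"
    return "extremely high"

print(interpret_avem_scale(18))   # -> "average"
```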
4 Experiment, Results and Discussion

4.1 Experimental Datasets. Simulation Results
In our research, we examined 85 ambulance workers of both sexes, of different ages (from 20 to 78 years) and experience (from 1 month to 48 years). The evaluation process can be described in more detail as follows.

Step 1. Raw points for each of the 11 scales are computed independently. Items with positive statements are added, and items with negative statements (which are 13, 16, 19, 22, 23, 30, 31, 33, 49, 54, 55, 56, and 60) are subtracted. After that, 6 points should be added for each negative statement in the scale. For example, in scale IR (see Table 1) 4 items (8, 41, 52, and 63) contain positive statements and 2 items (19 and 30) contain negative statements [24]. Thus, 2·6 = 12 points should be added to the total result, and the total amount of points for scale IR will be the following: P8 - P19 - P30 + P41 + P52 + P63 + 12, where P8 denotes the points with which the subject evaluated statement 8, etc.

Step 2. Total points for each scale are converted to stanines (a nine-point standard scale with a mean of five and a standard deviation of two). This step is not obligatory and can be omitted.

Step 3. One of the four types of professional behavior is assigned to the particular subject based on their scores on each of the 11 scales. In this case, the following method was adopted: the total scores for each of the 11 scales were compared with their standard values for the 4 types individually, and the most suitable (the closest) type was chosen for each of the 11 scales. Thus, each ambulance worker was assigned a list of 11 types. The method of type determination is the most subtle and arguable point in the whole process. The most frequent burnout type was chosen as the final decision (Fig. 4).
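Step 1 can be illustrated with the following sketch (an illustrative reading of the scoring rule with made-up answers; not the authors' code). For the reverse-keyed items, subtracting the answer and adding 6 back is equivalent to scoring the item as 6 minus the answer.

```python
# Item numbers with negative statements, as listed in the text
NEGATIVE_ITEMS = {13, 16, 19, 22, 23, 30, 31, 33, 49, 54, 55, 56, 60}

def raw_scale_score(answers, scale_items):
    """Compute the raw score of one AVEM scale.

    answers     -- dict: item number -> points given by the respondent (1..5)
    scale_items -- the 6 item numbers belonging to the scale
    Positive items are added; negative (reverse-keyed) items are scored as
    6 minus the answer, which matches subtracting them and adding 6 back.
    """
    score = 0
    for item in scale_items:
        if item in NEGATIVE_ITEMS:
            score += 6 - answers[item]
        else:
            score += answers[item]
    return score

# Scale IR: positive items 8, 41, 52, 63; negative items 19, 30
ir_items = [8, 19, 30, 41, 52, 63]
answers = {8: 4, 19: 2, 30: 1, 41: 5, 52: 3, 63: 4}  # hypothetical responses
print(raw_scale_score(answers, ir_items))            # equals P8-P19-P30+P41+P52+P63+12
```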
Fig. 4. Result
Data visualization in the space of three principal components is presented in Fig. 5. Most of the ambulance workers belong to the economical type "S" (marked by green circles), 3 objects to the type of risk "A" (marked by red squares), 9 objects to the type of "burnout" "B" (marked by black squares), and only one ambulance worker belongs to the group of the healthy type "G" (marked by a star). One of the four types of professional behavior (AVEM scales) is assigned to the particular subject based on the most common value on each of the 11 scales. But, for example, the patient with ID RD084 has a total result equal to the economical type "S" and the type of "burnout" "B" at the same time (see Fig. 6). So, it is necessary to introduce a procedure for refining the calculation of the professional behavior types. The procedure consists of a few steps. The first is detecting the center of each of the groups (S, G, B and A) as the average value for the group. All 4 centers are marked by an "X" sign in Fig. 7. For the early revealing of professional burnout, it is necessary to analyze the objects that are close to the centers of the B and A groups in terms of the Manhattan distance:

dist(x) = \sum_{i=1}^{n} |x_i - c_j|,   (1)

After that, the physician should order additional tests for this particular group and carry out additional measures aimed at preventing the development of professional burnout.
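A sketch of the refinement procedure is shown below on synthetic data: the center of each type group is the mean 11-dimensional scale profile of its members, and workers whose profiles lie within a Manhattan-distance threshold of the A or B centers are flagged for additional examination. The data, group proportions and threshold are illustrative assumptions, not the study's dataset.

```python
import numpy as np

def group_centers(scores, labels):
    """Center of each AVEM type = mean 11-dimensional score vector of its group."""
    return {t: scores[labels == t].mean(axis=0) for t in np.unique(labels)}

def manhattan(x, c):
    """Manhattan (L1) distance between a worker's scale profile and a group center, Eq. (1)."""
    return float(np.sum(np.abs(np.asarray(x) - np.asarray(c))))

# scores: (n_workers, 11) array of scale values; labels: assigned types 'S', 'G', 'A', 'B'
rng = np.random.default_rng(0)
scores = rng.integers(6, 31, size=(85, 11))          # hypothetical data
labels = rng.choice(list("SGAB"), size=85, p=[0.8, 0.02, 0.06, 0.12])

centers = group_centers(scores, labels)
risk_centers = {t: centers[t] for t in ("A", "B") if t in centers}

# Workers whose profiles lie close to the A or B centers are candidates for extra screening
threshold = 25.0                                      # illustrative cut-off
flagged = [i for i, x in enumerate(scores)
           if any(manhattan(x, c) <= threshold for c in risk_centers.values())]
print(len(flagged), "workers suggested for additional examination")
```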
4.2 Discussion
It is necessary to note that, in the case when the total result of the detection of the professional behavior type is equal to two (or three) AVEM types at the same time, a special procedure introducing weight coefficients for each of the 11 scales should be used. The weight coefficients must be different for different groups of employees. It should also be noted that the 4 standard types of the AVEM scale were initially derived from studies conducted on teachers [14], so for other occupations the distribution of types might be different.
Fig. 5. Data visualization
Fig. 6. Undefined result example
Fig. 7. Visualization of cluster’s centers
5 Conclusions
This paper is aimed at the early revealing of professional burnout predictors. As initial data, the responses of 85 emergency care workers to the "Work-related behavior" (AVEM) questionnaire were used. The largest part (more than 80%) of the emergency care workers has an average level of professional motivation and activity and high satisfaction with professional results. The principal goal was the identification of people who are prone to the development of professional burnout. It will allow us to improve the calculations of the AVEM questionnaire by introducing weight coefficients for each of the 11 scales and to establish professionally forming criteria for the professional selection of emergency care workers and other socially significant professions.

Acknowledgment. Research on the topic "Substantiation of criteria of prepathological states of occupational burnout in health care workers", under registration, was financed by the Ministry of Health of Ukraine at the expense of the state budget.
References 1. Bulletin of the world health organization 2. Minutes of the 307th session of the govering body of the international labor office (2010). www.ilo.org/wcmsp5/groups/public/--ed norm/--relconf/ documents/meetingdocument/wcms 142975.pdf 3. Workplace stress: a collective challenge. International Labour Organization, p. 63 (2016) 4. National Institute for Occupational Safety and Health. Centers for disease control. Total worker health workforce development program (2020). https://www.cdc.gov/ niosh/twh/default.html 5. Carayon, P., Smith, M., Haims, M.: Work organization, job stress, and work-related musculoskeletal disorders. Hum. Factors 41(4), 644–663 (1999). https://doi.org/ 10.1518/001872099779656743 6. Agarwal, S., Pabo, E., Rozenblum, R., Sherritt, K.: Professional dissonance and burnout in primary care: a qualitative study. JAMA Intern. Med. 180(3), 395–401 (2020). https://doi.org/10.1001/jamainternmed.2019.6326 7. Basu, S., Qayyum, H., Mason, S.: Occupational stress in the ED: a systematic literature review. Emerg. Med. J. 34(7), 441–447 (2016). https://doi.org/10.1136/ emermed-2016-205827 8. Bergm¨ uller, A., Zavgorodny, I., Zavgorodnyaya, N., Kapustnik, V., B¨ ockelmann, I.: The correlation between personality characteristics and burnout syndrome in emergency ambulance workers. Zhurnal Nevrologii i Psikhiatrii imeni S.S. Korsakova 116(12), 25–29 (2016). https://doi.org/10.17116/jnevro201611612125-29 9. Bridgeman, P., Bridgeman, M., Barone, J.: Burnout syndrome among healthcare professionals. Am. J. Health-Syst. Pharm. 75(3), 147–152 (2018). https://doi.org/ 10.2146/ajhp170460 10. Corso, G., Veronesi, P., Pravettoni, G.: Preventing physician distress: burnout syndrome, a sneaky disease. Eur. J. Cancer Prev. 28(6), 568 (2019). https://doi. org/10.1097/CEJ.0000000000000499 11. D’Onofrio, L.: Physician burnout in the united states: a call to action. Altern. Ther. Health Med. 25(2), 8–10 (2019) 12. Fortes, A., Tian, L., Huebner, E.: Occupational stress and employees complete mental health: a cross-cultural empirical study. Int. J. Environ. Res. Public Health 17(10), 3629 (2020). https://doi.org/10.3390/ijerph17103629 13. Freudenberger, H.: Burn-Out: The High Cost of High Achievement, p. 214. Archor Press (1980) 14. Gencer, R., Boyacioglu, H., Kiremitci, O., Dogan, B.: Psychometric properties of work-related behavior and experience patterns (AVEM) scale. H. U. J. Educ. 38, 138–139 (2010) 15. Gragnano, A., Simbula, S., Miglioretti, M.: Work-life balance: weighing the importance of work-family and work-health balance. Int. J. Environ. Res. Public Health 17(3), 907 (2020). https://doi.org/10.3390/ijerph17030907 16. Gregory, S., Menser, T., Gregory, B.: An organizational intervention to reduce physician burnout. J. Healthcare Manag. 63(5), 338–352 (2018) 17. Kahn, R., Wolte, D., Qunn, R., Snoek, J., Rosental, R.: Organization Stress Studies in Role Conflict and Ambiguity, 2nd edn., p. 470. Wiley, New York (1964) 18. Katz, D., Kahn, R.: The Social Psychology of Organizations, p. 489. Wiley, New York (1966)
19. Leszczynski, P., et al.: Determinants of occupational burnout among employees of the emergency medical services in Poland. Ann. Agric. Environ. Med. 26(1), 114–119 (2019). https://doi.org/10.26444/aaem/94294 20. Maslach, C.: Burnout in health professionals. In: Cambridge Handbook of Psychology, Health and Medicine, p. 968. Cambridge University Press (2007) 21. McGrath, J.: A conceptual formulation for research on stress. In: Social and Psychological Factors in Stress, pp. 10–21. Holt, Rinehart, Winston, New York (1970) 22. Pourhoseinzadeh, M., Gheibizadeh, M., Moradika, M., Cheraghian, B.: The relationship between health locus of control and health behaviors in emergency medicine personnel. Int. J. Community Based Nurs. Midwifery 5(4), 397–407 (2017) 23. Roy, I.: Burnout syndrome: definition, typology and management. Soins Psychiatrie 39, 12–19 (2018) 24. Schaarschmidt, U.: Avem: Ein instrument zur interventionsbezogenen diagnostic beruichen bewaltigungsverhaltens. Weichenstellung fur den Reha-Verlauf 7, 59–82 (2006) 25. Schaufeli, W., Leiter, M., Maslach, C.: Burnout: 35 years of research and practice. Career Dev. Int. 14(3), 204–220 (2009) 26. Smith, M., Carayon, P.: New technology, automation, and work organization: stress problems and improved technology implementation strategies. Int. J. Hum. Factors Manuf. 5(1), 99–116 (1995). https://doi.org/10.1002/hfm.453005010. Special Issue: Human Factors in Automation: Implications for Manufacturing 27. Somville, F., De Gucht, V., Maes, S.: The impact of occupational hazards and traumatic events among Belgian emergency physicians. Scand. J. Trauma Resuscitation Emerg. Med. 24, 249–251 (2016). https://doi.org/10.1186/s13049-016-0249-9 28. Stehman, C., Testo, Z., Gershaw, R., Kellogg, A.: Burnout, drop out, suicide: physician loss in emergency medicine. Western J. Emerg. Med. 20(3), 485–494 (2019). https://doi.org/10.5811/westjem.2019.4.40970 29. Voltmer, E., Spahn, C., Schaarschmidt, U., Kieschke, U.: Work-related behavior and experience patterns of entrepreneurs compared to teachers and physicians. Int. Arch. Occup. Environ. Health 84(5), 479–490 (2011). https://doi.org/10.1007/ s00420-011-0632-9 30. van der Wal, R., Wallage, J., Bucx, M.: Occupational stress, burnout and personality in anesthesiologists. Curr. Opin. Anesthesiol. 31(3), 351–356 (2018). https:// doi.org/10.1097/ACO.0000000000000587 31. Wang, Y., Wang, P.: Perceived stress and psychological distress among Chinese physicians: the mediating role of coping style. Med. (Baltimore) 98(23), e15950 (2019). https://doi.org/10.1097/MD.0000000000015950
Forming Predictive Features of Tweets for Decision-Making Support

Bohdan M. Pavlyshenko(B)

Ivan Franko National University of Lviv, Lviv, Ukraine
Abstract. The article describes approaches for forming different predictive features of tweet data sets and using them in predictive analysis for decision-making support. Graph theory as well as the theory of frequent itemsets and association rules is used for forming and retrieving different features from these datasets. The use of these approaches makes it possible to reveal a semantic structure in tweets related to a specified entity. It is shown that quantitative characteristics of semantic frequent itemsets can be used in predictive regression models with specified target variables.
Keywords: Predictive features · Predictive analytics · Frequent itemsets · Tweets

1 Introduction
Tweets, the messages of Twitter microblogs, have a high density of semantically important keywords. This makes it possible to get semantically important information from tweets and to generate the features of predictive models for decision-making support. Different studies of Twitter are considered in the papers [3–6,9,15,17,19,23,31,34]. In [27,28], we study the use of tweet features for forecasting different kinds of events. In [25], we study the modeling of COVID-19 spread and its impact on the stock market using different types of data, and also consider the features of tweets related to the COVID-19 pandemic. In this paper, we study the predictive features of tweets using loaded datasets of tweets related to the Tesla company.
2 Graph Structure of Tweets
The relationships among users can be considered as a graph, where vertices denote users and edges denote their connections. Using graph mining algorithms, one can detect user communities and find ordered lists of users by various characteristics, such as Hub, Authority, PageRank and Betweenness. To identify user communities, we used the Community Walktrap algorithm, which is implemented in the package igraph [11] for the R programming language environment. We used the Fruchterman-Reingold algorithm from this package for visualization. The Community Walktrap algorithm searches for related subgraphs, also called communities, by random walk [30]. A graph which shows the relationships between users can be represented by the Fruchterman-Reingold algorithm [12]. We can assume that tweets could carry predictive information for different business processes. For our case study, we have loaded the tweets related to the Tesla company for some time period. The qualitative structure can be used for aggregating different quantitative time series and, in such a way, creating new features for predictive models which can be used, for example, for stock price forecasting. Let us consider which features we can retrieve from tweet sets for predictive analytics. Figure 1 shows the revealed users' communities for the subset of tweets. Figure 2 shows the subgraph for users of highly isolated communities. Revealing users' communities makes it possible to analyze different trends in tweet streams which are formed by different users' groups.

Fig. 1. Revealed users' communities for the subset of tweets

Fig. 2. Subgraph for users of highly isolated communities
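The community detection workflow can be sketched as follows. The paper uses the R igraph package; the Python bindings of igraph expose the same Walktrap routine, so the sketch below uses them with a small hypothetical edge list of user interactions.

```python
import igraph as ig

# Edge list: (source user, target user) pairs, e.g. from mentions or retweets (made up)
edges = [("alice", "bob"), ("bob", "carol"), ("carol", "alice"),
         ("dave", "erin"), ("erin", "frank"), ("frank", "dave"),
         ("carol", "dave")]

g = ig.Graph.TupleList(edges, directed=False)

# Walktrap community detection (random-walk based), as in the R igraph workflow
dendrogram = g.community_walktrap(steps=4)
communities = dendrogram.as_clustering()
for i, members in enumerate(communities):
    print(i, [g.vs[v]["name"] for v in members])

# Node-level influence measures mentioned in the text
pagerank = g.pagerank()
betweenness = g.betweenness()
hubs, authorities = g.hub_score(), g.authority_score()
```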
3 Analysis of Tweets Using Frequent Itemsets
The theory of frequent itemsets and association rules is often used in intellectual data analysis [1,2,7,10,14,16,24,32]. In text data analysis, it can be used to identify and analyze certain sets of objects which are often found in large arrays and are characterized by certain features. Let us consider the algorithms for detecting frequent itemsets and association rules on the example of processing microblog messages on Twitter. We can specify a thematic field, which is a set of keywords semantically related to the domain area under study. Figure 3 shows the frequencies of keywords for the thematic field of the frequent itemsets analysis. This makes it possible to narrow the semantic analysis of messages to the given thematic framework. Based on the obtained frequent semantic sets, we are going to analyze possible association rules that reflect the internal semantic connections of thematic concepts in messages. In the time period when the tweet dataset was being loaded, the accident with solar panels manufactured by Tesla on the roofs of Walmart stores took place. It is important to consider the reflection of trends related to this topic in various processes, in particular, in the dynamics of the company's stock prices in the financial market. Using frequent itemsets and association rules, we can find a semantic structure in specified semantic fields of lexemes. Figures 4 and 5 show semantic frequent itemsets for specified topics related to the Tesla company. Figures 6 and 7 show association rules represented by a graph and by a grouped matrix. Figure 8 shows sentiment and personality analytics characteristics received using IBM Watson Personality Insights [20].
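A minimal sketch of mining frequent semantic itemsets and association rules from keyword-reduced tweets is given below, using the mlxtend library; the sample tweets and thresholds are hypothetical, and the paper itself does not prescribe this particular toolkit.

```python
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

# Each tweet is reduced to the thematic-field keywords it contains (made-up examples)
transactions = [
    ["tesla", "solar", "walmart", "fire"],
    ["tesla", "solar", "walmart"],
    ["tesla", "stock", "price"],
    ["tesla", "solar", "fire"],
    ["tesla", "stock"],
]

te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(transactions).transform(transactions), columns=te.columns_)

# Frequent semantic itemsets and the association rules derived from them
itemsets = apriori(onehot, min_support=0.4, use_colnames=True)
rules = association_rules(itemsets, metric="confidence", min_threshold=0.6)
print(itemsets)
print(rules[["antecedents", "consequents", "support", "confidence"]])
```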
Fig. 3. Keyword frequencies for the thematic field of frequent itemset analysis
Fig. 4. Semantic frequent itemsets
Forming Predictive Features of Tweets for Decision-Making Support
483
Fig. 5. Semantic frequent itemsets
Fig. 6. Associative rules, represented by a graph
4 Predictive Analytics Using Tweet Features
Using revealed users’ graph structure, semantic structure and topic related keywords and hashtags, one can receive keyword time series for tweet counts per day. These time series can be considered as features in the predictive models.
Fig. 7. Associative rules represented by a grouped matrix
Fig. 8. Sentiment and personality analytics characteristics
In some time series, we can see when exactly the accident with the solar panels on the Walmart roofs appeared and how long it was being discussed on Twitter. Figure 9 shows the time series for different keywords and hashtags in the tweets. Figure 10 shows the normalized keyword time series. Social networks influence the formation of the investment sentiment of potential stock market participants. Let us consider the dynamics of the shares of the Tesla company in the time period of the incident with the solar panels manufactured by Tesla. It is reflected in the keyword time series in Fig. 9. One can see that at the time of the Tesla solar panel incident, the tweet activity increases over the time series of some keywords.

Fig. 9. Time series for different keywords and hashtags in the tweets

Let us analyze how this incident affects the share price of Tesla. Figure 11 shows the dynamics of the stock price of Tesla (TSLA ticker) in the stock market. We created a linear model where the keyword time series and their time-shifted values (lags) were considered as covariates; as a target variable, we considered the time series of the relative change in price during the day (price return) for ticker TSLA. Using LASSO regression, we found the weight coefficients for the features under consideration. Figure 12 shows the real and predicted values of the stock price return. Figure 13 shows the regression coefficients for the chosen features in the predictive model. We also conducted regression using Bayesian inference. The Bayesian approach makes it possible to calculate the distributions for the model parameters and for the target variable, which is important for risk assessment [8,13,18]. Bayesian inference also makes it possible to take into account the non-Gaussian distribution of target variables that takes place in many cases for financial time series. In [26], we considered different approaches of using Bayesian models for time series. Figure 14 shows the boxplots for the feature coefficients in the Bayesian regression model.
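The LASSO setup described above can be sketched as follows with scikit-learn; the synthetic keyword counts, returns, lag depth and regularization strength are illustrative assumptions, not the values used in the study.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# daily: keyword counts per day; price_return: daily TSLA return (both synthetic here)
rng = np.random.default_rng(1)
idx = pd.date_range("2019-08-01", periods=60, freq="D")
daily = pd.DataFrame(rng.poisson(5, size=(60, 3)), index=idx,
                     columns=["solar", "walmart", "fire"])
price_return = pd.Series(rng.normal(0, 0.02, 60), index=idx)

# Add lagged copies of every keyword series as extra covariates
features = daily.copy()
for lag in (1, 2, 3):
    features = features.join(daily.shift(lag).add_suffix(f"_lag{lag}"))
data = features.join(price_return.rename("target")).dropna()

X = StandardScaler().fit_transform(data.drop(columns="target"))
model = Lasso(alpha=0.01).fit(X, data["target"])
for name, coef in zip(data.drop(columns="target").columns, model.coef_):
    if abs(coef) > 1e-8:                       # LASSO zeroes out weak features
        print(name, round(coef, 4))
```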
Fig. 10. Normalized keyword time series
Fig. 11. Dynamics of the stock price of Tesla (TSLA ticker)
Fig. 12. Stock price return: real and predicted values

Fig. 13. Regression coefficients for the chosen features in the predictive model

5 Q-Learning Using Tweet Features

It is interesting to use Q-learning to find an optimal trading strategy. Q-learning is an approach based on the Bellman equation [21,22,33]. In [29], we considered different approaches for sales time series analytics using deep Q-learning. Let us consider a simple trading strategy for the stocks with ticker TSLA. In the simplest case of using deep Q-learning, we can apply three actions: 'buy', 'sell' and 'hold'. For the state features, we used the keyword time series. As a reward, we used the stock price return. The environment for the learning agent was modeled using the keyword and reward time series. Figure 15 shows the price return over the episodes of the learning agent iterations. The results show that an intelligent agent can find an optimal profitable strategy. Of course, this is a very simplified case of analysis, where the effect of overfitting may occur, so this approach requires further study. The main goal is to show that, using reinforcement learning and an environment model based on historical financial data and the quantitative characteristics of tweets, it is possible to build a model in which an intelligent agent can find an optimal strategy that optimizes the reward function in episodes of interaction of the learning agent with the environment. It was shown that the time series of keyword features can be used as predictive features for different predictive analytics problems. Using Bayesian regression and quantitative tweet features, one can estimate the uncertainty of the target variable, which is important for decision-making support.
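As a heavily simplified, tabular stand-in for the deep Q-learning agent described above, the sketch below discretizes a single synthetic keyword-activity series into a few states and learns action values for 'hold', 'buy' and 'sell' with the standard Q-learning update; all data and hyperparameters are made up.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 200
returns = rng.normal(0, 0.02, T)      # hypothetical daily price returns (reward source)
activity = rng.poisson(5, T)          # hypothetical daily keyword counts (state source)

# Discretize keyword activity into a few states; actions: 0=hold, 1=buy, 2=sell
n_states, n_actions = 4, 3
states = np.digitize(activity, np.quantile(activity, [0.25, 0.5, 0.75]))
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.95, 0.1

for episode in range(50):
    total = 0.0
    for t in range(T - 1):
        s = states[t]
        # Epsilon-greedy action selection
        a = rng.integers(n_actions) if rng.random() < eps else int(np.argmax(Q[s]))
        # Reward: a long position earns the next return, a short position its negative
        r = {0: 0.0, 1: returns[t + 1], 2: -returns[t + 1]}[a]
        s_next = states[t + 1]
        # Standard Q-learning (Bellman) update
        Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
        total += r                     # cumulative return collected in this episode

print("Learned greedy action per state:", np.argmax(Q, axis=1))
```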
Fig. 14. Boxplots for feature coefficients in Bayesian regression model
Fig. 15. Price return for the episodes for learning agent iterations
6 Conclusion
Using graph theory, users' communities and influencers can be revealed from tweet characteristics. The analysis of tweets related to a specified area was carried out using frequent itemsets and association rules. The found frequent itemsets and association rules reveal the semantic structure of tweets related to a specified area. The quantitative characteristics of frequent itemsets and association rules, e.g. the value of support, can be used as features in regression models. Bayesian regression makes it possible to assess the uncertainty of tweet features and of the target variable. It is shown that tweet features can also be used in deep Q-learning for forming the optimal strategy of a learning agent, e.g. in the study of optimal trading strategies on the stock market.
References 1. Agrawal, R., Mannila, H., Srikant, R., Toivonen, H., Verkamo, A.I., et al.: Fast discovery of association rules. Adv. Knowl. Discov. Data Min. 12(1), 307–328 (1996) 2. Agrawal, R., Srikant, R., et al.: Fast algorithms for mining association rules. In: Proceedings of the 20th International Conference on Very Large Data Bases, VLDB, vol. 1215, pp. 487–499 (1994) 3. Asur, S., Huberman, B.A.: Predicting the future with social media. In: 2010 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, vol. 1, pp. 492–499. IEEE (2010) 4. Balakrishnan, V., Khan, S., Arabnia, H.R.: Improving cyberbullying detection using twitter users’ psychological features and machine learning. Comput. Secur. 90, 101710 (2020) 5. Benevenuto, F., Rodrigues, T., Cha, M., Almeida, V.: Characterizing user behavior in online social networks. In: Proceedings of the 9th ACM SIGCOMM Conference on Internet Measurement, pp. 49–62 (2009) 6. Bollen, J., Mao, H., Zeng, X.: Twitter mood predicts the stock market. J. Comput. Sci. 2(1), 1–8 (2011) 7. Brin, S., Motwani, R., Silverstein, C.: Beyond market baskets: generalizing association rules to correlations. In: Proceedings of the 1997 ACM SIGMOD International Conference on Management of Data, pp. 265–276 (1997) 8. Carpenter, B., et al.: Stan: a probabilistic programming language. J. Stat. Softw. 76(1) (2017) 9. Cha, M., Haddadi, H., Benevenuto, F., Gummadi, K.: Measuring user influence in Twitter: the million follower fallacy. In: Proceedings of the International AAAI Conference on Web and Social Media, vol. 4 (2010) 10. Chui, C.K., Kao, B., Hung, E.: Mining frequent itemsets from uncertain data. In: Zhou, Z.H., Li, H., Yang, Q. (eds.) Advances in Knowledge Discovery and Data Mining. PAKDD 2007. LNCS, vol. 4426, pp. 47–58. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-71701-0 8 11. Csardi, G., Nepusz, T., et al.: The igraph software package for complex network research. InterJ. Complex Syst. 1695(5), 1–9 (2006) 12. Fruchterman, T.M., Reingold, E.M.: Graph drawing by force-directed placement. Softw. Pract. Exp. 21(11), 1129–1164 (1991) 13. Gelman, A., Carlin, J.B., Stern, H.S., Dunson, D.B., Vehtari, A., Rubin, D.B.: Bayesian Data Analysis. Chapman and Hall/CRC (2013) 14. Gouda, K., Zaki, M.J.: Efficiently mining maximal frequent itemsets. In: Proceedings 2001 IEEE International Conference on Data Mining, pp. 163–170. IEEE (2001) 15. Java, A., Song, X., Finin, T., Tseng, B.: Why we Twitter: understanding microblogging usage and communities. In: Proceedings of the 9th WebKDD and 1st SNAKDD 2007 Workshop on Web Mining and social Network Analysis, pp. 56–65 (2007) 16. Klemettinen, M., Mannila, H., Ronkainen, P., Toivonen, H., Verkamo, A.I.: Finding interesting rules from large sets of discovered association rules. In: Proceedings of the Third International Conference on Information and Knowledge Management, pp. 401–407 (1994) 17. Kraaijeveld, O., De Smedt, J.: The predictive power of public Twitter sentiment for forecasting cryptocurrency prices. J. Int. Financ. Markets Inst. Money 65, 101188 (2020)
18. Kruschke, J.: Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan. Academic Press, Cambridge (2014) 19. Kwak, H., Lee, C., Park, H., Moon, S.: What is Twitter, a social network or a news media? In: Proceedings of the 19th International Conference on World Wide Web, pp. 591–600 (2010) 20. Mahmud, J.: IBM Watson personality insights: the science behind the service. Technical report, IBM (2016) 21. Mnih, V., et al.: Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602 (2013) 22. Mnih, V., et al.: Human-level control through deep reinforcement learning. Nature 518(7540), 529 (2015) 23. Pak, A., Paroubek, P.: Twitter as a corpus for sentiment analysis and opinion mining. In: LREc, vol. 10, pp. 1320–1326 (2010) 24. Pasquier, N., Bastide, Y., Taouil, R., Lakhal, L.: Discovering frequent closed itemsets for association rules. In: Beeri, C., Buneman, P. (eds.) Database Theory – ICDT 1999. ICDT 1999. LNCS, vol. 1540, pp. 398–416. Springer, Heidelberg (1999). https://doi.org/10.1007/3-540-49257-7 25 25. Pavlyshenko, B.M.: Modeling COVID-19 spread and its impact on stock market using different types of data. Electron. Inf. Technol. (14), 3–21 (2020) 26. Pavlyshenko, B.: Bayesian regression approach for building and stacking predictive models in time series analytics. In: Babichev, S., Peleshko, D., Vynokurova, O. (eds.) DSMP 2020. CCIS, vol. 1158, pp. 486–500. Springer, Cham (2020). https:// doi.org/10.1007/978-3-030-61656-4 33 27. Pavlyshenko, B.M.: Forecasting of events by tweets data mining. Electron. Inf. Technol. (10), 71–85 (2018) 28. Pavlyshenko, B.M.: Can Twitter predict royal baby’s name ? Electron. Inf. Technol. (11), 52–60 (2019) 29. Pavlyshenko, B.M.: Sales time series analytics using deep q-learning. Int. J. Comput. 19(3), 434–441 (2020). https://computingonline.net/computing/article/view/ 1892 30. Pons, P., Latapy, M.: Computing communities in large networks using random ¨ walks. In: Yolum, G¨ ung¨ or, T., G¨ urgen, F., Ozturan, C. (eds.) Computer and Information Sciences - ISCIS 2005. ISCIS 2005. LNCS, vol. 3733, pp. 284–293. Springer, Heidelberg (2005). https://doi.org/10.1007/11569596 31 31. Shamma, D., Kennedy, L., Churchill, E.: Tweetgeist: can the Twitter timeline reveal the structure of broadcast events. CSCW Horiz. 589–593 (2010) 32. Srikant, R., Vu, Q., Agrawal, R.: Mining association rules with item constraints. In: KDD, vol. 97, pp. 67–73 (1997) 33. Sutton, R.S., Barto, A.G., et al.: Introduction to Reinforcement Learning, vol. 2. MIT Press, Cambridge (1998) 34. Wang, M., Hu, G.: A novel method for Twitter sentiment analysis based on attentional-graph neural network. Information 11(2), 92 (2020)
Method for Adaptive Semantic Testing of Educational Materials Level of Knowledge

Olexander Mazurets1, Olexander Barmak1, Iurii Krak2,3(B), Eduard Manziuk1, and Ruslan Bahrii1

1 Khmelnytskyi National University, Khmelnytskyi, Ukraine
2 Glushkov Cybernetics Institute, Kyiv, Ukraine
3 Taras Shevchenko National University of Kyiv, Kyiv, Ukraine
Abstract. The article considers a method that allows calculating the estimation of the level of knowledge of educational materials by using indicators of the semantic importance of key terms for the adaptive selection of test tasks during testing. Each test task allows the level of knowledge of separate semantic units of the educational materials - semantic terms (words, phrases) - to be checked purposefully. It is assumed that increasing the depth of learning of the semantic content of educational material has the effect of learning less semantically important units of the educational materials. Semantic terms are related to the semantic structure of the educational material in the form of a rubrication system. Test tasks are related to the fragments of the educational material content whose level of knowledge they test. In the process of testing, the level of knowledge of each element of the semantic structure of the educational material is consistently and adaptively determined, and the final grade is calculated based on these results. As a result of the adaptive selection of test tasks, their maximum possible diversity in the final set is reached, because the following criteria are considered: the test task has not been used yet; the relevant fragment of the educational content has not been checked; test tasks of the corresponding type were used the least; the test task does not contain less important semantic units. The developed method of adaptive semantic testing of the level of knowledge of educational materials makes it possible to use different algorithms for starting the testing (regressive, progressive, medianic, etc.) and different algorithms for knowledge level estimation (average, absolute limit, etc.). Applied investigations of the effectiveness of the developed method in comparison with the traditional algorithm for selecting test tasks established that testing became on average 20.53% faster, and determining the level of knowledge required on average 19.33% fewer test tasks.

Keywords: Testing · Level of knowledge · Adaptive testing · Key terms · Keywords · Key phrases · Educational materials
1 Introduction and Literature Review
One of the most important indicators of a high-quality educational model is the control of the level of knowledge [8,23]. The control of the level of knowledge should be based on the content and material of the educational course in which it is conducted. The basis of the course is the educational material that reveals its issues and is the basis for the formation of knowledge. The educational material is a complex system that has its own structure with specific elements and relations between them [1,7]. As the basis of the learning process, the educational material includes all the information that is submitted for learning and promotes learning. The educational material is considered as the set of two pieces of information: basic and auxiliary. The ultimate goal of presenting the basic information is to turn it into knowledge or skills. The auxiliary information aims to ensure the reliability of the learning of the basic information [19,22,24].

Testing is currently considered the most objective means for the estimation of the level of knowledge, as it allows the academic achievements of students to be estimated impartially. Computer testing makes it possible to implement the basic didactic provisions of learning control: the principle of the individual nature of testing and estimation of knowledge; systematic testing and estimation of knowledge; the principle of subjectivity; the rule of differentiated estimation of progress [6]. At the present stage, under the conditions of quarantine restrictions and anti-epidemiological measures, distance computer testing and adaptive technologies of distance education become especially important in the learning process. Computer testing can be performed in various forms, differing in the technology of combining tasks into a test [5]:

– traditional testing is a random selection into the working sample of a set of test tasks, fixed by their number or by the estimated time of execution;
– parameterized testing uses author-created templates of test tasks, in which some elements (parameters) can be changed;
– adaptive testing, in which the composition of the working sample of test tasks is unknown in advance, and subsequent test tasks are selected automatically depending on the answers to the previous ones.

Traditional testing is the easiest to implement, allows the dynamics of testing to be seen and does not set significant requirements for the number of test tasks in the base set. However, there are the following problems of computer testing in the traditional form [5,9]:

– the set of test tasks received by the user may not completely cover the semantic structure of the educational material;
– fully covering the semantic structure of the educational material requires a large number of test tasks;
– the estimation does not take into account the growing level of complexity of test tasks due to the use of terms of different levels of semantic significance in their content;
– a fixed amount of test tasks is used regardless of the success of the testing process.
The disadvantage of parameterized tests is the labor intensity of manually forming of many templates of test tasks. The advantage of the approach is the ability to create a large number of test tasks by a small number of templates [9]. Adaptive testing allows to accurately determine the level of learning of material, avoids the use of excessive number of test tasks, but requires a very large number of test tasks in the base set. Also, adaptive testing requires the mandatory definition of a number of parameters of each test task (complexity, semantic relation with the educational material) for its adaptive use [21]. The field of application of parameterized tests is mainly automation of formation of mathematical tasks. However, much of the content of many educational courses contains mostly textual content, which is characterized by the consistency and semantic coherence of the presentation. Due to the above reasons, adaptive testing is more effective than the traditional approach of random selection of test tasks to check the level of educational material. There are several approaches to adaptive testing, which involve adaptive selection of test tasks according to the parameters of expert assessment of their complexity, including the cross-section of structural elements of educational material, such as paragraphs. Problems of existing approaches to adaptive testing are [9,13]: – need to create a large number of test tasks that fully cover the educational material; – selection of the criterion for assessing the complexity of test tasks and its calculation for each test task; – calculation for each test task of a number of parameters, including parameters of semantic connection with educational material; – a variety of basic and working sets of test tasks on a number of parameters (type of test task, a fragment of content that is checked, etc.); – providing semantic coverage of educational material in the testing process. At the present stage, it is considered promising to create the required set of test tasks automatically [11]. In addition to the use of parameterized tests, there are known methods of generating test tasks by the conceptual-thesis model, by the ontology of the subject area, by the formalization of structured text statements. In general, the existing means of automating the creation of test tasks are focused on methods of artificial intelligence using the theory of ontologies, which makes them cumbersome and inefficient [10]. However, it is known that increasing the depth of learning the semantic content of educational material has the effect of learning less and less semantically important units of educational materials. Semantic units of educational materials are key terms (keywords, key phrases, abbreviations, etc.), which have different indicators of semantic weight or importance [16]. Semantic terms are the lower level of the semantic structure of educational material in the form of a system of rubrication. Therefore, each test task should purposefully check the level of knowledge of specific semantic units of educational materials for a specific fragment of educational material. Depending on the success of the answers, test tasks are offered to check the assimilation of more or less semantically important units of educational material
to determine the level of learning of the content of each element of the rubrication of educational material. Based on these results, the final score for the test is calculated. This article shows the approach to adaptive testing of the level of knowledge according to the described algorithm, which provides a solution to the above problems.
2 Materials and Methods
The information model of the semantic structure of the educational course C [3], developed by the authors, is a formal representation of the educational course. It covers the complete semantic structure of the educational material I and the set of test tasks T, contains the relations R of these components and their parameters. The semantic structure of the course C is presented as follows:

C = I ∪ T = H ∪ S ∪ K ∪ Q,   (1)

R = R1 ∪ R2 ∪ R3 ∪ R4 ∪ R5 ∪ R6,   (2)
where H is the set of headings (rubrics); S is the set of fragments (e.g. sentences) of educational material; K is the set of key terms; R1 is the set of relations between headings; R2 is the set of relations between headings and fragments; R3 is the set of occurrences of key terms (relations between key terms and fragments); R4 is the set of relations between headings and key terms; R5 is the set of relations between test tasks and fragments; R6 is the set of relations between test tasks and key terms. Defining all the elements of this model opens the possibility of adaptive semantic testing of the level of knowledge of educational materials according to the described method. The method of automated formation of the semantic structure of educational materials [4], developed by the authors, determines, as a result of parsing the content of the corresponding digital files, the elements of the set of headings H, the set of fragments S and the corresponding relations between them:

H = (ID, Name, Grade),   (3)

where the attribute ID is a unique identifier of the heading; Name is the name of the heading; Grade is the heading level in the hierarchical structure.

S = (ID, Content, Number, Type),   (4)

where the attribute ID is a unique identifier of the fragment; Content is the fragment content; Number is the fragment number within the heading; Type is the fragment type. On the next step of the method, the content of each element of the set of headings H is processed to search for lower-level semantic units in the form of key terms. First, a set of all possible candidate terms is formed, their occurrences in the content are found, the numerical value of their importance is determined using the method of disperse estimation, and their number in the resulting set is
limited using the method of limiting keyword density [15]. These operations result in the elements of the set of key terms K and the corresponding relations:

K = (Name, Num),   (5)

where Name is the symbolic name of the term; Num is the number of words in the term. The elements of the semantic structure of educational materials found by the method allow the purposeful creation of test tasks for the further purposeful check of the level of knowledge of concrete semantic units of educational materials, on concrete elements of the semantic structure of educational materials. To do this, the method captures the relevant fragments of educational materials and the appearance of key terms in them. The method for automated test task creation for educational materials, developed by the authors and based on the previously obtained data, allows sets of test tasks to be created automatically that differ in parameters (type of question, number of correct answers, the rule by which the test task is formed, terms used in the task, etc.) and can be used for traditional and adaptive testing of the level of knowledge acquisition, including with the help of existing learning environments and testing systems [17]. The method creates new test tasks using production rules [18] and does not require additional formalization of educational materials. Each production rule consists of an antecedent and a consequent [2,12,14,20]. The antecedent is some fragment template which is searched for; the consequent is the algorithm for converting a content fragment into the content of the components of the test task, which is executed under the condition of a successful search result (Fig. 1). To create a set of test tasks, each fragment s ∈ S from each heading h ∈ H of the document i ∈ I is checked for the presence p ∈ P of each key term k ∈ K assigned to this heading h. If the term k is present in the fragment s, then the production rules are searched for compliance with the antecedent of the rule. Each case of correspondence results in the automatic creation of a new test task q ∈ Q. The consequent determines the algorithm for converting the content of the fragment p into a test task q. This creates the elements of the set of test tasks Q and the corresponding relations:

Q = (ID, Type, TEContent, Answers),   (6)

where ID is the unique identifier of the test task; Type is the type of question; TEContent is the content of the test task including answers; Answers is the number of answers. With respect to the set of relations R between the elements of the considered sets, the general structure is:

R = (TypeRel, Obj1, Obj2, Feature),   (7)

where TypeRel is an integer indicating the type of relation; Obj1 is the first entity of the relation; Obj2 is the second entity of the relation; Feature is the attribute that indicates the relation property.
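To make the model of Eqs. (3)–(7) concrete, the sketch below shows one possible in-memory representation of the sets H, S, K, Q and the relations R. It is an illustration only, not the authors' implementation; all class and field names are assumptions.

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class Heading:        # element of H, Eq. (3)
    id: int           # unique identifier of the heading
    name: str         # name of the heading
    grade: int        # level in the hierarchical structure

@dataclass
class Fragment:       # element of S, Eq. (4)
    id: int
    content: str
    number: int       # fragment number within its heading
    type: str

@dataclass
class KeyTerm:        # element of K, Eq. (5)
    name: str
    num: int          # number of words in the term

@dataclass
class TestTask:       # element of Q, Eq. (6)
    id: int
    type: str         # type of question
    te_content: str   # content of the test task including answers
    answers: int      # number of answers

@dataclass
class Relation:       # element of R, Eq. (7); type_rel = 1..6 for R1..R6
    type_rel: int
    obj1: Any
    obj2: Any
    feature: Optional[Any] = None   # e.g. position or importance indicator
```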
Fig. 1. Relations between model parameters
Depending on the attributes of the Obj1 and Obj2 belonging to separate sets, the T ypeRel attribute accepts the values given in Table 1. As a result of using the method, for each occurrence of each key term, a number of different test tasks are created, which provides complete semantic and content coverage of educational material by a set of automatically created test tasks. In addition, the method provides storage of metadata for positioning each test task and its components in the semantic structure of the educational material, which is necessary for further adaptive testing of the knowledge level. The developed method for adaptive semantic testing of educational materials level of knowledge makes it possible to determine the assessment of the level of knowledge of educational materials by using indicators of semantic importance of key terms for adaptive selection of test tasks in testing (Fig. 2). The input data of the developed method are elements of information model of semantic structure of educational course C, in particular set of test tasks Q and metadata for adaptive semantic testing of knowledge level: set of headings H, set of fragments of educational material S, set of key terms K, set of relations between headings R1 , set of relations between headers and fragments R2 , set of occurrences of key terms R3 , set of relations between headers and key terms R4 , set of relations between test tasks and fragments R5 , set of relations between test tasks and key terms R6 . The method also requires parameters: the element of semantic structure selected for testing, the algorithm for starting testing and the algorithm for assessing the level of knowledge (Fig. 3).
Table 1. List of values of the TypeRel attribute for the elements of the set of relations R

TypeRel value | Affiliation Obj1 | Affiliation Obj2 | Feature value
1 (R1) | h ∈ H | h ∈ H | Non-available (Null)
2 (R2) | h ∈ H | s ∈ S | Position of the fragment within the heading
3 (R3) | k ∈ K | s ∈ S | Position of the key term within the fragment
4 (R4) | h ∈ H | k ∈ K | Numerical indicator of the importance of the key term
5 (R5) | q ∈ Q | s ∈ S | Place of content use (task or answer)
6 (R6) | q ∈ Q | k ∈ K | Place of term use (task or answer)
Fig. 2. Scheme of implementation of adaptive semantic testing of the level of knowledge
In Step 1, within the content of the selected element of the semantic structure for testing, the current subelement h ∈ H of the educational course C is selected for further testing. In Step 2, from the set of semantic units k ∈ K of the current subelement h ∈ H of the educational course, the current semantic unit k ∈ K is selected, the learning of which will be checked in this iteration. The selection
of the current semantic unit k is carried out according to the specified test start algorithm. Next, in Step 3, a sample Q′ ⊂ Q of all test tasks suitable for checking the level of knowledge of the current semantic unit k is created. In Step 4, irrelevant test tasks are removed from the obtained set Q′. Namely, those test tasks remain (or receive a positive rating) that meet the following criteria:

– test tasks do not contain less important terms;
– the corresponding fragments s ∈ S of the educational material have not been used yet;
– test tasks have not yet been used in this testing;
– types of test tasks were the least used.

If the resulting set of actual test tasks Q′′ ⊂ Q′ contains several test tasks, a random selection of a test task from Q′′ is applied. Step 5 is the testing process itself: the test task is presented to the user and the received answer is recorded. Depending on the correctness of the answers to the test tasks, in Step 6 the following action is selected:

– if the data for estimating the knowledge level of the current subelement h ∈ H of the course are still insufficient, then a semantic unit k ∈ K of lesser or greater importance is selected and the method goes to Step 3;
– if the data for estimating the level of knowledge of the current subelement h ∈ H of the educational course are sufficient, but there are other untested subelements h ∈ H, then the next subelement of the structure h ∈ H is selected and the method goes to Step 2;
– if the data for estimating the level of knowledge of the current subelement of the educational course are sufficient and all subelements h ∈ H are checked, then the testing is completed and the method goes to Step 7.

The choice of the further action depends both on the correctness of the answer to the last test task and on the general dynamics of the testing process (Table 2). In addition to the usual approach A (rapid testing), when the level of knowledge of each semantic unit is checked only once, testing with confirmation B is possible. In the case of confirmation testing, each change in the level of knowledge of semantic units requires confirmation the specified number of times, and after this number of iterations option A is selected. In Step 7, the test result is calculated according to the selected algorithm for estimation of the level of knowledge. The calculated estimation of the level of knowledge of the educational material is the output data of the method.
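A minimal sketch of Steps 3–5 is given below. It is illustrative only: the task attributes (terms, fragment_id, type) and the importance mapping are assumptions, not the authors' data structures.

```python
import random

def select_task(current_term, tasks, used_task_ids, checked_fragment_ids,
                type_counts, term_importance):
    """Steps 3-4: build the sample of candidate tasks for the current
    semantic unit, filter it by the selection criteria, then pick one
    task at random (Step 5)."""
    # Step 3: all test tasks that can check the current semantic unit
    candidates = [q for q in tasks if current_term in q.terms]

    # Step 4: keep only tasks that satisfy the criteria
    candidates = [
        q for q in candidates
        if q.id not in used_task_ids                     # not used in this testing yet
        and q.fragment_id not in checked_fragment_ids    # its fragment not checked yet
        and all(term_importance[t] >= term_importance[current_term]
                for t in q.terms)                        # no less important terms
    ]
    if not candidates:
        return None

    # prefer the least used task types
    least_used = min(type_counts.get(q.type, 0) for q in candidates)
    candidates = [q for q in candidates if type_counts.get(q.type, 0) == least_used]

    # Step 5: random choice among the remaining equivalent tasks
    return random.choice(candidates)
```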
Fig. 3. Scheme of method for adaptive semantic testing of educational materials level of knowledge
Table 2. Choice of further action in adaptive semantic testing
Local test result | Additional condition | Action
Test task solved | Less important semantic units exist | The less important semantic unit is selected as the current one
Test task solved | No less important semantic units exist | A. The level of knowledge is fixed (maximal). B. A repeat is performed (for the current semantic unit)
Test task not solved | More important semantic units exist; a more important semantic unit was not yet considered | The more important semantic unit is chosen as the current one
Test task not solved | More important semantic units exist; a more important semantic unit has already been considered | A. The level of knowledge (current) is fixed. B. A repeat is performed (the more important semantic unit is selected as the current one)
Test task not solved | No more important semantic units exist | A. The level of knowledge is fixed (absent). B. A repeat is performed (for the current semantic unit)
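The branching of Table 2 for the rapid-testing variant (option A) can be written compactly as in the sketch below; the returned action labels are assumptions introduced only for illustration.

```python
def next_action(task_solved, less_important_exist,
                more_important_exist, more_important_already_checked):
    """Branching of Table 2 for rapid testing (option A)."""
    if task_solved:
        if less_important_exist:
            return "select_less_important_unit"
        return "fix_knowledge_level_maximal"
    if more_important_exist:
        if not more_important_already_checked:
            return "select_more_important_unit"
        return "fix_knowledge_level_current"
    return "fix_knowledge_level_absent"
```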
3 Experiment, Results and Discussions
For investigation the effectiveness of the developed method, special software was developed that provides the ability to conduct traditional and adaptive testing. The obtained results showed that on average adaptive testing provides faster passing of the test and the test requires fewer tasks. For example, for the median start algorithm and the average knowledge estimation algorithm, adaptive testing compared to traditional testing provided an average of 20.53% faster test. At the same time, to determine the level of knowledge, it was necessary to use an average of 19.33% fewer tasks. In particular, when using the same set of test tasks, the adaptive testing algorithm reduced the average time required to pass the test: for the assessment of “F/FX” by 47.92%, for the assessment of “D/E” by 42.99%, for the assessment of “B/C” by 16.71%, to assess “A” by 2.89% (Fig. 4). The average number of test tasks obtained using the traditional testing algorithm was 15.22 units, while using adaptive testing – 11.95 units (Fig. 5). The developed method for adaptive semantic testing of educational materials knowledge level provides realization of the basic properties of adaptive testing, in particular selection of test tasks at testing depending on result of the answer to previous test tasks, and support of various algorithms of start, dynamics and estimation of testing. It is assumed that increasing the depth of learning
of educational material semantic content has the effect of learning less semantically important units of educational materials. Accordingly, the method allows calculating the estimation of knowledge level of educational materials by using indicators of semantic importance of key terms for adaptive selection of test tasks in testing. Semantic terms are related to the semantic structure of educational material in the form of rubricational system. Test tasks are related to fragments of educational material content, the knowledge level of which they test. In the process of testing, the level of knowledge of each of semantic structure elements of the educational material is consistently adaptively determined, and the final grade is calculated based on these results.
Fig. 4. The average time of tests, min
Under the described conditions, the algorithms of the test start determine from which level of semantically important units of educational materials the check of the knowledge level of each of the elements of the semantic structure of the educational material begins. In particular, it is possible to use the following algorithms to start testing: – regressive (testing begins with the semantically most important units); – progressive (testing begins with the semantically least important units); – median (testing begins with semantic units of medium importance). Algorithms of testing dynamics determine the need to repeat each check of the level of knowledge of each semantically important unit, which greatly affects the speed of testing. In particular, it is possible to use the following algorithms for testing dynamics:
Fig. 5. The average number of received test tasks in test, units
– rapid testing (the level of knowledge of each semantic unit is checked only once); – confirmation testing (each change in the level of knowledge of semantic units requires confirmation the specified number of times). Algorithms for estimation of the knowledge level provide different approachs to the formation of conclusions about the test. Their choice depends on the characteristics of the test and the conditions of its applying. In particular, it is possible to use the following algorithms for estimation of the knowledge level: – average (final estimation is calculated as the average for estimations of all elements of the semantic structure of the educational material); – absolute limited (to pass the test it is necessary to show the level of knowledge of each of the elements of the semantic structure of the educational material is not lower than specified). Most of the described algorithms for starting, dynamics and estimation of testing are possible only for adaptive testing, which makes it impossible to compare the results with traditional testing. According to the method for adaptive semantic testing of educational materials level of knowledge, as a result of adaptive selection of test tasks, their maximum possible diversity in the final set is reached, because following criteria are considered: test task has not been used yet, relevant fragment of educational content has not been checked, test tasks of corresponding type were used the least, the test task does not contain less important semantic units. According to the results of practice implementation, on average, to estimation of the level of knowledge in adaptive testing required
the use of 17.28% fewer test tasks. The higher the level of knowledge of the user, the more test tasks were needed to determine the level of knowledge in adaptive testing. Moreover, if for the assessment of “F/FX” this number was significantly less than the traditional algorithm (57.61%), then for the assessment of “A” the number of obtained test tasks was even slightly higher (−6.97%), which was due to the need to deepen the test in each element of the structure of the educational material. Regarding testing time, on average the developed method provided 20.53% faster test than traditional testing. The higher the level of knowledge of the user, the more tasks were needed to determine the level of knowledge in adaptive testing, and accordingly the more time was spent on their solution – if the assessment is “F/FX” 47.92%, then for the assessment of “A” 2.89%. While in traditional testing, these figures differ insignificantly. On average, compared to traditional testing, a larger number of test tasks was required to assess “A”, but their processing took less time. This can be explained by the fact that during adaptive testing test tasks were presented in a logically consistent order, which allowed users to process them more focused and reduced the time to “switch” between different semantic blocks. With a correctly formed set of test tasks, the developed method allows to more accurately determine the uniformity of the level of knowledge and identify gaps in the understanding of the studied material. Although key terms are considered in the paper under semantically important units of educational materials, processing of special objects (formula, figure, scheme, etc.) in the content of educational materials is also quite possible, by considering their position in the text as separate semantically important units, along with terms. Their signatures (for example, geometric and disk dimensions of formulas) are used to determine the positions of identical special objects. Like special objects, in addition to inserted objects, also consider semantically significant classes (for example: surnames, years, centuries), to build tests to verify them, it is possible to include them in set of semantically important units. In the case when, on the contrary, it is necessary to exclude from the process the content of certain special elements (tables, program codes, etc.), it is possible to do so by using design styles other than heading styles and main text, which is natural when working with lowstructured text documents. The above determines the directions of further research.
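The start and estimation algorithms listed above can be summarized in the short sketch below; it is illustrative only, and the score scale and the required level are assumptions.

```python
def pick_start_term(terms_sorted_by_importance, algorithm="median"):
    """Choice of the semantic unit to start from: regressive (most
    important), progressive (least important) or median."""
    if algorithm == "regressive":
        return terms_sorted_by_importance[0]
    if algorithm == "progressive":
        return terms_sorted_by_importance[-1]
    return terms_sorted_by_importance[len(terms_sorted_by_importance) // 2]

def estimate_average(element_scores):
    """Final grade as the average over the elements of the semantic structure."""
    return sum(element_scores) / len(element_scores)

def estimate_absolute_limited(element_scores, required_level):
    """The test is passed only if every element reaches the required level."""
    return all(score >= required_level for score in element_scores)
```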
4 Conclusions
The results of research allow to conclude, that the developed method provides a full-fledged tool for adaptive semantic testing of educational materials level of knowledge, which provides a complete semantic and structural coverage of educational material in testing. In combination with the described previous auxiliary methods (method for automated formation of semantic structure of educational materials, method for automated test tasks creation for educational materials), all sections of automated testing of knowledge level from loading of the document of educational material till the calculation of an estimation of level of its studying
are provided. Applied investigations of the effectiveness of the developed method in comparison with the traditional algorithm for selecting test tasks established that testing was on average 20.53% faster, and determining the level of knowledge required on average 19.33% fewer test tasks. The developed method of adaptive semantic testing of the knowledge level of educational materials makes it possible to use different algorithms for starting the testing (regressive, progressive, median, etc.) and different algorithms for knowledge level estimation (average, absolute limited, etc.). As a result of the adaptive selection of test tasks, their maximum possible diversity in the final set is reached, because the following criteria are considered: the test task has not been used yet, the relevant fragment of educational content has not been checked, test tasks of the corresponding type were used the least, and the test task does not contain less important semantic units. The use of adaptive testing according to the developed method is especially effective for testing the level of knowledge of educational material that contains mainly textual content. In this case, each test task purposefully checks the level of knowledge of separate semantic units of educational materials, which are related to the semantic structure of the educational material in the form of a rubrication system. Test tasks are related to the fragments of educational content whose knowledge level they check. In the process of testing, the level of knowledge of each element of the semantic structure of the educational material is consistently and adaptively determined, and the final grade is calculated based on these results. Thus, the paper considers a complex approach to the automation of adaptive testing of the knowledge level, which includes the steps of automated formation of the semantic structure of educational materials, automated test task creation, and adaptive semantic testing of the knowledge level of educational materials.
References 1. Aggarwal, C.C., Zhai, C.: Text Data. Springer (2012) 2. Baki, I., Sahraoui, H.: Multi-step learning and adaptive search for learning complex model transformations for examples. ACM Trans. Softw. Eng. Methodol. 25(3), 1–37 (2016). https://doi.org/10.1145/2904904 3. Barmak, O., Krak, I., Mazurets, O., Pavlov, S., Smolarz, A., Wojcik, W.: Research of efficiency of information technology for creation of semantic structure of educational materials. In: Lytvynenko, V., Babichev, S., W´ ojcik, W., Vynokurova, O., Vyshemyrskaya, S., Radetskaya, S. (eds.) ISDMCI 2019. AISC, vol. 1020, pp. 554–569. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-26474-1 38 4. Barmak, O., et al.: Information technology for creation of semantic structure of educational materials. Photonics Applications in Astronomy, Communications, Industry, and High-Energy Physics Experiments Wilga, Poland (11176) (2019). https:// doi.org/10.1117/12.2537064 5. Brusilovsky, P., Rollinger, C., Peyl, C.: Adaptive and intelligent technologies for web-based education. Special issue on intelligent systems and teleteaching. Konstliche Intelligenz 4, 19–25 (1999) 6. Carpenter, S.K., Pashler, H., Wixted, J.T., Vul, E.: The effects of tests on learning and forgetting. Memory Cogn. 36, 438–448 (2008). https://doi.org/10.3758/MC. 36.2.438
7. Chen, J., Dosyn, D., Lytvyn, V., Sachenko, A.: Smart data integration by goal driven ontology learning. Adv. Big Data 529, 283–289 (2016). https://doi.org/10. 1007/978-3-319-47898-2 29 8. Cho, K.W., Neely, L.H., Crocco, S., Virando, D.: Testing enhances both encoding and retrieval for both tested and untested items. Q. J. Exp. Psychol. 70(7), 1211– 1235 (2017). https://doi.org/10.1080/17470218.2016.1175485 9. Durlach, P.J., Lesgold, A.M.: Adaptive Technologies for Training and Education. Cambridge University Press, New York (2012) 10. Gierl, M.J., Lai, H., Hogan, L.B., Matovinovic, D.: A method for generation educational test items that are aligned to the common core state standards. J. Appl. Test. Technol. 16(1), 1–18 (2015) 11. Gutl, C., Lankmayr, K., Weinhofer, J., Hofler, M.: Enhanced approach of automatic creation of test items to foster modern learning setting. Electron. J. e-Learn. 9, 23–38 (2011) 12. Hu, X., Pedrycz, W., Castillo, O., Melin, P.: Fuzzy rule-based models with interactive rules and their granular generalization. Fuzzy Sets Syst. 307, 1–28 (2017). https://doi.org/10.1016/j.fss.2016.03.005 13. Istiyono, E., Dwandaru, W.S.B., Setiawan, R., Megawati, I.: Developing of computerized adaptive testing to measure physics higher order thinking skills of senior high school students and its feasibility of use. Eur. J. Educ. Res. 9(1), 91–101 (2020). https://doi.org/10.12973/eu-jer.9.1.91 14. Kehrer, T., Alshanqiti, A., Heckel, R.: Automatic inference of rule-based specifications of complex in-place model transformations. In: Guerra, E., van den Brand, M. (eds.) ICMT 2017. LNCS, vol. 10374, pp. 92–107. Springer, Cham (2017). https:// doi.org/10.1007/978-3-319-61473-1 7 15. Krak, Y., Barmak, O., Mazurets, O.: The practice investigation of the information technology efficiency for automated definition of terms in the semantic content of educational materials, vol. 1631, pp. 237–245. CEUR Workshop Proceedings (2016) 16. Krak, Y., Barmak, O., Mazurets, O.: The practice implementation of the information technology for automated definition of semantic terms sets in the content of educational materials, vol. 2139, pp. 245–254. CEUR Workshop Proceedings (2018) 17. Kryvonos, I.G., Krak, I.V., Barmak, O.V., Bagriy, R.O.: New tools of alternative communication for persons with verbal communication disorders. Cybern. Syst. Anal. 52(5), 665–673 (2016). https://doi.org/10.1007/s10559-016-9869-3 18. Liu, H., Gegov, A., Cocea, M.: Rule-based systems: a granular computing perspective. Granular Comput. 1(4), 259–274 (2016). https://doi.org/10.1007/s41066-0160021-6 19. Manziuk, E.A., Barmak, O.V., Krak, I.V., Kasianiuk, V.S.: Definition of information core for documents classification. J. Autom. Inf. Sci. 50(4), 25–34 (2018). https://doi.org/10.1615/JAutomatInfScien.v50.i4.30 20. Moghimi, M., Varjani, A.: New rule-based phishing detection method. Expert Syst. Appl. 53, 231–242 (2016). https://doi.org/10.1016/j.eswa.2016.01.028 21. Pasichnyk, R., Melnyk, A., Pasichnyk, N., Turchenko, I.: Method of adaptive control structure learning based on model of test’s complexity. In: Proceedings of the 6th IEEE International Conference on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications, IDAACS 2011, vol. 2, pp. 692–695. Prague, Czech Republic (2011). https://doi.org/10.1109/IDAACS.2011. 6072858 22. Saed, M.R.: Methods and Applications for Advancing Distance Education Technologies: International Issue and Solutions. IGI Global (2009)
23. Sorrel, M.A., Abad, F., N´ ajera, P.: Improving accuracy and usage by correctly selecting: the effects of model selection in cognitive diagnosis computerized adaptive testing. Appl. Psychol. Measure. 45(2), 112–129 (2021). https://doi.org/10. 1177/0146621620977682 24. Yang, C., Potts, R., Shanks, D.R.: Enhancing learning and retrieval of new information: a review of the forward testing effect. Sci. Learn. 3, 8 (2018). https://doi. org/10.1038/s41539-018-0024-y
Baseline Wander Correction of the Electrocardiogram Signals for Effective Preprocessing

Anatolii Pashko1, Iurii Krak1,2(B), Oleg Stelia1, and Waldemar Wojcik3

1 Taras Shevchenko National University of Kyiv, Kyiv, Ukraine
[email protected]
2 Glushkov Cybernetics Institute, Kyiv, Ukraine
3 Lublin University of Technology, Lublin, Poland
[email protected]
Abstract. To effectively solve the problem of eliminating the ECG baseline drift, it is necessary to filter the ECG signal. In this study, various filters were used to solve the problem of eliminating the ECG baseline drift: low-pass filters based on the forward and inverse discrete Fourier transform, the Butterworth filter, the median filter, and the Savitsky-Golay filter. The results obtained confirmed the efficiency of filtering harmonic noise based on the forward and backward DFT on real data. The method makes it possible to implement a narrow-band stop filter in the range from 0 to the Nyquist frequency. In this case, the notch band can be less than 0.1% of the Nyquist frequency, which is important when processing ECG signals. The filtering result was evaluated by the appearance of the filtered ECG. The evaluation criterion was the presence on the signal of characteristic fragments reflecting the work of the atria and ventricles of the heart in the form of a P wave, a QRS complex and an ST-T segment. A new method has been developed for filtering the ECG signal, which is based on the use of a sliding window containing 5 points. A linear function is constructed from a sample of five points using the least squares method. The value of the resulting linear function at the midpoint is used as the new value. Several iterations are performed to achieve a good result. To eliminate the baseline drift for the ECG fragment, it is proposed to use a special cubic interpolation spline. The algorithm for constructing the used spline requires solving a system of equations with a tridiagonal matrix.

Keywords: ECG signal · Baseline correction · Filtering · Interpolation spline

1 Introduction
Modern diagnostics of heart diseases is mainly based on electrocardiographic research, which is an analysis of the curve of changes in the biopotentials of the
heart. Electrocardiography is an electrical manifestation of the contractile activity of the heart. This activity is recorded using surface electrodes placed on the limbs or chest. Among the methods of examining the heart: electrocardiography, heart radiography and echocardiography, the method of standard electrocardiography has several advantages. It is available for examining the work of the heart in people of any age, it is absolutely safe, which allows the study to be repeated and to assess the dynamics of changes. Electrocardiogram (ECG) is objective, since the points of application of the electrodes are constant, while the correct medical interpretation of the results obtained makes it possible to accurately determine many pathological changes in the work of the heart. However, this method of studying cardiac activity using an ECG has significant limitations. One of the main problems of electrocardiography, from the side of the field of biomedical data processing, is the separation of the useful signal from the interference caused by external electromagnetic fields, random movements of the body and breathing. This is due to the active development and use of wireless sensors to obtain ECGs using electrodes attached to the user’s body. Note that such sensors are light weight and dimensions and contain three main parts: an analog-digital module, a digital control module, and a wireless signal transmission module (for example, [1]). The use of such portable sensors causes certain problems associated with the quality of the recorded signal, which are much less reflected in the classical ECG recording in stationary conditions. Since the use of wireless sensors for continuous data acquisition and ECG recording assumes that the user can be in an active state (for example, move, drive a car or other devices, be indoors or outdoors, etc.), such an ECG signal has a significant the noise level and not all conditions for the correct processing of such signals can be met. In particular, when recording an ECG using wireless sensors, the problem arises of finding a baseline (isoelectric line), which is the initial (starting) point of electrical activity of cardiac cycles and reflects states when the sensor did not detect electrical activity of the heart. Note that various digital filters are used to remove unwanted noise from the signal. However, it is difficult to apply filters with fixed coefficients to reduce interference in biomedical signals because human behavior cannot be accurately predicted. In this case, filtering leads to a deliberate change in the original signal in order to extract useful information from it. Different electrocardiographs use different types of filters that will affect the ECG signal in different ways. It is the different effect of filters on the ECG that explains the discrepancy in the readings of various electrocardiographs. Therefore, the development of methods for filtering electrocardiographic signals in order to obtain more correct diagnostic results is an actual problem. Problem Statements. The problem that is being solved in the work is the development of a method for eliminating the ECG baseline wander, based on the use of a sliding window and a specially constructed cubic interpolation spline. To clear the signal from unwanted noise when solving the problem of eliminating the ECG baseline wander, various most common filtering methods are used, namely, a low-pass filter, Butterworth filter, median filter, Savitsky-Golay filter, etc.
2 Related Works
The electrocardiogram is widely used to study a person’s condition and diagnose heart disease, since good quality ECGs allow specialists to identify and correctly interpret the physiological and pathological phenomena of cardiac activity. However, in real situations of ECG recording, and especially when using wireless sensors, ECG signals are distorted by various kinds of noise, among them high-frequency noise, baseline deviation, which may be associated with breathing or movement of patients. These artifacts severely limit the usefulness of the recorded ECGs and therefore need to be removed for a better clinical assessment. It should be noted that the indicated effects on ECG signals are to a greater extent associated with the construction of systems for continuous information retrieval and its processing in real time. If you use the classical method of recording an ECG, then in order to minimize the noise of the ECG signal and baseline shifts, the patient, in particular, should not talk or laugh during the recording of the ECG; must be absolutely relaxed; should breathe normally during ECG registration; the recording should be paused until the patient prepares for recording the next segment, etc. The article [3] proposes a method for ECG improvement based on empirical frequency decomposition, which makes it possible to remove both high-frequency noise and the baseline with minimal signal distortion. The method is confirmed by experiments with databases [4]. Note that various modern methods of artificial intelligence, machine learning, deep learning for the study of the human cardiovascular system are presented in a review article [19]. Since cardiovascular diseases are one of the main causes of death around the world, the problem of early diagnosis for the detection of cardiac anomalies becomes extremely actual. To provide timely assistance, such diagnostics should be built on mobile wireless sensors with implementation on the basis of smartphones [8,20]. ECG-based systems can track, record and send signals for analysis, and in the event of critical deviations from the norm, they can contact emergency reception points. Note that when using wireless sensors to measure ECG in real time, the problem of baseline deviation arises, which can be removed using multivariate data analysis methods [15]. The complexities of baseline deviation are practically difficult to avoid due to the many associated parameters obtained during measurements. Each baselinedrift removal method has advantages and disadvantages based on the complexity of the technique and the accuracy of the filtered signal. Therefore, it is extremely important to assess the baseline deviation before it is removed from the ECG signal. A good estimate of baseline wander will prevent filtering of the ECG signal segments without a baseline, thereby ensuring the accuracy of the received signal [5]. The most commonly used method for removing baseline drift is a cubic spline algorithm that interpolates the baseline fit using selected data points on the original signal. Below are the main characteristics of filters that are effectively used to clean ECG signals from various kinds of noise baseline drift, power line noise, electromyographic (EMG) noise, noise of electrode movement artifacts, etc.
3 Materials and Methods
Frequency Filtering. Baseline drift is low frequency noise between 0.5 Hz and 0.6 Hz. To remove it, you can use a highpass filter with a cutoff frequency of 0.5 Hz to 0.6 Hz. Interference from the mains (50 Hz or 60 Hz noise from mains supply) can be removed using a notch filter with a cut off frequency 50 Hz or 60 Hz. EMG noise is high frequency noise 100 Hz and therefore can be removed by a low pass filter with an appropriate cutoff frequency. Electrode motion artifacts can be suppressed by minimizing object movement. The Fourier transform is used to implement this filter. Median Filter. Median filters are often used in practice as a means of preprocessing digital data [2,6,21]. A specific feature of the filters is their pronounced selectivity with respect to array elements, which are a nonmonotonic component of a sequence of numbers within the filter window (aperture), and stand out sharply against the background of neighboring samples. At the same time, the median filter does not affect the monotonic component of the sequence, leaving it unchanged. Thanks to this feature, median filters at an optimally selected aperture can, for example, maintain sharp edges of objects without distortion, effectively suppressing uncorrelated or weakly correlated noise and smallsized details. This property allows you to apply median filtering to eliminate anomalous values in data sets, reduce outliers and impulse noise. The median filter turns out to be especially effective when cleaning signals from impulse noise during image processing, acoustic signals, transmission of code signals, etc. Note that the median of a numerical sequence x1 , x2 , ..., xn for odd n is the average term of the series obtained by ordering this sequence in ascending (or descending) order. For even n numbers, the median is usually defined as the arithmetic mean of the two means of the ordered sequence. The median filter is a window filter that sequentially slides over the signal array and returns at each step one of the elements that fall into the filter window (aperture). The output signal yk of the sliding median filter with the width 2n + 1 for the current sample k is formed from the input time series ..., xk−1 , xk , xk+1 , ... in accordance with the formula: yk = med(xk−n , xk−n+1 , ..., xk−1 , xk , xk+1 , ..., xk+n−1 , xk+n ),
(1)
where med(x1, ..., xm, ..., x2n+1) = xn+1, and xm are the elements of the variation series ranked in ascending order of the values xm: x1 = min(x1, ..., xm, ..., x2n+1) ≤ x2 ≤ x3 ≤ ... ≤ x2n+1 = max(x1, ..., xm, ..., x2n+1).
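A minimal NumPy sketch of the sliding median of Eq. (1) is shown below; the edge handling (padding with the boundary samples) is an assumption, since the text does not specify it.

```python
import numpy as np

def median_filter(x, n):
    """Sliding median of width 2*n + 1 as in Eq. (1); the signal is
    padded at both ends with its boundary samples."""
    x = np.asarray(x, dtype=float)
    padded = np.pad(x, n, mode="edge")
    return np.array([np.median(padded[k:k + 2 * n + 1]) for k in range(len(x))])
```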
Butterworth Filters. The synthesis procedure for this filter includes two main stages [9,14]. The first stage is approximation – the procedure for obtaining a transfer function that reproduces the given frequency or time characteristics with a given accuracy. The transfer function of the n-th order Butterworth low-pass filter is defined as

|H(jω)|² = 1 / (1 + ω^(2n)).   (2)
Note that the amplitude-frequency characteristic of the filter monotonically decreases with increasing frequency, and the greater the filter order, the more accurately the amplitude-frequency characteristic of an ideal low-pass filter is approximated. The difference equation of the obtained filter has the form:

yk = Σ (i=0..k) (bi / a0) xk−i − Σ (i=1..k) (ai / a0) yk−i,   k = 0, 1, 2, ....   (3)

Here xk is the digital signal at the filter input and yk is the filtered digital signal at the filter output.

Savitsky-Golay Filter. This filter was proposed in the paper [18] and developed in many studies of the ECG (see, for example, [7,16]). To calculate this filter, the following transformations are used:

zk = (1/(10T)) (2xk + xk−1 − xk−3 − 2xk−4),   (4)

where T is the discretization period, z̄k = (zk)², and yk = (1/n) Σ (i=0..n−1) z̄k−i, n = 10, ....
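For reference, one possible realization of a baseline-removing Butterworth filter and a Savitsky-Golay smoother with SciPy is sketched below. It is not the authors' exact design (their coefficients are given in Sect. 4.2); the filter order, window length and 500 Hz sampling rate are assumptions consistent with the text.

```python
import numpy as np
from scipy.signal import butter, filtfilt, savgol_filter

FS = 500.0   # sampling rate used in the experiments (500 samples per second)

def remove_baseline_butterworth(ecg, cutoff_hz=0.6, order=3):
    """High-pass Butterworth filter suppressing baseline drift below ~0.6 Hz.
    filtfilt applies the filter forward and backward (zero phase shift)."""
    b, a = butter(order, cutoff_hz, btype="highpass", fs=FS)
    return filtfilt(b, a, np.asarray(ecg, dtype=float))

def smooth_savitzky_golay(ecg, window_length=15, polyorder=3):
    """Savitzky-Golay least-squares polynomial smoothing of the ECG."""
    return savgol_filter(np.asarray(ecg, dtype=float),
                         window_length=window_length, polyorder=polyorder)
```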
4 Experiment, Results and Discussion

4.1 Using Spline Functions for Elimination of the Baseline Wander
A new method is proposed for removing baseline drift from the ECG signal, based on the authors' results on ECG investigation [10–13,17]. The method is based on the use of a sliding window containing 5 points. A linear function is constructed from a sample of five points using the least squares method. The value of the resulting linear function at the midpoint is used as the new value. Several iterations are performed to achieve a good result. We use a cubic interpolation spline of C² smoothness to eliminate the baseline drift for the ECG fragment shown in Fig. 1a) as a black line. The algorithm for constructing the used spline requires solving a system of equations with a tridiagonal matrix, for the closure of which it is necessary to set additional conditions at the ends of the segment. Zero values of the first derivatives are used as such conditions. This choice is due to the fact that we apply the baseline drift elimination algorithm to the original discrete dataset, which contains various noise, and the use of any difference expression for the first or second derivative at the ends of the segment is impractical. Figure 1b) shows an ECG fragment with the baseline drift eliminated using the spline, together with a linear trend. The original signal and the signal with the baseline drift eliminated are shown in Fig. 2.
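A minimal sketch of the described 5-point sliding-window least-squares smoothing is given below; the number of iterations and the edge handling (the two boundary samples on each side are left unchanged) are assumptions.

```python
import numpy as np

def sliding_lsq_smooth(x, iterations=3):
    """Each pass fits a least-squares line to every 5-point window and
    replaces the central sample with the value of that line at the centre
    (for equally spaced samples this equals the window mean)."""
    y = np.asarray(x, dtype=float).copy()
    t = np.arange(-2, 3)                  # window abscissas -2..2
    for _ in range(iterations):
        z = y.copy()
        for k in range(2, len(y) - 2):
            slope, intercept = np.polyfit(t, y[k - 2:k + 3], 1)
            z[k] = intercept              # fitted line evaluated at t = 0
        y = z
    return y
```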
Fig. 1. 1a) Original ECG fragment, linear trend (blue line), cubic spline (red line); 1b) Fragment of ECG with eliminated baseline drift using a spline, linear trend;
Fig. 2. Original signal and signal with eliminated drift (red line)

Spline approximation of the ECG signal
It follows from the above results that the use of splines does not always give a positive result when eliminating baseline drift. The results of applying the baseline drift elimination algorithm based on a moving average window with an RMS straight-line approximation are shown in Fig. 3.
Fig. 3. ECG signal with eliminated baseline drift and linear trend
4.2 Using Difference Filters for Elimination of the Baseline Wander
In this section, we present the experimental results of eliminating drift using the various filters described above. Butterworth Filters. Taking into account the data of the cardiogram and the normalization of the transfer characteristic coefficients with respect to a0, we obtain a0 = 1, a1 = −1.483, a2 = 0.93, a3 = −0.203, b0 = 0.031, b1 = 0.091, b2 = 0.091, b3 = 0.031. The difference equation can be rewritten as:

yk = Σ (i=0..3) bi xk−i − Σ (i=1..3) ai yk−i,   k = 0, 1, 2, ....
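As a sketch, the recursion above with the stated coefficients can be applied with scipy.signal.lfilter, which implements exactly this difference equation; the function name and the input array are illustrative.

```python
import numpy as np
from scipy.signal import lfilter

# normalized transfer-characteristic coefficients from the text (a0 = 1)
b = [0.031, 0.091, 0.091, 0.031]
a = [1.0, -1.483, 0.93, -0.203]

def butterworth_difference_equation(x):
    """Applies y_k = sum_i b_i x_{k-i} - sum_{i>=1} a_i y_{k-i};
    scipy.signal.lfilter implements exactly this recursion."""
    return lfilter(b, a, np.asarray(x, dtype=float))
```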
The results of filtering based on this filter are demonstrated in Fig. 4, where the original ECG signal is represented by blue points and the ECG signal after filtering by red points. Median Filter. The results of filtering by the median filter for m = 35 are shown in Fig. 5. Savitsky-Golay Filter. The results based on this filter are shown in Fig. 6. Low-Frequency Filter. To remove the baseline drift using frequency filters, we use an approach based on the forward and inverse discrete Fourier transform (DFT). Figure 7 presents the original (not cleaned) ECG signal. Note that the ECG signal was recorded with a discreteness of 500 measurements per second. The frequency step is 0.1 Hz. Therefore, the low-frequency noise from 0.5 Hz to 0.6 Hz corresponds to the first six components of the spectrum. The signal spectrum is shown in Fig. 8.
Fig. 4. Butterworth’s filtration.
Fig. 5. Median filtration
Fig. 6. Savitsky-Golay filtration
Fig. 7. Original ECG signal for baseline drift elimination using frequency filters
Fig. 8. Original ECG signal spectrum
In this study, ECG filtering was carried out by a set of digital filters built on the basis of the DFT, including:

1. A high-frequency filter with a cutoff frequency of 0.6 Hz. Figure 9 shows the result of this filter.
Fig. 9. Original (not cleaned) ECG signal - blue points and ECG signal after highfrequency filter processing - red points
2. A low-frequency filter with a cutoff frequency of 90 Hz. Figure 11 shows the result of this filter.
3. Narrow-band notch filters at 16.7 Hz and 50 Hz. Figure 10 shows the result of this filter.

Note that the useful signal itself has a complex structure, which complicates the problem of separating the useful signal from the noise. This situation is typical for the processing of electrocardiograms and other signals of a complex structure generated by living nature rather than by a technical system. In this case, it is necessary to introduce adequate assumptions about the interference model and build filters that ensure the maximum possible removal of interference with minimum distortion of the useful signal.
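A possible NumPy sketch of this DFT-based filtering (zeroing the low-frequency bins below 0.6 Hz and narrow bands around the notch frequencies) is shown below; the notch width and the function interface are assumptions.

```python
import numpy as np

def dft_filter(x, fs=500.0, highpass_hz=0.6,
               notch_hz=(16.7, 50.0), notch_width_hz=0.2):
    """Forward DFT, zeroing of the selected bins, inverse DFT."""
    x = np.asarray(x, dtype=float)
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)

    # high-pass part: zero the bins below the cutoff (baseline drift)
    spectrum[freqs < highpass_hz] = 0.0

    # narrow-band notches, e.g. 16.7 Hz and 50 Hz interference
    for f0 in notch_hz:
        spectrum[np.abs(freqs - f0) <= notch_width_hz / 2.0] = 0.0

    return np.fft.irfft(spectrum, n=len(x))
```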
Fig. 10. Original (not cleaned) ECG signal - blue points and ECG signal after narrowband filter processing - red points
Fig. 11. Original (not cleaned) ECG signal - blue points and ECG signal after lowfrequency filter processing - red points
5 Conclusions
Thus, to effectively solve the problem of eliminating the ECG baseline drift, it is necessary to filter the ECG signal. For this, in this study, various filters were used: a lowpass filters based on the forward and inverse discrete Fourier transform, Butterworth filter, median filter, Savitsky-Golay filter. The results obtained confirmed the efficiency of filtering harmonic noise based on forward and backward DFT on real data. The method makes it possible to implement a narrowband stop filter in the range from 0 to the Nyquist frequency. In this case, the notch band can be less than 0.1% of the Nyquist frequency, which is important when processing ECG signals. The filtering result was evaluated by the appearance of the filtered ECG. The evaluation criterion was the presence on the signal of characteristic fragments reflecting the work of the atria and ventricles of the heart in the form of a P wave, a QRS complex and a ST– T segment. In further studies, special algorithms will be developed to assess the quality of ECG processing in a real system and to solve the problem of eliminating baseline drift. Also, a method has been developed for filtering the ECG signal, which is based on the use of a sliding window containing 5 points. A linear function is constructed from a sample of five points using the least squares method. The value of the resulting linear function at the midpoint is used as the new value. Several iterations are performed to achieve a good result. To eliminate the baseline drift for the ECG fragment, it is proposed to use a
special cubic interpolation spline. The algorithm for constructing the used spline requires solving a system of equations with a tridiagonal matrix. As additional conditions at the ends of the segment, zero values of the first derivatives are used.
References 1. Texas Instruments. Low-Power, 1-Channel, 24-Bit Analog Front-End for Biopotential Measurements. http://www.ti.com/lit/ds/symlink/ads1291.pdf 2. Bae, T., Lee, S., Kwon, K.: An adaptive median filter based on sampling rate for r-peak detection and major-arrhythmia analysis. Sensors 20(6144) (2020). https:// doi.org/10.3390/s20216144 3. Blanco-Velasco, M., Weng, B., Barner, K.: ECG signal denoising and baseline wander correction based on the empirical mode decomposition. Comput. Biol. Med. 38, 1–13 (2008). https://doi.org/10.1016/j.compbiomed.2007.06.003 4. Goldberger, A., et al.: PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals. Circulation 101(23), 215–220 (2000). https://doi.org/10.1161/01.CIR.101.23.e215 5. Haider, S.I., Alhussein, M.: Detection and classification of baseline-wander noise in ECG signals using discrete wavelet transform and decision tree classifier. Elektronika Ir Elektrotechnika 25(4), 47–57 (2019). https://doi.org/10.5755/j01.eie.25. 4.23970 6. Hao, W., Chen, Y., Xin, Y.: ECG baseline wander correction by mean-median filter and discrete wavelet transform. In: Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Boston, MA, USA, pp. 2712–2715 (2011). https://doi.org/10.1109/IEMBS.2011.6090744 7. Hargittai, S.: Savitzky-Golay least-squares polynomial filters in ECG signal processing. In: Computers in Cardiology Lyon, France, pp. 763–766 (2005). https:// doi.org/10.1109/CIC.2005.1588216 8. Holmes, C., Fedewa, M., Winchester, L., Macdonald, H., Wind, S., Esco, M.: Validity of smartphone heart rate variability pre-and post-resistance exercise. Sensors 20, 5738 (2020). https://doi.org/10.3390/s20205738 9. Jagtap, S., Uplane, M.: The impact of digital filtering to ECG analysis: butterworth filter application. In: International Conference on Communication, Information & Computing Technology (ICCICT), Mumbai, India, pp. 1–6 (2012). https://doi. org/10.1109/ICCICT.2012.6398145 10. Krak, I., Pashko, A., Stelia, O., Barmak, O., Pavlov, S.: Selection parameters in the ECG signals for analysis of QRS complexes. In: Proceedings of the 1st International Workshop on Intelligent Information Technologies & Systems of Information Security, Khmelnytskyi, Ukraine, pp. 1–13 (2020). http://ceur-ws.org/Vol-2623/ paper1.pdf 11. Krak, I., Stelia, O., Pashko, A., Efremov, M., Khorozov, O.: Electrocardiogram classification using wavelet transformations. In: IEEE 15th International Conference on Advanced Trends in Radioelectronics, Telecommunications and Computer Engineering (TCSET), Lviv-Slavske, Ukraine, pp. 930–933 (2020). https://doi.org/ 10.1109/TCSET49122.2020.235573 12. Krak, I., Stelia, O., Pashko, A., Khorozov, O.: Physiological signals analysis, recognition and classification using machine learning algorithms. In: Proceedings of The Third International Workshop on Computer Modeling and Intelligent Systems (CMIS 2020), Zaporizhzhia, Ukraine, pp. 955–965 (2020)
13. Krak, I., Stelia, O., Potapenko, L.: Controlled spline of third degree: approximation properties and practical application. In: Lytvynenko, V., Babichev, S., W´ ojcik, W., Vynokurova, O., Vyshemyrskaya, S., Radetskaya, S. (eds.) Lecture Notes in Computational Intelligence and Decision Making. ISDMCI 2019. AISC, vol. 1020, pp. 215–224. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-26474-1 16 14. Liu, M., Hao, H., Xiong, P., et al.: Constructing a guided filter by exploiting the Butterworth filter for ECG signal enhancement. J. Med. Biol. Eng. 38, 980–992 (2018). https://doi.org/10.1007/s40846-017-0350-1 15. Meyer, C., Keiser, H.: Electrocardiogram baseline noise estimation and removal using cubic splines and state-space computation techniques. Comput. Biomed. Res. 10(5), 459–470 (1977). https://doi.org/10.1016/0010-4809(77)90021-0 16. Nahiyan, K., Amin, A.: Removal of ECG baseline wander using Savitzky-Golay filter based method. Bangladesh J. Med. Phys. 8(1), 32–45 (2017). https://doi. org/10.3329/bjmp.v8i1.33932 17. Pashko, A., Krak, I., Stelia, O., Khorozov, O.: Isolation of informative features for the analysis of QRS complex in ECG signals. In: Babichev, S., Lytvynenko, V., W´ ojcik, W., Vyshemyrskaya, S. (eds.) Lecture Notes in Computational Intelligence and Decision Making. ISDMCI 2020. AISC, vol. 1246, pp. 409–422. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-54215-3 26 18. Savitzky, A.G.M.: Smoothing and differentiation of data by simplified least squares procedures. Anal. Chem. 36, 1627–1639 (1964) 19. Sevakula, R., Au-Yeung, W., Singh, J., Heist, E., Isselbacher, E., Armoundas, A.: State-of-the-art machine learning techniques aiming to improve patient outcomes per-taining to the cardiovascular system. J. Am. Heart Assoc. 9(4), e013924 (2020). https://doi.org/10.1161/JAHA.119.013924 20. Shabaan, M., Arshid, K., Yaqub, M., et al.: Survey: smartphone-based assessment of cardiovascular diseases using ECG and PPG analysis. BMC Med. Inform. Decis. Mak. 20(117) (2020). https://doi.org/10.1186/s12911-020-01199-7 21. Upganlawar, I., Chowhan, H.: Pre-processing of ECG signals using filters. Int. J. Comput. Trends Technol. (IJCTT) 11(4), 166–168 (2014). https://doi.org/10. 14445/22312803/IJCTT-V11P1355
Intellectual Information Technologies of the Resources Management in Conditions of Unstable External Environment

Marharyta Sharko1(B), Olga Gonchar2, Mykola Tkach3, Anatolii Polishchuk3, Nataliia Vasylenko4, Mikhailo Mosin5, and Natalia Petrushenko6

1 State Higher Educational Institution “Pryazovskyi State Technical University”, Mariupol, Ukraine
2 Khmelnytsky National University, Khmelnytsky, Ukraine
[email protected]
3 National Defence University of Ukraine named after Ivan Cherniachovskyi, Kyiv, Ukraine
[email protected]
4 Kherson State Agrarian and Economic University, Kherson, Ukraine
[email protected]
5 Ukrainian Armor LLC, Kyiv, Ukraine
[email protected]
6 Kherson National Technical University, Kherson, Ukraine
Abstract. A methodology for preserving production under extreme environmental influences by controlling the dynamics of resource flows at certain iterations of the production development trajectory is proposed. Intelligent information technologies for resource redistribution have been developed using the Ford-Fulkerson algorithm and its modifications related to the determination of residual resource flows. In the modified Ford-Fulkerson algorithm, the network is saturated in the direction of forming and increasing the stabilization funds needed to compensate for the costs of the negative effects of environmental factors. The specifics of managing the development of production in an unstable environment consist not in stopping production during the lockdown but in redistributing the main flow of resources and dividing it into production and stabilization components. Control iterations are synchronized with the direction of the shared flows at the time of restriction. A multifactor cross-algorithm of resource provision for enterprises operating in conditions of uncertainty is proposed. It consists of an algorithm for determining the share of resources to be redistributed and an algorithm for controlling their movement. Under business uncertainty, when negative environmental factors flare up, the goals of production are transformed: the aim is not to obtain the maximum profit but to preserve production under extreme influences of the external environment.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 S. Babichev and V. Lytvynenko (Eds.): ISDMCI 2021, LNDECT 77, pp. 519–533, 2022. https://doi.org/10.1007/978-3-030-82014-5_35
Keywords: Uncertainty · Production resources · Processes · Residual principle · Ford-Fulkerson algorithm

1 Introduction
Preservation of production under the uncertainty of the influence of the external environment is one of the urgent tasks of economic development of society. Sudden manifestations of financial instability, outbreaks of COVID-19, inflation and restrictions bring chaos and confusion to the business environment. Stability and resilience are becoming categories that can only be used in the short term. Longterm forecasting in conditions of uncertainty becomes impossible in general. The trajectory of production in conditions of uncertainty is stepwise and is a set of iterations of redistribution of resource flows and services with their transition to new conditions caused by a sharp manifestation of environmental influences. Ignoring uncertainty and maintaining the previous economic conditions during outbreaks and anomalies of environmental impact cause discomfort, instability, lead to large economic losses and cessation of production activities in general. An analogous consideration of the process of production in conditions of uncertainty and unpredictability of environmental influences in some approximation can be the transport problem of maximizing the passage of cars with limited road capacity in certain areas of traffic, solved using the Ford-Fulkerson algorithm. However, the objective function of the task of financial flow management in conditions of uncertainty of the external environment in principle differs from the objective function of optimizing traffic flows in the road network by the need to use the residual principle of resource allocation in all iterations of the production process. The complexity of the redistribution of resource flows at certain stages of production development encourages the search for new means of modeling unstable situations with the synchronization of sudden changes in the influence of the external environment on the functioning of production and the corresponding reactions to these manifestations. The unsolved parts of the general problem of preserving production in the event of instability of environmental influences include ensuring the possibility of adjusting the trajectory of production development by redistributing resource flows in the face of sudden manifestations of anomalies of environmental influences. The aim of the work is to develop intelligent information technologies for production development management by redistributing resource flows at the time of introduction or removal of restrictions caused by sudden manifestations of the external environment.
2 Literature Review
The globalization of production processes with the transfer of individual stages and operations to other underdeveloped countries requires unconditional coordination of efforts to ensure the continuity and consistency of production, both by
the organizers of production and by these countries. Sudden, often unpredictable changes in the social, political and economic life of these countries, associated, for example, with natural disasters, floods, hurricanes, inflation, unemployment and disease outbreaks, lead to the failure of the well-functioning mechanism of industrial relations. The moments when an extreme situation emerges are characterized by uncertainty, when neither the consumer nor the manufacturer can a priori predict the consequences of decision-making. One has to act in situations of uncertainty and risk, when it is necessary to work with a limited amount of knowledge about the situation. In these conditions, the development of production management strategies under uncertainty, taking into account the consequences and risks and based on intellectual information technologies, is of undoubted value. The need to create technologies for managing the development of production under extreme, unpredictable environmental influences has stimulated the development of modern methods and tools to support management decisions in various areas of their application. According to the World Economic Forum, the technologies that shape the future global socio-economic environment are [1]:

– artificial intelligence and robotics;
– distributed networks and remote communication points;
– virtual and augmented realities;
– additive manufacturing;
– blockchain and distributed accounting technologies;
– new advanced materials and nanomaterials;
– energy storage, accumulation and transmission;
– new computing technologies;
– biotechnology;
– geoengineering;
– neurotechnology;
– space technology.
[13] presents the identification and prediction of bifurcation moments using complex networks. In [10], the results of identification of complex interactions between biological objects and their modeling in the form of graphs are presented. An algorithm for identifying structural damage at varying temperatures is presented in [6]. The transport problem of maximum flow with bandwidth constraints, solved using the Ford-Fulkerson algorithm, is considered in [5]. In [17], the implementation of the Ford-Fulkerson algorithm for finding the maximum flow with the lowest number of iterations is presented. In [9], the productivity of the maximum flow, determined using the Ford-Fulkerson algorithm on grid and random geometric graphs, was studied. In [7], combinatorial structures for weighted graphs with a low field volume were studied by the same method. Marketing of information products is manifested in the possibility of studying the market environment using information flows [2,12,20] and methods of its processing [3,4,9,15].
To exclude subjectivism in the assessment of extreme situations and the adoption of appropriate management decisions, it is useful to learn from related industries of management information in conditions of uncertainty [11,22,23]. Methodological support for the management of flows in a pandemic is presented in [16,18,19,21]. As the analysis of the cited literature shows, the interest in resource management in conditions of uncertainty is constantly growing.
3 Materials and Methods
In the task of preserving production under extreme influences of the external environment, the components of the resource provision used are variable, as is the share of resources that can be moved to form stabilization funds in order to save the enterprise at the iterations of changes in production. Resource management under changing external conditions with the introduction of sharp unpredictable constraints is solved using the Ford-Fulkerson algorithm. The network of possible movements of resources for the formation and use of stabilization funds is considered as a connected oriented graph G(V, E) with bandwidth c(u, v) and initial flow f(u, v) = 0 for the edges between u and v. It is necessary to find the maximum flow of resources from the source s to the sink t. At each step of the algorithm, identical conditions apply to all flows:

– f(u, v) ≤ c(u, v), i.e. the flow from u to v does not exceed the throughput;
– f(u, v) = −f(v, u);
– f_in(u) = f_out(u) for all nodes u except s and t, i.e. the flow does not change as it passes through a node;
– the flow in the network is equal to the sum of the flows of all arcs incident to the sink of the graph.

The network is initialized as follows. The lowest value for the edges is determined in each network circuit. To identify the arcs in the network, the subsequent initializations of the arcs are divided by this number. The saturation of the network occurs in the directions of the arcs. The algorithm starts from zero flow and increases its value at each iteration. The flow rate is then increased iteratively by finding a path along which as much flow as possible can be sent.

The proposed modernization of the Ford-Fulkerson algorithm is to divide the propagating flow into two components: the main flow f′(u, v), which provides production, and the additional flow f″(u, v), which is associated with protective measures against the influences of the external environment:

f(u, v) = f′(u, v) + f″(u, v)    (1)

f′(u, v) ≤ c′(u, v)    (2)

f″(u, v) ≤ c″(u, v)    (3)

where c′(u, v) and c″(u, v) are the throughputs of the two components. The residual throughput of financial resources after passing the first technological iteration is equal to

c″(u, v) = c(u, v) − c′(u, v)    (4)

After the completion of the first iteration of the production process and the completion of the necessary planned operations related to the development of the system, the residual flow f_residual(u, v) decreases by the amount spent on these operations and is again divided into two:

f_residual(u, v) = f′(u, v) + f″(u, v)    (5)
In the general case, when dividing the main stream, the number should not necessarily be equal to two, the main thing is that the principle of redistributing resources along the branches of the oriented graph in different iterations of production activity is fulfilled. Funding for the main stream is determined from the profit, the amount of which is determined by the production program. The financing of the additional flow associated with the probabilities of a dynamic process of stoppage of production and forced downtime of equipment, for example, due to a lockdown, the duration of which is unknown, is an uncertain value. At the same time, the salaries of employees should remain at the same level. For this, in addition to insurance, stabilization funds must be created, which are in direct proportion to the duration of the lockdown. The saturation of the network in the modified Ford-Fulkerson algorithm occurs in the direction of increasing additional stabilization flows. A feature of resource management technologies in an unstable environment is not only the division of the flow into two branches, but also their subsequent unification after the end of the lockdown. The remaining flow after the merger of both branches will decrease by the amount of financial costs associated with the conservation of equipment for its placement in warehouses with the implementation of additional work to ensure temporarily unused equipment in a working condition associated with its adjustment, metrological support, verification, etc. The size of this part of the financial flow directly depends on the technologies used. Further, the production operates in the usual regular mode, but with the only difference that the financing of the production program will already take place on a residual principle.
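The flow-splitting scheme described above can be illustrated with a minimal sketch of the Ford-Fulkerson method in its breadth-first (Edmonds-Karp) form. The sketch below is not the authors' modified algorithm: the network, the capacities and the fixed share reserved for the stabilization component are hypothetical, and the split of the computed maximum flow into production and stabilization parts simply follows relation (1).

#include <algorithm>
#include <iostream>
#include <queue>
#include <vector>

// Edmonds-Karp maximum flow on a small oriented graph given as a capacity matrix.
double maxFlow(std::vector<std::vector<double>> cap, int s, int t) {
    int n = cap.size();
    double flow = 0.0;
    while (true) {
        std::vector<int> parent(n, -1);
        parent[s] = s;
        std::queue<int> q; q.push(s);
        while (!q.empty() && parent[t] == -1) {
            int u = q.front(); q.pop();
            for (int v = 0; v < n; ++v)
                if (parent[v] == -1 && cap[u][v] > 1e-9) { parent[v] = u; q.push(v); }
        }
        if (parent[t] == -1) break;                 // no augmenting path left
        double aug = 1e18;
        for (int v = t; v != s; v = parent[v]) aug = std::min(aug, cap[parent[v]][v]);
        for (int v = t; v != s; v = parent[v]) {    // push flow along the found path
            cap[parent[v]][v] -= aug;
            cap[v][parent[v]] += aug;
        }
        flow += aug;
    }
    return flow;
}

int main() {
    // Hypothetical 4-node resource network: node 0 is the source, node 3 the sink.
    std::vector<std::vector<double>> c = {
        {0, 10, 8, 0},
        {0,  0, 2, 7},
        {0,  0, 0, 9},
        {0,  0, 0, 0}};
    double f = maxFlow(c, 0, 3);

    // Split the total flow into production f' and stabilization f'' components, Eq. (1);
    // the 20% stabilization share is an assumed figure (cf. Table 1 below).
    double fStab = 0.2 * f, fProd = f - fStab;
    std::cout << "f = " << f << ", f' = " << fProd << ", f'' = " << fStab << std::endl;
    return 0;
}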
4 Experiment, Results and Discussion
The methodological support for maintaining production in the face of global extreme risks is associated with the synchronization of the use of production with forced measures to stop production with the simultaneous provision of social
guarantees for personnel. The task of redistributing financial resources at different stages of development of production systems can in essence be attributed to modeling the distribution of limited resources by moving them from inactive zones to active ones. Mathematical models of the distribution of limited resources are used in the applied problem of finding the optimal values of output and are represented in the class of linear algebraic equations and linear programming [8]. The structure of a linear system consists of technological elements and resource vector constraints, which form a rectangular matrix A:

Au ≤ C    (6)

where A is the matrix of technological elements of dimension n ∗ m and C = (C1, C2, ..., Cn)^T is the vector of system resources. The unused part of the resources is taken into account through the displacement and increase in the gradient of the objective function and is implemented by their priority controlled movement. The set of such priority resources used for the production of products is defined by the maximization of the criterion Bu:

Bu → max    (7)

where B = (b1, b2, ..., bm) is the objective function vector. By solving Eqs. (6) and (7), the optimal amount of released resources m is determined, which is recommended for redistribution to other links of the production process. A diagram of the movement of financial flows of production under extreme environmental influences is shown in Fig. 1.
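The constraint (6) and criterion (7) can be illustrated with a minimal sketch. The code below is not the authors' implementation: the matrix A, the vectors C and B and the candidate allocations are hypothetical, and a production system would call a proper linear programming solver instead of enumerating candidates.

#include <algorithm>
#include <iostream>
#include <vector>

// Check the resource constraint A*u <= C from Eq. (6) for a candidate allocation u.
bool feasible(const std::vector<std::vector<double>>& A,
              const std::vector<double>& u, const std::vector<double>& C) {
    for (size_t i = 0; i < A.size(); ++i) {
        double lhs = 0.0;
        for (size_t j = 0; j < u.size(); ++j) lhs += A[i][j] * u[j];
        if (lhs > C[i]) return false;
    }
    return true;
}

// Objective B*u from Eq. (7).
double objective(const std::vector<double>& B, const std::vector<double>& u) {
    double s = 0.0;
    for (size_t j = 0; j < u.size(); ++j) s += B[j] * u[j];
    return s;
}

int main() {
    // Hypothetical data: 2 resource constraints, 3 activities.
    std::vector<std::vector<double>> A = {{1.0, 2.0, 1.0}, {3.0, 1.0, 2.0}};
    std::vector<double> C = {100.0, 150.0};
    std::vector<double> B = {4.0, 3.0, 5.0};

    // A few candidate allocations; a real implementation would use an LP solver.
    std::vector<std::vector<double>> candidates = {
        {20.0, 30.0, 10.0}, {10.0, 20.0, 40.0}, {30.0, 10.0, 20.0}};

    double best = -1.0;
    for (const auto& u : candidates)
        if (feasible(A, u, C)) best = std::max(best, objective(B, u));

    std::cout << "Best feasible value of B*u: " << best << std::endl;
    return 0;
}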
Fig. 1. Scheme of the movement of the financial flows of production under extreme environmental influences
where the flows f(u, v) and f_residual(u, v) correspond to the release of the production program, and the split flows f′(u, v) and f″(u, v) – to the organization and saturation of the stabilization funds and to equipment conservation. The concretization of the use of the algorithm is associated with the construction of the network and the initialization of arcs in the oriented graph network. When constructing the network from source to sink, the placement of labels was used in order to determine the amount of flow removed from the initial one, by which these values can be changed. Let us denote the routes of financial flows in the oriented graph network with the corresponding symbols. Through v1 we denote a stable route for the passage of financial flows under conditions of a constant production situation; through v2 – salary and social guarantees of personnel; through v3 – an extreme situation of termination of the production program with a transition to another branch of production development, associated with the formation of stabilization funds to compensate for losses during the lockdown; through v4 – conservation and storage of a part of unused equipment due to the termination of production. The analysis of statistical reporting made it possible to determine the weights of the arcs of the resource distribution oriented graph (Table 1).
Table 1. Weights of the arcs of the network of the production conservation oriented graph under uncertainty

Economic activity | Arc weight, % | Graph vertices
Production activity in a stable mode of functioning of production | 40 | v1
Staff salaries | 30 | v2
Creation of stabilization funds during the lockdown | 20 | v3
Preservation, warehousing and storage of equipment during lockdown | 10 | v4
The initialization of arcs in the oriented graph network is shown in Fig. 2. When constructing an algorithm for maintaining production during lockdown, the sequence of resource allocation iterations was taken into account. Wages earned prior to the lockdown were financed from production activities. To compensate for possible losses associated with the interruption of production due to the onset of a lockdown, it is necessary to provide for the formation of stabilization funds. Their formation and replenishment should also be financed from production activities in a stable mode of production. After the lockdown occurs, wages should be financed from the stabilization funds. At the time of the lockdown, equipment conservation should be funded. Time parameters were determined taking into account the statistics of the processes. Analysis of the timing of the introduction and cancellation of the lockdown in Ukraine in 2020–2021 showed that the periods of stable operation of production
Fig. 2. Initialization of arcs in the enterprise resource management network under conditions of uncertainty
were 4–5 months, while the lockdown was 1–1.5 months. Then the production process was repeated. These periods are network nodes Fig. 3. The sudden introduction of restrictions caused by environmental pressure on production activities forces to switch to another branch of development, for which purpose additional vertical and inclined edges with their own probabilities of manifestation and carrying capacities are provided in the network between the main routes of resource flows from sources to drain. The sequence of transitions in the distribution of resources in the oriented graph network is coordinated by the direction of the separating flows during the onset of the lockdown. In order to maintain the required number of qualified personnel, wages must remain at the same level, despite the forced downtime. Loading personnel remotely and self-isolation is not always possible, therefore this type of financing for production activities is costly, although aimed at the prospect of maintaining production. Funding of wages of employees is possible both from the profit of the enterprise and from stabilization funds. It is indicated by arrows on the arcs of the network. Synthesis of modern ideas on the management of innovation in a tool that allows to consider the problem of management in uncertainty is a multi-criteria logic. When making management decisions, it is necessary to take into account the pressure of the external environment. You have to develop original solutions, connecting intuition, creating flexible technologies for interaction with the customer. Forecasting in these conditions consists in establishing the dependence of the current state of the production situation on the previous states, that is, there are elements of the principle of causality [8,14,19].
Fig. 3. Algorithm for moving production resources when a lockdown occurs
Under uncertainty conditions, the implementation of each management strategy is associated with a variety of possible results and with the magnitude of possible losses and risks. In causal forecasting models, also called dependent models, the predicted value is a function of a large number of input variables. Each variable must have its own set of parameters, which determines the probabilistic relationship. Such models are used when the explanatory variables are known in advance and are easier to predict than the dependent variable. The causal forecasting process is as follows:

y = f(x1, x2, ..., xn)    (8)

where xi are the independent variables and y is the response variable. The result of causal forecasting is determined in the form of verbal statements, the structural analysis of which is related to the separation of the integral characteristics of the development of economic systems y. The central concept of causal prediction is the concept of a system, by which is understood the set M of elements xi connected by the set of bonds Z, ordering the elements into a structure possessing a set of properties V:

C = M × Z    (9)

Therefore, the entire prediction system can be defined as an ordered set S:

S = <(M × Z) × V>    (10)

We denote by Cij the contribution of the j-th economic development indicator of the i-th input variable to the value of the output predicted characteristic y at time t = 0. The information support of the contribution of this economic development indicator to the input variable xi can be represented as a linear matrix:

Cij = (Ci1, Ci2, ..., Cim)    (11)

The contribution A to the total value of the predicted output characteristic ŷ from the element i is determined by the equality

A = Cij ∗ αij^T    (12)

where αij is the value of the direction of the movement of resources in the organization of innovation and T is the operator of transposition of the matrix.
where αij – the value of the direction of the movement of resources in the organization of innovation, T is the operator of transportation of the matrix. The structural diagram of the causal relationships of forecasting in the organization of innovative activities is shown in Fig. 4.
Fig. 4. Diagram of the organization of information support for forecasting the results of innovative activities
The database included in Fig. 4 – its volume, completeness, sufficiency and adequacy of information I – can be estimated by the values of entropy H through the probabilities Pi:

H = I/n = −Σ_{i=1}^{m} Pi log2 Pi    (13)

From this formula it follows that

I = −n Σ_{i=1}^{n} Pi log2 Pi    (14)
Structural blocks for determining entropy are shown in Fig. 5. The measure of removing uncertainty is the amount of missing information I = ΔH = H0 − H1 .
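A minimal sketch of the entropy estimate (13) and of the uncertainty-removal measure ΔH = H0 − H1 is given below; the probability vectors are hypothetical and only illustrate the computation.

#include <cmath>
#include <iostream>
#include <vector>

// Shannon entropy H = -sum_i p_i * log2(p_i), as in Eq. (13).
double entropy(const std::vector<double>& p) {
    double h = 0.0;
    for (double pi : p)
        if (pi > 0.0) h -= pi * std::log2(pi);
    return h;
}

int main() {
    // Hypothetical a priori and a posteriori probability distributions.
    std::vector<double> prior     = {0.25, 0.25, 0.25, 0.25};
    std::vector<double> posterior = {0.70, 0.20, 0.05, 0.05};

    double h0 = entropy(prior);      // a priori entropy H0
    double h1 = entropy(posterior);  // a posteriori entropy H1

    // Amount of removed uncertainty: delta H = H0 - H1.
    std::cout << "H0 = " << h0 << ", H1 = " << h1
              << ", dH = " << (h0 - h1) << std::endl;
    return 0;
}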
Fig. 5. Organization of computing operations in innovation management
The difference between a priori H0 and a posteriori information H1 determines the required amount of information to remove the uncertainty. To implement causal forecasting, it is necessary to have access to the information contained in the elements of the structure of the input variables xi in order to track the dynamics of the process of innovative transformations. On the basis of the research carried out, a cross algorithm for resource provision of enterprises operating in conditions of uncertainty is proposed. It has the form of a multifactor model for managing production development in dynamic changes in the influence of the external environment Fig. 6. According to its main purpose, the algorithm has a cross structure and consists of an algorithm for determining the share of resources to be redistributed, and an algorithm for controlling their movement at the time of the emergence or removal of restrictions on production activities. The algorithm for determining resource movements takes into account all the necessary operations and structural blocks. The resource movement control algorithm is responsible for the reallocation of resource flows. In this case, a combination of algorithms and the construction of separate models on different input data occurs, after which the results of both models of algorithms are combined. The equipment must be preserved from the stabilization funds of the enterprise, which is also indicated by arrows on the edges of the oriented graph. The distribution of the movement of financial flows at the time of the lockdown, performed using the Ford-Fulkerson algorithm, is qualitative in nature and has an economic justification, however, the moments of stopping production in connection with the announcement of the lockdown and the definition of time intervals for the occurrence of events are not taken into account. The prospect for the further development of this problem is the use of the model of Markov processes, which is represented in the form of a graph, where the states are interconnected by transitions from the ith state to the jth state and each transition is characterized by its own probability.
Fig. 6. Multifactorial cross algorithm for resource provision of enterprises in conditions of uncertainty
5 Conclusions
The specificity of intelligent information technologies for managing resource flows of production under extreme environmental influences is not the termination of production during the lockdown, as a result of severe restrictions and selfisolation of personnel, but in the redistribution of the main flow of resources, its division into production and stabilization components and the determination of the residual flow, the dimensions of which are determined based on the application of a modified Ford-Fulkerson algorithm. The economic effect of the introduction of intelligent information technologies for organizing the production activities of enterprises in the face of uncertainty lies in the creation and saturation of stabilization funds, the retention of working personnel, the satisfaction of their social needs and requirements, as well as the conservation of equipment to maintain the viability of production. The use of the Ford-Fulkerson algorithm requires not only the introduction of appropriate modifications in the technology of its implementation, but also the creation on its basis of a fundamentally new multifactorial cross algorithm. This algorithm should consist of an algorithm for determining the financial resources to be redistributed and an algorithm for managing their movement at the time of the emergence or removal of restrictions on production activities. Reducing the volume of manufactured products while providing a social guarantee for personnel will be a means of maintaining production under extreme environmental influences.
References

1. Readiness for the future of production report 2018 (2018). https://www3.weforum.org/docs/FOP_Readiness_Report_2018.pdf
2. Alford, P., Jones, R.: The lone digital tourism entrepreneur: knowledge acquisition and collaborative transfer. Tourism Manag. 81, 104139 (2020)
3. Babichev, S., Durnyak, B., Zhydetskyy, V., Pikh, I., Senkivskyy, V.: Application of optics density-based clustering algorithm using inductive methods of complex system analysis. In: IEEE 2019 14th International Scientific and Technical Conference on Computer Sciences and Information Technologies, CSIT 2019 - Proceedings, pp. 169–172 (2019). https://doi.org/10.1109/STC-CSIT.2019.8929869
4. Babichev, S., Škvor, J.: Technique of gene expression profiles extraction based on the complex use of clustering and classification methods. Diagnostics 10(8) (2020). https://doi.org/10.3390/diagnostics10080584. Article no. 584
5. Bartysh, M., Dudziany, I.: Operations Research. Part 2. Optimization algorithms on graphs: Textbook. Lviv: Ivan Franko Lviv National University Publishing Center (2007)
6. Ding, Z., Fu, K., Deng, W., Li, J., Zhongrong, L.: A modified artificial bee colony algorithm for structural damage identification under varying temperature based on a novel objective function. Appl. Math. Model. 88, 122–141 (2020)
7. Dinitz, M., Nazari, Y.: Massively parallel approximate distance sketches. In: 33rd International Symposium on Distributed Computing (DISC 2019), vol. 153, pp. 35:1–35:17 (2020). https://doi.org/10.4230/LIPIcs.OPODIS.2019.35. https://drops.dagstuhl.de/opus/volltexte/2020/11821
8. Kudin, V., Yakovenko, O.: About the algorithm of redistribution of resources in a linear system. In: Intellectual Systems for Decision Making and Problems of Computational Intelligence: Conference Proceedings, pp. 88–90 (2016)
9. Laube, U., Nebel, M.: Maximum likelihood analysis of the Ford-Fulkerson method on special graphs. Algorithmica 74, 1224–1266 (2016). https://doi.org/10.1007/s00453-015-9998-5
10. Maji, G., Mandal, S., Sen, S.: A systematic survey on influential spreaders identification in complex networks with a focus on k-shell based techniques. Expert Syst. Appl. 161, 113681 (2020)
11. Nosov, P., Ben, A., Mateichuk, V., Safonov, M.: Identification of “human error” negative manifestation in maritime transport. Radio Electron. Comput. Sci. Control 4(47)
12. Olijnyk, A., Feshanich, L., Olijnyk, Y.: The epidemiological modeling taking into account the Covid-19 dissemination features. Methods Devices Qual. Control 1(44), 138–143 (2020). https://doi.org/10.31471/1993-9981
13. Peng, X., Zhao, Y., Small, M.: Identification and prediction of bifurcation tipping points using complex networks based on quasi-isometric mapping. Physica A Stat. Mech. Appl. 560(C), 125108 (2020). https://doi.org/10.1016/j.physa.2020.125108
14. Ponomarenko, V., Gontareva, I.: The system of causal connections between entrepreneurial activity and economic development. Econ. Ann.-XXI 165(5–6), 4–7 (2017)
15. Qu, Q.K., Chen, F.J., Zhou, X.J.: Road traffic bottleneck analysis for expressway for safety under disaster events using blockchain machine learning. Saf. Sci. 118, 925–932 (2019)
16. Sharko, M., Gusarina, N., Petrushenko, N.: Information-entropy model of making management decisions in the economic development of the enterprises. In: Lytvynenko, V., Babichev, S., Wójcik, W., Vynokurova, O., Vyshemyrskaya, S., Radetskaya, S. (eds.) Lecture Notes in Computational Intelligence and Decision Making. ISDMCI 2019. AISC, vol. 1020, pp. 304–314. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-26474-1_22
17. Sharko, M., Liubchuk, O., Fomishyna, V., Yarchenko, Y., et al.: Methodological support for the management of maintaining financial flows of external tourism in global risky conditions. In: Babichev, S., Peleshko, D., Vynokurova, O. (eds.) Data Stream Mining & Processing. DSMP 2020. CCIS, vol. 1158, pp. 188–201. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-61656-4_12
18. Sharko, M., Lopushynskyi, I., Petrushenko, N., Zaitseva, O., Kliutsevskyi, V., Yarchenko, Y.: Management of tourists’ enterprises adaptation strategies for identifying and predicting multidimensional non-stationary data flows in the face of uncertainties. In: Babichev, S., Lytvynenko, V., Wójcik, W., Vyshemyrskaya, S. (eds.) Lecture Notes in Computational Intelligence and Decision Making. ISDMCI 2020. AISC, vol. 1246, pp. 135–150. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-54215-3_9
19. Sharko, M., Shpak, N., Gonchar, et al.: Methodological basis of causal forecasting of the economic systems development management processes under the uncertainty. In: Babichev, S., Lytvynenko, V., Wójcik, W., Vyshemyrskaya, S. (eds.) Lecture Notes in Computational Intelligence and Decision Making. ISDMCI 2020. AISC, vol. 1246, pp. 423–436. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-54215-3_27
20. Sharko, M., Doneva, N.: Methodical approaches to transformation of tourist attractiveness of regions into strategic management decisions. Actual Probl. Econ. 8(158), 224–229 (201)
21. Sharko, M., Zaitseva, O., Gusarina, N.V.: Providing of innovative activity and economic development of enterprise in the conditions of external environment dynamic changes. Sci. Bull. Polissia 3(2), 57–60 (2017)
22. Zinchenko, S., Ben, A., Nosov, P., Popovych, I., Mateichuk, V., et al.: The vessel movement optimisation with excessive control. Bull. Univ. Karaganda Tech. Phys. 3(99) (2020). https://doi.org/10.31489/2020Ph3/86-96
23. Zinchenko, S., Nosov, P., Mateychuk, V., Mamenko, P.P., et al.: Automatic collision avoidance with multiple targets, including maneuvering ones. Radio Electron. Comput. Sci. Control 4, 211–222 (2020). https://doi.org/10.15588/1607-3274-2019-4-20
Information Technologies and Neural Network Means for Building the Complex Goal Program “Improving the Management of Intellectual Capital” Anzhelika Azarova(B) Vinnytsia National Technical University, Vinnytsia, Ukraine
Abstract. The article proposes a conceptual approach for finding partial coefficients of influence of alternatives in a complex goal program represented by a linear hierarchy of goals which is considered as a neural network. This becomes possible thanks to the fact that both in the decision tree and in this neural network there are no feedback and singlelevel connections, as well as threshold sub-goals. As an illustration of this approach, a simplified complex goal program of “Improving the management of intellectual capital of an enterprise” is proposed. To obtain the potential efficiencies of projects in such a CGP the author initially used the DSS “Solon-2”. At the same time, in order to fully automate the proposed process of developing the CGP and its optimization, the author has developed an individual software. The conceptual approach proposed by the author has significant advantages over existing alternative methods. It allows automatically obtaining of the optimal weights of PCI, easily aggregating dynamic changing expert data. The proposed CGP has a great practical interest, since it helps you to optimize the process of managing the intellectual capital of an enterprise, firm or organization. It allows significantly increasing of the profitability of enterprise by obtaining multiple effects such as increasing the productivity of personnel etc. Keywords: Decision support system - DSS · Alternative · Project · Goal · Sub-goal · Over-goal · Main goal · Complex goal program CGP · Partial coefficients of influence · Potential efficiency of project
1 Introduction and Literature Review
Modern decision support systems are able to solve quite complex problems of human activity management in various fields; in particular, they are used to make complex management decisions in engineering [5], economics [4,6,7], politics, sociology, medicine, etc.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 S. Babichev and V. Lytvynenko (Eds.): ISDMCI 2021, LNDECT 77, pp. 534–547, 2022. https://doi.org/10.1007/978-3-030-82014-5_36
One of the dominant problems of this level according to the author of the article is the assessment of intellectual capital of the enterprise, firm or organization. The complexity and versatility of the tasks that need to be considered for effective management of intellectual capital, the need to take into account a large number of different impact parameters, a big set of expert information that needs to be processed, all these factors encourage to use the decision support system (DSS) which is a powerful computerized tool. DSS allows with using of a hierarchical approach and a method of goal evaluation of alternatives to solve such complex problem. It should belong to the DSS of the second class [23] which includes systems of individual use, the knowledge bases of which are formed directly by the user. They are intended for use by middle-ranking civil servants, as well as managers of small and medium-sized firms to support making decision in situations where a given set of variants need to select a subset of variants that are (according to expert evaluation) the best from them. Each expert evaluates only part of the variants for the full set of criteria. The ranking of criteria is done in DSS taking into account the total amount of funding and costs of each project. Thus the most productive application of DSS of this class according to the author of the article is the automation of intellectual capital management processes of the firm, enterprise by building an appropriate complex goal program (CGP). A complex goal program is a set of activities - projects that are united by a single global goal - the main goal and common limited resources for their implementation. There are a lot of automation techniques of intellectual capital management processes in any branch of human activity [1–3,8–11,15,17,18]. Almost in each of them is affirmed that information technology contributes significantly to the management of intellectual capital. However, there are no clear mechanisms to manage the process of improving the use of intellectual capital through information technology. That’s why the author proposes a new technique of improving the management of intellectual capital through the information technology by using a hierarchical approach for building the appropriate automated CGP. The dominant unsolved part of the problem of making decision using a hierarchical approach is the search for all alternatives – sub-goals, projects (offered in the CGP) - partial coefficients of influence (PCI) on the main goal (or their over-goals or sub-goals). Totsenko V. G. performs a thorough analysis and description of existing approaches to the evaluation of alternatives, i.e. the determining of their PCI [23]. Among the most important of them are: – statistical approach of determining the aggregate estimate of the alternative based on the presentation of estimates of the alternative given by various experts as the implementation of some random variable and on the application of methods of mathematical statistics [23]. The disadvantage of this approach is not proving validity of the application of the normal distribution law for the realization of a certain random variable. It’s possible only in case of using
the same type of technical devices operating in the same conditions, rather than experts the difference in assessments of which is due to a number of psychophysiological factors such as level of knowledge, experience, intuition, health, mood etc. In addition the question arises as to what the threshold value of the degree of consistency of different expert assessments should be in order for the application of this method to be correct; – the method of direct evaluation. By this method for each alternative have to be determined a number which is an expert assessment of the degree of presence in the alternative of a certain property which is determined by the relevant criterion. It is clear that this approach is quite subjective, inaccurate and cumbersome in case of large hierarchical structures of the CGP which contain hundreds of alternatives; – group of methods of pairwise comparisons of alternatives [20,21] is widely used to determine the relative indicators of the alternatives significance under the condition of insignificant differences of the compared alternatives in relation to the chosen qualitative comparison criterion; – method of goal evaluation of alternatives [19,22]. It presupposes the use of different methods for the assessment of PCI of each individual alternative sub-goal. However the links between the projects and over-goals, their subgoals and partial coefficients of influence corresponding to them may be too many in case of the complexity of the problem or in case of the large number of levels of hierarchy. These factors significantly complicate the search process. Thus, to obtain estimates of the influence of projects on goals, their over-goals (or main goal), the author proposes to consider them in complex as a neural network [12–14] which is analogous to the decision tree due to the lack of feedback and one-level relationships, as well as threshold sub-goals (only the full implementation of such sub-goal affects the achievement of the main goal). Such a network is a linear hierarchy of the corresponding complex goal program. This allows you to automatically evaluate the optimal PCI of each alternative by neural network. Thus, the aim of this article is to develop a method for finding optimal PCI in CGP represented by linear hierarchies of goals through the using of modern information technologies including DSS “Solon-2” and neural network tools [23] that enable the process of optimizing such PCI according to the approach proposed by the author of the article below.
2 Materials and Methods
Formal Problem Statement. Suppose there is a complex problem that has to be solved; in particular, it can be the problem of improving the management of the intellectual capital of an enterprise, firm or organization. Since it is complex and poorly structured, it is natural to solve it using a decomposition approach: the complex task – the main goal (g0) – is divided into a sequence of simpler tasks – sub-goals – and finally into projects (lowest-level sub-goals), the simplest tasks.
The author proposes to apply a hierarchical goal approach which is described in detail by V. G. Totsenko [23] to build the hierarchy of goals in such a CGP. It necessitates the application of expert knowledge which allows to achieve the main goal of g0 , in particular, to improve the management of intellectual capital of the enterprise, firm. Thus: 1. Define a simplified linear hierarchy which represents a complex goal program “Improving of intellectual capital management” which has at the first level 3 sub-goals: g21 - “Improving human capital management”, g22 - “Improving consumer capital management” and g23 - “Improving organizational capital management”. At the lower level such a linear hierarchy is described by S projects xs , s = 1, S, S = 4. CGP has an unknown required vector W0 of partial coefficients of influence of sub-goals and projects on their over-goals which are always non-negative (P CI ≥ 0) as shown in Fig. 1.
Fig. 1. The example of simplified hierarchy of CGP “Improving of intellectual capital management” with S, (S = 4) projects xs , and unknown CPI vector W0
2. To search for the unknown vector W0 we first set the vector V0 of the indicators of potential efficiency of the projects and check the concordance of the obtained estimates using the variance concordance coefficient. After checking the expert assessments for concordance we use their expected values. As a result, the potential effectiveness of the projects is obtained as

v1^0 = 0.253; v2^0 = 0.26; v3^0 = 0.387; v4^0 = 0.047.    (1)

3. For the hierarchy shown in Fig. 1 we generate an arbitrary (randomly obtained) PCI vector W1, as shown in Fig. 2, which satisfies the limitation

Σ_{C=1}^{f} w^1_lC = 1,    (2)

where f is the number of sub-goals of the l-th goal.
Fig. 2. Example of a linear hierarchy with S, (S = 4) projects xs , and arbitrary vector PCI W1
4. Calculate the vector V1 of the indicators of the projects' potential effectiveness of the hierarchy with the arbitrary PCI vector W1 using the software DSS “Solon-2”:

v1^1 = d^1(1,1,1,1) − d^1(0,1,1,1) = 1 − 0.674 = 0.326,
v2^1 = d^1(1,1,1,1) − d^1(1,0,1,1) = 1 − 0.838 = 0.162,
v3^1 = d^1(1,1,1,1) − d^1(1,1,0,1) = 1 − 0.762 = 0.238,
v4^1 = d^1(1,1,1,1) − d^1(1,1,1,0) = 1 − 0.807 = 0.193.    (3)
5. Empirically, the author of the article has shown that searching for the unknown PCI values W0 only under limitations (1) and (2) leads to a set of optimal solutions. This necessitated adding another criterion that allows choosing the best solution among those obtained under limitations (1) and (2). As such an additional criterion the author proposes a number of additional limitations. As additional limitations we choose only those equations that describe the sums of the PCI links connecting the s-th project with the main goal g0 along directions which cover all the links between the layers at least once. The limiting equations are chosen based on two criteria – completeness and minimality: on the one hand, they should be taken so that they cover all the links presented in the hierarchy at least once; on the other hand, the number of reselections of the same links should be minimal. Otherwise their excessive volume will increase the number of calculations and significantly complicate the work of expert evaluation.
In general, only problems solved with the help of hierarchies containing no more than 7 ± 2 layers are of interest, because for a larger number of layers the complexity of the problem is extremely high for expert evaluation. This is confirmed by the theory set out in 1956 by G. Miller [16]. If the number of layers is greater than 9, it is advisable to aggregate the sub-goals and thus transform the hierarchy to this dimension. For our case we select the following layers, as shown in Fig. 2: projects x1, x2, x3, x4 are located in the 0-th layer; sub-goals g11, g12, g13 form layer 1, etc. Layer numbering allows introducing a single notation w^n_ij for each PCI in the entire hierarchy, where n is the number of the previous layer, i is the number of the sub-goal (project) in the previous layer, and j is the number of the sub-goal in the current layer, as shown in Fig. 3.
Fig. 3. Notation for each PCI links which unite sub-goal of i-th and j-th layers of goal hierarchy
Thus, on the basis of expert assessments, for the hierarchy shown in Fig. 2 it is possible to formulate the following system (4) of additional limitations on the search for the optimal PCI vector W0:

w^0_11 + w^1_11 + w^2_11 = 1.3;
w^0_12 + w^1_22 + w^2_21 = 0.8;
w^0_13 + w^1_33 + w^2_31 = 0.7;
w^0_21 + w^1_12 + w^2_21 = 1.0;
w^0_22 + w^1_21 + w^2_11 = 1.4;
w^0_23 + w^1_31 + w^2_11 = 0.9;
w^0_31 + w^1_13 + w^2_31 = 1.2;
w^0_32 + w^1_22 + w^2_21 = 0.8;
w^0_33 + w^1_31 + w^2_11 = 1.0;
w^0_41 + w^1_13 + w^2_31 = 0.8;
w^0_42 + w^1_23 + w^2_31 = 0.5;
w^0_43 + w^1_32 + w^2_21 = 0.5.    (4)

So, let us make the final statement of the problem in the following form.
Given:
1. A linear hierarchy with S projects (see Fig. 1) represented by a neural network.
2. The vector V0 of potential efficiency indicators (1) given by experts, which is considered as a training sample for the neural network representing the hierarchy of goals. The training sample is a set of pairs of input indicators (the mask [1, . . . , xs = 0, . . . , 1] of the presence of the influence of the s-th project on the achievement of the main goal g0) and the corresponding output indicators vs^0 of potential project effectiveness.
3. The system of additional limitations on the PCI defined by an expert is presented in the form of relations (4).
4. An arbitrary PCI vector W1 which satisfies limitation (2) is generated (see Fig. 2).
It is necessary to find the optimal PCI vector W0 which provides the optimization function

Σ_{s=1}^{m} (vs^0 − vs^1)^2 → min    (5)

at the points of the training sample (1) and satisfies the limitations (2) and (4). Thus, to find the optimal PCI vector W0 providing the optimization function (5), the author of the article proposes Algorithm 1 below and, additional to it, Algorithm 2.
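Before the algorithms are formalized, the quantities involved can be illustrated with a short sketch that, for a candidate set of PCI values, evaluates path-sum limitations of the type used in (4) and computes the squared deviation (5) from the expert efficiencies. The data structures and numbers below are hypothetical; they do not reproduce the DSS “Solon-2” computations.

#include <cmath>
#include <iostream>
#include <vector>

// One path-sum limitation of type (4): the sum of selected PCI equals a target value.
struct PathConstraint {
    std::vector<int> idx;  // indices of PCI links in a flat weight vector
    double target;         // expert-defined sum
};

// Squared deviation (5) between expert efficiencies v0 and model efficiencies v1.
double objective(const std::vector<double>& v0, const std::vector<double>& v1) {
    double s = 0.0;
    for (size_t i = 0; i < v0.size(); ++i) s += (v0[i] - v1[i]) * (v0[i] - v1[i]);
    return s;
}

int main() {
    // Hypothetical flat vector of PCI weights and two hypothetical constraints.
    std::vector<double> w = {0.40, 0.35, 0.25, 0.55, 0.45};
    std::vector<PathConstraint> constraints = {{{0, 3}, 0.9}, {{1, 4}, 0.8}};

    for (const auto& c : constraints) {
        double sum = 0.0;
        for (int i : c.idx) sum += w[i];
        std::cout << "constraint residual: " << std::fabs(sum - c.target) << std::endl;
    }

    // Hypothetical expert and model efficiencies (cf. Eqs. (1) and (3)).
    std::vector<double> v0 = {0.253, 0.26, 0.387, 0.047};
    std::vector<double> v1 = {0.326, 0.162, 0.238, 0.193};
    std::cout << "objective (5): " << objective(v0, v1) << std::endl;
    return 0;
}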
3 Experiment

3.1 Development of Algorithm 1 to Determine the Optimal PCI in the Linear Hierarchy of Goals Represented by Neural Network for Building of the CGP “Improving the Management of Intellectual Capital”
To build the CGP “Improving the management of intellectual capital”, the basic task is to determine the partial coefficients of influence of sub-goals and projects on their sub-goals (or the main goal). For this, the author of the article proposes the following Algorithm 1.
Step 1.1. Let us represent the linear hierarchy by a neural network which is described by S projects xs, s = 1, S, and an unknown vector W0 of the PCI (which have to be found) satisfying the condition w^0_iC ≥ 0 for each i-th goal and project.
Step 1.2. Introduce the vector V0 of the relative indicators of potential efficiency vs^0 for S projects, s = 1, S, obtained with the help of the expert knowledge verified for consistency (this procedure is described in the problem statement).
Step 1.3. For the created linear hierarchy with S projects we generate a vector W(1) of arbitrary weights of the PCI w^1_iC for the i-th goal with f sub-goals (where f is the cardinality of the set of sub-goals, i.e. their number), and likewise for projects. In this case, the total weight of the partial influence coefficients w^1_iC for the i-th goal is equal to 1: Σ_{C=1}^{f} w^1_iC = 1.
Step 1.4. For the vector W(1) of arbitrary weights of the PCI w^1_iC, calculate the vector V(1) of relative indicators of potential efficiency consisting of S relative indicators of potential efficiency vs^1 for S projects, s = 1, S:

vs^1 = d^1(1...1) − d^1(1,1,...,s=0,...,1,1),    (6)
where d^1(1...1) is the degree of achievement of the main goal provided that all S projects are completed, and d^1(1,1,...,s=0,...,1,1) is the degree of achievement of the main goal provided that all S projects are completed except for the project s.
Step 1.5. Calculate the deviation (error) of the relative indicators of potential efficiency as

Δ = (1/S) Σ_{s=1}^{S} |vs^1 − vs^0|.    (7)

Step 1.6. Set the permissible deviation equal to 0.05.
1.6.1. If Δ > 0.05, it is necessary to recalculate the PCI w^1_iC for each i-th goal that has f sub-goals, Σ_{C=1}^{f} w^1_iC = 1 (also for projects), i.e. to train the neural network according to rule (8) using the optimal training parameter η(opt) obtained in Algorithm 2:

w^1_iC(n + 1) = w^1_iC(n) + Δw^1_iC(n) = w^1_iC(n) + η(opt) w^1_iC(n) = w^1_iC(n)(1 + η(opt)),    (8)

where w^1_iC(n + 1) is the PCI value w^1_iC at the next, (n + 1)-th, iteration; w^1_iC(n) is the PCI value w^1_iC at the current, (n)-th, iteration; η(opt) is the optimal training parameter obtained in Algorithm 2.
After changing the weights in accordance with (8), we return to step 1.4 to recalculate (based on the vector of weights obtained at this step) the relative indicators of potential efficiency for S projects, s = 1, S. This procedure is repeated until the condition Δ ≤ 0.05 becomes true.
1.6.2. If

Δ ≤ 0.05,    (9)

then we fix the vector of weights obtained with such Δ as the vector W(opt) of ideal weights (PCI w^(opt)_iC) of sub-goals and projects.
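A minimal sketch of the Algorithm 1 loop is given below. It assumes a hypothetical function degreeOfMainGoal() that plays the role of the DSS “Solon-2” evaluation of the degree of achievement of the main goal for a given project mask; the simple weighted-sum form used here is only a placeholder. The multiplicative update follows rule (8) with a fixed training parameter, and the direction of the update (increasing or decreasing a weight depending on whether the corresponding efficiency is below or above the expert value) is an assumption added to make this toy loop converge.

#include <cmath>
#include <iostream>
#include <numeric>
#include <vector>

// Hypothetical stand-in for the DSS "Solon-2" evaluation: degree of achievement of
// the main goal g0 for a mask of completed projects, here a simple weighted sum.
double degreeOfMainGoal(const std::vector<int>& mask, const std::vector<double>& w) {
    double d = 0.0;
    for (size_t s = 0; s < mask.size(); ++s) d += mask[s] * w[s];
    return d;
}

int main() {
    std::vector<double> v0 = {0.253, 0.26, 0.387, 0.047};  // expert efficiencies, Eq. (1)
    std::vector<double> w  = {0.25, 0.25, 0.25, 0.25};     // arbitrary initial PCI, step 1.3
    const double eta = 0.25;                               // training parameter (see Algorithm 2)

    for (int iter = 0; iter < 1000; ++iter) {
        // Step 1.4: potential efficiencies v_s = d(1..1) - d(1,..,s=0,..,1), Eq. (6).
        std::vector<int> full(w.size(), 1);
        double dFull = degreeOfMainGoal(full, w);
        std::vector<double> v1(w.size());
        for (size_t s = 0; s < w.size(); ++s) {
            std::vector<int> mask = full; mask[s] = 0;
            v1[s] = dFull - degreeOfMainGoal(mask, w);
        }
        // Step 1.5: mean absolute deviation, Eq. (7).
        double delta = 0.0;
        for (size_t s = 0; s < w.size(); ++s) delta += std::fabs(v1[s] - v0[s]);
        delta /= w.size();
        if (delta <= 0.05) break;                          // stopping criterion (9)

        // Step 1.6.1: multiplicative update in the spirit of rule (8).
        for (size_t s = 0; s < w.size(); ++s)
            w[s] *= (v1[s] < v0[s]) ? (1.0 + eta) : (1.0 - eta);
        // Renormalize so the weights again sum to 1 (condition (2)).
        double sum = std::accumulate(w.begin(), w.end(), 0.0);
        for (double& x : w) x /= sum;
    }
    for (double x : w) std::cout << x << " ";
    std::cout << std::endl;
    return 0;
}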
3.2 Development of Algorithm 2 for Determination of the Optimal Training Parameter η(opt)
Let us consider Algorithm 2 for the determination of the optimal training parameter η(opt) used in (8).
Step 2.1. We take the vector Wn of the PCI with the values w^(n)_iC obtained at the (n)-th iteration of Algorithm 1.
Step 2.2. Generate the values of the training parameter η as η(p+1) = η(p) · 0.5, p = 1, P, where P is the cardinality of the set of training parameters η, P = 20. Let η(1) = 1; then η(2) = 0.5, η(3) = 0.25, η(4) = 0.125, η(5) = 0.0625, etc.
Step 2.3. For each generated training parameter η(p), calculate the PCI w^(p)_iC for each i-th goal that has f sub-goals, Σ_{C=1}^{f} w^(p)_iC = 1 (also for projects), i.e. train the neural network according to the rule

w^(p)_iC(n + 1) = w^(p)_iC(n) + Δw^(p)_iC(n) = w^(p)_iC(n) + η(p) w^(p)_iC(n) = w^(p)_iC(n)(1 + η(p)),    (10)

where w^(p)_iC(n + 1) is the PCI value obtained with η(p) at the next, (n + 1)-th, iteration and w^(p)_iC(n) is the PCI value at the current, (n)-th, iteration.
Step 2.4. The vector Wn(p) of the PCI values w^(p)_iC obtained with (10) must be checked for the fulfillment of the condition Σ_{C=1}^{f} w^(p)_iC(n + 1) = 1.
2.4.1. If this condition is fulfilled, then we go to step 2.5.
2.4.2. If this condition is not fulfilled, then it is necessary to normalize the weights as

w^(p)norm_iC(n + 1) = w^(p)_iC(n + 1) / Σ_{C=1}^{f} w^(p)_iC(n + 1)

and go to step 2.5.
Step 2.5. For the vector Wn(p) obtained for a given η(p) with the PCI weights w^(p)_iC, calculate the vector Vn(p) of relative indicators of potential efficiency consisting of S relative indicators of potential efficiency of the projects vs^n(p), s ∈ S:

vs^n(p) = d^n(p)(1...1) − d^n(p)(1,1,...,s=0,...,1,1),    (11)

where d^n(p)(1...1) is the degree of achievement of the main goal provided that all S projects are completed, and d^n(p)(1,1,...,s=0,...,1,1) is the degree of achievement of the main goal provided that all S projects are completed except for the project s.
Step 2.6. Calculate the deviation (error) of the relative indicators of potential efficiency with the current η(p) as

Δ(p) = (1/S) Σ_{s=1}^{S} |vs^n(p) − vs^0|,    (12)

where vs^n(p) are the relative indicators of potential efficiency for S projects, s = 1, S, obtained for the vector Wn(p) with the PCI weights w^(p)_iC defined with the training parameter η(p), and vs^0 are the relative indicators of potential efficiency for S projects, s = 1, S, obtained with the help of the expert knowledge verified for consistency.
Step 2.7. Among the P = 20 generated η(p), the optimal training parameter η(opt) is chosen as the one for which the deviation (error) Δ(p) is minimal: Δ(opt) = min{Δ(p)}.
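A compact sketch of the η search in Algorithm 2 is shown below. It halves a candidate training parameter P = 20 times and keeps the value with the smallest deviation (12). The evaluation of the potential efficiencies is the same hypothetical placeholder as in the sketch after Algorithm 1, and the direction of the update towards the expert value is again an assumption; the code illustrates the selection scheme rather than the authors' implementation.

#include <cmath>
#include <iostream>
#include <numeric>
#include <vector>

// Deviation (12): mean absolute difference between model and expert efficiencies.
double deviation(const std::vector<double>& v, const std::vector<double>& v0) {
    double d = 0.0;
    for (size_t s = 0; s < v.size(); ++s) d += std::fabs(v[s] - v0[s]);
    return d / v.size();
}

int main() {
    std::vector<double> v0 = {0.253, 0.26, 0.387, 0.047};  // expert efficiencies
    std::vector<double> w  = {0.326, 0.162, 0.238, 0.193}; // efficiencies at iteration n

    double eta = 1.0, bestEta = eta, bestDelta = 1e9;
    for (int p = 1; p <= 20; ++p, eta *= 0.5) {            // Step 2.2: eta(p+1) = eta(p) * 0.5
        // Step 2.3: multiplicative update in the spirit of (10), moved towards the
        // expert values, followed by the normalization of step 2.4.2.
        std::vector<double> wp(w.size());
        for (size_t s = 0; s < w.size(); ++s)
            wp[s] = w[s] * ((w[s] < v0[s]) ? (1.0 + eta) : (1.0 - eta));
        double sum = std::accumulate(wp.begin(), wp.end(), 0.0);
        for (double& x : wp) x /= sum;

        double delta = deviation(wp, v0);                  // Steps 2.5-2.6
        if (delta < bestDelta) { bestDelta = delta; bestEta = eta; }
    }
    std::cout << "eta(opt) = " << bestEta << ", delta = " << bestDelta << std::endl;
    return 0;
}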
The End of Algorithm 1.
Thus, the author of the article proposes to use the optimal value η(opt) obtained with Algorithm 2 in Algorithm 1, which, in turn, determines the optimal PCI in the simplified linear hierarchy of goals representing the CGP “Improving intellectual capital management”.

3.3 Automatization of Experimental Process
To automate the experimental process, the author has developed software for the realization of the CGP “Improving the management of intellectual capital of an enterprise”; an excerpt from its listing is illustrated below.

ga->Gradient( 1, 0 );
mysystem.NormalizeWeights();
ga->Gradient( 1, 0 );
mysystem.NormalizeWeights();
//ga->Gradient( 0 );
cond_vert += (calculated_rez - pvle) * (calculated_rez - pvle);
}
CLink **plinks;
int con_num;
double con_vle;
double con_sum;
for( gi = 0; gi < pcond->GetConditionsNum(); gi++ ) {
    if( pcond->Get_Data( gi, &plinks, &con_num, &con_vle ) ) { *rez = -3; return 0.0; }
    con_sum = 0.0;
    for( kol = 0; kol < con_num; kol++ ) {
        con_sum += plinks[kol]->GetWeight();
    }
    if( fabs(con_sum - con_vle) > (*radug_max) ) {
        *radug_max = fabs(con_sum - con_vle);
    }
    *radug_avg = *radug_avg + fabs(con_sum - con_vle);
    radug_num += 1.0;
    cond_horiz += (con_sum - con_vle) * (con_sum - con_vle);
}
double nevyazka = nevyaz_koef * main->NevyazkaNormalizacii();
switch( frezh ) {
case 0:  cond_vert = cond_vert + cond_horiz + nevyazka; break;
case 1:  cond_vert = cond_vert; break;
case 2:
default: cond_vert = cond_horiz; break;
}
*radug_max = *radug_max;
*radug_avg = (*radug_avg) / radug_num;
return cond_vert;
4 Results and Discussion
According to the above approach implemented using the software developed by the author, the following PCI were obtained for the linear hierarchy of the CGP “Improving the management of intellectual capital of an enterprise”:

– for the goals with edges coming out of the 2nd layer:
ω^2_11(τ + 1) = 0.031361; ω^2_21(τ + 1) = 0.427519; ω^2_31(τ + 1) = 0.548416;
– for the goals with edges coming out of the 1st layer:
ω^1_11(τ + 1) = 0.241097; ω^1_12(τ + 1) = 0.344435; ω^1_13(τ + 1) = 0.343842; ω^1_21(τ + 1) = 0.652065; ω^1_22(τ + 1) = 0.435953; ω^1_23(τ + 1) = 0.240224; ω^1_31(τ + 1) = 0.10605; ω^1_32(τ + 1) = 0.222735; ω^1_33(τ + 1) = 0.419943;
– for the projects (with edges coming out of the 0th layer):
ω^0_11(τ + 1) = 0.0745; ω^0_12(τ + 1) = 0.385; ω^0_13(τ + 1) = 0.526; ω^0_21(τ + 1) = 0.284; ω^0_22(τ + 1) = 0.0853; ω^0_23(τ + 1) = 0.115; ω^0_31(τ + 1) = 0.399; ω^0_32(τ + 1) = 0.21; ω^0_33(τ + 1) = 0.101; ω^0_41(τ + 1) = 0.246; ω^0_42(τ + 1) = 0.323; ω^0_43(τ + 1) = 0.261.
Thus, as a result of applying the approach offered by the author, the optimum (by criterion (9)) vector W(opt) of ideal weights (PCI w^(opt)_iC) of sub-goals and projects in the linear hierarchy of the CGP “Improving the management of intellectual capital of an enterprise” was obtained.
5 Conclusions
The article proposes a conceptual approach for finding the PCI of alternatives in a complex goal program represented by a linear hierarchy of goals which is considered as a neural network. This becomes possible thanks to the fact that both in the decision tree and in this neural network there are no feedback and single-level connections, as well as no threshold sub-goals. As an illustration of this approach, a simplified complex goal program “Improving the management of intellectual capital of an enterprise” is proposed. To obtain the potential efficiencies of projects in such a CGP, the author initially used the DSS “Solon-2”. At the same time, in order to fully automate the proposed process of developing the CGP and its optimization, the author has developed individual software, an excerpt from the listing of which was illustrated above. The conceptual approach proposed by the author has significant advantages over existing alternative methods. It allows obtaining the optimal PCI weights automatically and easily aggregating dynamically changing expert data. The proposed CGP (under the condition of its more complete presentation) is of great practical interest, since it allows optimizing the process of managing the intellectual capital of an enterprise, firm or organization. It allows significantly increasing the profitability of an enterprise by obtaining multiple effects such as increasing the productivity of personnel, etc.
Quantitative Assessment of Forest Disturbance with C-Band SAR Data for Decision Making Support in Forest Management Anna Kozlova(B) , Sergey Stankevich , Mykhailo Svideniuk , and Artem Andreiev State Institution “Scientific Centre for Aerospace Research of the Earth of the Institute of Geological Sciences of the National Academy of Sciences of Ukraine”, Kyiv, Ukraine {ak,st,a.a.andreev}@casre.kiev.ua
Abstract. Recently, forest disturbance regimes have intensified all over the world. The usage of Synthetic-aperture radar (SAR) data has proven as a comprehensive tool for mapping forest disturbance that has a natural or human-induced origin and a wide spatiotemporal range. It has become an important component in many applications related to decision-making for sustainable forest management. In this research, we developed a methodologic framework providing detection and quantitative assessment of forest disturbance using C-band SAR data for decision-making support in forest management. Herein, we propose a method to analyze LAI changes over a certain period that reliably indicates the transformations of a forest canopy. Then, asymmetrical formalization of expert knowledge using the AHP and Gompertz model allows not only forest disturbance detection that occurs but also its assessment in the context of the levels of impact. The framework was applied in assessing short-term forest disturbance in the Pushcha-Voditsky forest. The map of the spatial distribution of forest canopy change impact within the Pushcha-Vodytsia study area during 2017–2020 was obtained. The proposed framework contributes to the decision-making process in forest management and may provide a quick response to abrupt forest disturbances in those atmospheric conditions when optical remote sensing is helpless. Keywords: Forest disturbances · Leaf area index · C-band SAR data · Analytic hierarchy process · Decision making
1
Introduction
In recent decades, forest disturbance regimes have intensified, causing global forest loss all over the world [2,26]. To meet the challenge there is a substantial need for current management systems to be adapted to the changing regimes [19,32] and for specific decision-making tools to be improved [22,27]. The detection, mapping, and assessment of forest disturbances become important components in many applications related to decision-making for sustainable forest management [16,21], which is aiming to provide ecosystem services to society and maintain the biological diversity of forests [32]. Clark [6] defined disturbance in terrestrial forest ecosystems as ‘a relatively discrete event causing a change in the physical structure of the environment’. Forest disturbance can be natural (wildfires, severe windstorms, flooding, insect outbreaks, and disease affections, landslides, and avalanches) or anthropogenic (land conversion, logging, and mining), has a wide temporal (from abrupt to chronic) and spatial range (large and small) [12]. However, forest disturbance is typically associated with a loss of above-ground biomass and related carbon storage or structure disruption [21]. The leaf area index (LAI) is an important feature of the canopy structure or also called a ‘key biophysical variable’ that influences and is influenced by biophysical processes of forest plant communities. LAI is often employed as a basic descriptor of one aspect of local vegetation structure as a basis for comparisons among systems [23]. The use of passive and active remote sensing data has been proven as a comprehensive tool for mapping disturbance, as surface scattering is directly related to several canopy properties that change with disturbance [12]. In the last decades, major advances in tracking changes in forest structure have been associated with the free archive of Landsat imagery [34]. Time series algorithms utilize the mass-processing of Landsat satellite imagery, capable of detecting different types of disturbance. Unlike them, the single-date comparison indicates only whether the change has occurred [15]. Forest disturbance and recovery mapping using optical remote sensing technologies employ the properties of canopy structure changes that can be captured using different spectral indices [21]. However, as a three-dimensional parameter of forests, LAI is regarded to be superior to vegetation indices, including NDVI [33]. Frolking observes [12] that active remote sensing instruments (lidar in the visible/NIR and SAR in the microwave) are generally more directly sensitive to forest canopy biomass (microwave) and canopy height and vertical biomass distribution (lidar and InSAR) than are passive solar reflectance instruments, though not without their own shortcomings (e.g., clouds/smoke interference for lidar, biomass saturation for radar) [1,7,20]. Owing to large spatial and temporal variability, LAI is also one of the most difficult to quantify properly. While the dependence of backscattering on the LAI of various crops is studied intensively, for estimating the LAI of forests, radar data application is still rare. Most studies are devoted to LAI retrieving from C-
band SAR in boreal forests [29]. Multi-criteria decision-making methods can play a valuable role in cases when there are several alternatives or courses of action. The well-established analytic hierarchy process (AHP) by T. Saaty is widely used, in particular, when stakeholders are involved in the decision-making process and when management alternatives have to be evaluated [27]; AHP is the most used multi-criteria analysis technique, an aspect that has been extensively studied in other works [22]. The aim of the study was to provide detection and quantitative assessment of forest disturbance using C-band SAR data for decision-making support in forest management.
2
Materials and Methods
2.1 Methodological Framework
A dataflow diagram of the proposed framework is presented in Fig. 1. The method for detecting and assessing forest disturbance using SAR data comprises three sequential sets of procedures: input data pre-processing, LAI calculation, and forest disturbance assessment.
2.2 Input Data
Forest disturbance detection and assessment based on radar remote sensing calls for the integration of the following data over the area of interest:
– dual-polarized C-band SAR imagery (VV+VH);
– topography data for terrain slope and aspect estimation;
– digital hemispherical photographs (DHP) of forest canopies;
– information about current forest condition and influencing factors, namely forest inventory, data on environmental harvesting, disease and fires, meteorological and geological emergencies (storms, windfalls, droughts, landslides).
SAR Imagery. The Sentinel-1 mission is a leading provider of open SAR data, which was launched by the European Space Agency. The constellation of two polar-orbiting satellites (Sentinel-1A and Sentinel-1B) operate day and night, sensing with a C-band (λ = 5.6 cm, f = 5.4 GHz) SAR instrument [11]. Making dual-polarized (VV+VH) acquisitions in the Interferometric Wide swath (IW) mode, Sentinel-1 provides the Level-1 Ground Range Detected High Resolution (GRDH) product at a resolution of 10 m with a swath of 250 km. The data are available on the Copernicus Scientific Hub web-service. However, other current SAR missions also provide C-band data, namely Radarsat Constellation launched by the Canadian Space Agency and Gaofen-3 by the China Academy of Space Technology. Topographic Data. The degree of a surface incline (slope) and the orientation of slopes (aspect) were computed based on ALOS Global Digital Surface Model “ALOS World 3D - 30 m” (AW3D30) to remove the effects of terrain [30]. The
Fig. 1. Dataflow diagram of the quantitative assessment of forest disturbance discovered via LAI changes from C-band SAR data
AW3D30 is a global digital surface model (DSM) at 1 arc-second (approximately 30 m) resolution that was released by the Japan Aerospace Exploration Agency (JAXA) [31]. The ALOS AW3D30 data (version 3.1; release date: April 2020) with spatial resolution 24.7 × 24.7 m are freely available on the Japan Aerospace Exploration Agency web service. Field Data. Plant canopy structure parameters are extracted based on digital hemispherical photography (DHP) [9]. This passive sensing technology provides large field-of-view true-color images taken skyward from the forest floor with a 180° hemispherical (fisheye) lens. Therefore, the photographs offer records of the size, shape, and location of gaps in the forest overstory.
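In this study the LAI is retrieved from such photographs with the GLA software, as described in the pre-processing section. Purely as an illustration of the idea, a simplified gap-fraction estimate can be sketched as follows; it assumes a pre-cropped hemispherical image, a fixed blue-channel threshold and a single effective extinction coefficient k, none of which come from the paper and which do not reproduce the GLA algorithm itself.

```python
# Simplified sketch: LAI from a hemispherical photo via gap fraction and the
# Beer-Lambert relation LAI ~ -ln(P_gap)/k. NOT the GLA procedure; k = 0.5 is
# an assumed effective extinction coefficient.
import numpy as np
from PIL import Image

def lai_from_dhp(path: str, threshold: int = 200, k: float = 0.5) -> float:
    """Very rough LAI estimate from a hemispherical photograph."""
    img = np.asarray(Image.open(path).convert("RGB"), dtype=np.float32)
    blue = img[..., 2]                       # sky/non-sky split in the blue plane
    sky = blue >= threshold                  # bright pixels treated as sky (gaps)
    gap_fraction = float(np.clip(sky.mean(), 1e-4, 1 - 1e-4))
    return -np.log(gap_fraction) / k

# Example call (hypothetical file name):
# print(lai_from_dhp("plot_07_dhp.jpg"))
```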
Expert Knowledge. The data required for analyzing the current state of forest ecosystems and the effect of different disturbance agents can be obtained from open sources of the appropriate governmental departments along with global information services. For instance, information about insect outbreaks and disease affections, as well as about logging and its types, can be received from the regional departments of forestry and hunting. Actual and archival fire records can be gained both from global fire maps provided by NASA's Fire Information for Resource Management System (FIRMS) and from regional fire monitoring centers, for example, Regional Eastern Europe Fire Monitoring Center (REEFMC).
2.3 Data Pre-processing
The Sentinel-1 GRDH data pre-processing workflow consists of standard correction steps available in SentiNel Application Platform (SNAP) provided by European Space Agency. The workflow starts with applying the precise orbit of acquisition (Apply Orbit File) to provide an accurate satellite position and velocity information. Then, the reduction of noise effects in the inter-sub-swath texture (Thermal Noise Removal) allowed normalizing the backscatter signal within the entire Sentinel-1 scene and reduced discontinuities between sub-swaths for scenes in multi-swath acquisition modes. Next, the calibration of Sentinel-1 GRDH (Calibration) allowed to convert image intensity values (digital pixel values) to sigma nought values, called radiometrically calibrated SAR backscatter coefficient (σ0). To remove granular noise and increase image quality, speckle filtering was applied using the Lee-Sigma filter with a 7 × 7 filter window size (Speckle Filter) [17]. Finally, terrain correction was intended to remove distortions related to the sensor side-looking geometry and topography. Thus, the Range Doppler terrain correction operator implemented in SNAP based on the Range Doppler orthorectification method [28] was used for SAR scene geocoding from images in radar geometry (Terrain Correction) [8]. The topographic parameters (slope and aspect) were computed based on the ALOS AW3D30 data. For this purpose, slight distortions were removed from the raster data by sinks filling. The planar method was used to compute the slopes by calculating the value variation maximum rate from a cell to its immediate neighbors (Slope) [5]. The aspect represents the compass direction that the downhill slope faces for each location (Aspect). To retrieve the LAI from the DHPs by using GLA, image pixel positions were transformed into angular coordinates. Then, pixel intensities were divided into sky and non-sky classes by using an implemented optional threshold tool. For this purpose, the images' contrasts were analyzed in the blue plane to estimate sky-brightness distributions. As a result, these data were combined to produce estimates of the LAI (LAImeasured) [10].
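The same chain of SNAP operators can be scripted. The sketch below uses SNAP's Python bindings (snappy) and the standard GPT operator names (Apply-Orbit-File, ThermalNoiseRemoval, Calibration, Speckle-Filter, Terrain-Correction); the parameter keys shown ("outputSigmaBand", "filter", "windowSize") are assumptions that should be checked against the SNAP version in use, and the output file names are illustrative only.

```python
# Sketch of the Sentinel-1 GRDH pre-processing chain described above (snappy).
from snappy import GPF, HashMap, ProductIO

def preprocess_grdh(in_path: str, out_path: str):
    product = ProductIO.readProduct(in_path)

    product = GPF.createProduct("Apply-Orbit-File", HashMap(), product)      # precise orbit
    product = GPF.createProduct("ThermalNoiseRemoval", HashMap(), product)   # noise removal

    p = HashMap()
    p.put("outputSigmaBand", True)                                           # sigma-nought calibration
    product = GPF.createProduct("Calibration", p, product)

    p = HashMap()
    p.put("filter", "Lee Sigma")                                             # 7x7 Lee-Sigma speckle filter
    p.put("windowSize", "7x7")
    product = GPF.createProduct("Speckle-Filter", p, product)

    product = GPF.createProduct("Terrain-Correction", HashMap(), product)    # Range-Doppler geocoding

    ProductIO.writeProduct(product, out_path, "GeoTIFF")

# preprocess_grdh("S1A_IW_GRDH_20170712.zip", "S1A_20170712_sigma0.tif")
```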
2.4 LAI Calculations
In this study, the LAI evaluation was performed by the computation of a LAI indicator (I LAI ). In the present case, we found that the product of the σ 0 regis-
tered in VH (σ_VH) by σ0 registered in VV (σ_VV) was the most suitable LAI indicator (I_LAI):
I_LAI = σ_VH · σ_VV.
(1)
In order to enhance the statistical likelihood, the I LAI was adjusted by the integration of the topographic parameters computed and S-1 GRDH auxiliary data. In particular, the adjustment function negotiates the normal line of land surface local element and describes the mutual orientation of radar viewing direction: f(α, γ, θ, ψ) = cos γ sin θ cos(ψ − α) + sin γ cos θ,
(2)
where f(α, γ, θ, ψ) is a LAI adjusting function, α is a radar heading angle, γ is a radar incidence angle, θ is a terrain element slope, and ψ is a terrain element aspect. Hence, the final equation of the adjusted LAI indicator becomes:
I_LAI = σ_VH · σ_VV · f(α, γ, θ, ψ).
(3)
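A minimal numpy sketch of Eqs. (1)–(3) is given below; it assumes co-registered σ_VH and σ_VV arrays and per-pixel viewing and terrain angles already expressed in radians (the array names are illustrative and not from the paper).

```python
import numpy as np

def lai_indicator(sigma_vh, sigma_vv, alpha, gamma, theta, psi):
    """Adjusted LAI indicator, Eqs. (1)-(3): I_LAI = sigma_VH * sigma_VV * f."""
    # Eq. (2): mutual orientation of the local surface normal and the radar look direction
    f = (np.cos(gamma) * np.sin(theta) * np.cos(psi - alpha)
         + np.sin(gamma) * np.cos(theta))
    return sigma_vh * sigma_vv * f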
Meanwhile, forested areas of interest were mapped based on supervised classification. For this purpose, textural attributes were derived from the S-1 GRDH product by using the Gray-Level Co-Occurrence Matrix (GLCM) (Texture Analysis) [14]. Therefore, the radar backscattering mean, contrast and variance were integrated with σ_VH, σ_VV for the SVM-based classification [13] (Supervised Classification). The resulting classification was used to generate the binary mask of vegetated and non-vegetated areas applied to the I_LAI surface raster (Forest/Non-Forest Masking). The I_LAI was applied for the computation of the C-SAR-based LAI (LAI_SAR) over the ground control samples collected during the field measurement campaign in 2017. For this study, we assumed that the model is appropriate if the coefficient of determination between the dependent and predicted variables satisfies R² ≥ 0.8. The coefficient of determination (R²) was improved based on robust statistics by using an iterative search algorithm. Thus, we excluded one sample from the dataset consisting of the 15 ground control LAI samples. Therefore, we solved the optimization problem in the form:
R²[σ_VH · σ_VV · f(α, γ, θ, ψ)] → max.
(4)
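The calibration against the field plots can be sketched as follows, assuming a simple linear model LAI_SAR = a·I_LAI + b and a leave-one-out search for the single most damaging sample; this is a simplified reading of the robust iterative search described above, not necessarily the authors' exact procedure.

```python
import numpy as np

def fit_lai_model(i_lai, lai_measured, r2_target=0.8):
    """Fit LAI_SAR = a*I_LAI + b, dropping at most one outlier sample."""
    def r2_and_coeffs(x, y):
        a, b = np.polyfit(x, y, 1)
        pred = a * x + b
        r2 = 1.0 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)
        return r2, (a, b)

    x, y = np.asarray(i_lai, float), np.asarray(lai_measured, float)
    best, dropped = r2_and_coeffs(x, y), None
    if best[0] < r2_target:
        for i in range(len(x)):                       # leave-one-out search
            mask = np.arange(len(x)) != i
            cand = r2_and_coeffs(x[mask], y[mask])
            if cand[0] > best[0]:
                best, dropped = cand, i
    r2, (a, b) = best
    return a, b, r2, dropped
```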
2.5 Assessment of Forest Disturbances
Reflecting the transformations of a forest canopy, LAI changes reliably indicate its disturbances. Thus, the analysis of LAI change over a certain period provides a necessary background for the forest disturbance assessment (LAI Change Detection). However, the direct interrelation of the final and initial LAI values does not fully reflect the impact of the changes that have taken place. This is due to
the significant non-linearity of the expected quantitative assessment response to the LAI ratio. There are several drivers of such significant non-linearity. The first one is the statistical unevenness of LAI value distributions within the study area as well as over time. The second driver is the inconcinnity of the obtained assessments, in particular, their asymmetry: for example, the LAI growth and reduction by the same value are assessed differently. Differences in biotope types, growing season, and anthropogenic impact contribute too. Thus, the overall task is to restore the nonlinear relationship between the output assessments and the LAI changes (Non-linear transformation). Since the output assessment needs quite a lot of external (i.e., not contained in the input LAI distribution) knowledge, expert involvement is required. Human knowledge is ill-adapted to numerical handling, so preliminary formalization is desired. A suitable tool for expert knowledge formalization is the mathematical apparatus of the Analytic Hierarchy Process (AHP) by T. Saaty [24]. The pairwise comparison matrix helps to restore a nonlinear numerical relationship using expert knowledge. In this case, the entire range of values should be split into a few intervals (Statistical treatment), and an expert should evaluate inter-interval transition priorities on a ratio scale [4]. Values less than 1 correspond to a negative impact and indicate forest disturbances. Values greater than 1 correspond to a positive impact of the LAI changes occurring in a canopy and refer to natural forest development. The value 1 describes neutral change or no change. Therewith, unlike the classical AHP, the comparison matrix may not be symmetrical. The comparison matrix consistency is not constrained [3], as its purpose is to restore the relationship with the ratio of actual values in the intervals that are evaluated. If the comparison matrix E = {E_ij}, E_ij ≡ 1/E_ji ∀ i, j, is provided by an expert, then it can be put into correspondence with the matrix of interval value ratios R = {r_ij}, r_ij ≡ LAI_i/LAI_j. The expert-based estimates are subject to essential sampling noise, which can be reduced by approximation with some smooth curve. The general nature of the E(r) relationship is a growth curve with saturation [25]. In our case, the Gompertz model as an asymmetrical sigmoid function was chosen for the E(r) relationship approximation:
E(r, a, b, c) = a · e^(−b·e^(−c·r)),
(5)
where a, b and c are model parameters, namely a is an asymptote, b is a bias, and c is a growth rate. Then, the least-squares fitting of the Gompertz curve parameters should be performed [18], and the approximating function should be restored. It then becomes possible to calculate, from the ratio of actual LAI values, the expert-driven assessment of the change that has occurred. In this way, a map of the forest disturbance will be obtained.
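A sketch of this fitting step with SciPy is shown below. It assumes the expert priorities E_ij have already been paired with the corresponding interval ratios r_ij = LAI_i/LAI_j and flattened into two vectors; variable names and the initial guess are illustrative, not taken from the paper.

```python
import numpy as np
from scipy.optimize import curve_fit

def gompertz(r, a, b, c):
    # Eq. (5): asymmetrical sigmoid used to approximate E(r)
    return a * np.exp(-b * np.exp(-c * r))

def fit_expert_curve(ratios, priorities):
    """Least-squares fit of the Gompertz parameters to expert priorities."""
    p0 = (float(np.max(priorities)), 1.0, 1.0)        # crude initial guess
    params, _ = curve_fit(gompertz, ratios, priorities, p0=p0, maxfev=10000)
    return params                                     # (a, b, c)

# Per-pixel impact for a LAI ratio r = LAI_2020 / LAI_2017:
# a, b, c = fit_expert_curve(r_ij_values, e_ij_values)
# impact = gompertz(lai_2020 / lai_2017, a, b, c)
```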
3
Experiment
To verify the proposed method we carried out detection and quantitative assessment of forest disturbance that happened from 2017 to 2020 in the Pushcha-Voditsky forest located within the Obolonsky and Svyatoshinsky districts of Kyiv (Fig. 2a). The main component of the vegetation of the study area is a well-grown three- to four-layer pine forest. It should be noted that, when choosing the study area, we focused primarily on areas accessible for organizing reliable LAI field measurements rather than on detecting a specific forest disturbance. Within the Pushcha-Vodytsia study area, we performed the LAI field measurements on 6 July 2017. The measurements were taken at 15 sample plots under an overcast sky (Fig. 2b). High-contrast true-color images of forest canopies of different vertical structures were taken based on the DHP method (Fig. 2c). The LAI was calculated in the GLA environment using the analysis of a gap fraction geometry distribution [10]. Furthermore, since no significant abrupt disturbances (fires, windblows, landslides, or other large-scale human-induced disturbances) were observed in the selected area, we studied short-term changes caused mainly by climate change,
Fig. 2. Location of the study area within the city of Kyiv (a); Sentinel-1 color-composite image of the calibrated SAR backscattering coefficient σ0 and the GLCM textural attribute (R: σ_VV, G: σ_VH, B: Contrast_VV) acquired on 12 July 2017, and the sample plots where field LAI measurements were held on 6 July 2017 (b); DHP images representing forest canopies of different vertical structure (c)
the impact of insects and diseases, which are typical for the forests of Ukraine [19]. Over the study area, we used two images of S-1A Level-1 Ground Range Detected High Resolution (GRDH), acquired on 12 July 2017 and 26 July 2020 in descending pass. The data was downloaded, processed, and used to derive the radar backscattering coefficient (σ 0 ) [11]. Once the required input data had been collected and pre-processed, we applied procedures proposed in the methodological framework to calculate LAI and to assess forest disturbance and recovery.
4
Results and Discussions
According to the I_LAI values calculated for 12 July 2017 and LAI field measurements, the regression relationship was obtained and verified (Fig. 3). The coefficient of determination R² was 0.87.
Fig. 3. The plot of regression relationship between LAI measured at the sample plots and LAI indicator values derived from the Sentinel-1 SAR image
LAI calculation using C-band SAR images derived for 12 July 2017 and 26 July 2020 resulted in two maps of LAI spatial distribution (Fig. 4).
Fig. 4. Spatial distribution of C-SAR-based LAI (LAISAR ) derived for 12 July 2017 (a) and 26 July 2020 (b) and subdivided into five intervals depending on the complexity of forest canopy structure
During the investigated period, as could be seen from Fig. 4, the area of Pushcha-Voditsky forest increased due to single-storey stands varying by canopy density, mainly due to overgrowing of cleared sites in the central and southwestern part of the forest. At the same time, complex stands have slightly decreased. It was also found that the rare fires that had occurred in the northwest of the forest during the study period, are not the main reason for the detected changes. Depending on the complexity of the forest canopy structure, the LAI values of each map were divided into five intervals, as is shown in Fig. 5, from single-layer open stands to stands with all four layers well developed.
Fig. 5. LAI values histogram within the study area (green) and approximating normal distribution (red): a – 2017, b – 2020
Expert judgments were presented in the form of pairwise comparisons, which is standard for the AHP method. The matrix of comparative expert priorities is shown in Table 1.

Table 1. Matrix of comparative expert priorities

Intervals   I     II    III   IV    V
I           1     3     5     5     5
II          1/3   1     3     5     5
III         1/5   1/3   1     3     5
IV          1/7   1/5   1/3   1     3
V           1/9   1/7   1/5   1/3   1
Figure 6 shows the correspondence between the comparison matrix provided by an expert and the matrix of interval LAI values ratios within the study area.
Fig. 6. The relationships between LAI values ratio and expert priorities for exact histogram intervals (a) and Gaussian approximation (b)
To detect and assess forest disturbance and recovery, the map of the spatial distribution of LAI changes in terms of their impact on forest canopy within the study area during 2017–2020 was created (Fig. 7). It represents the levels of LAI change impacts for each pixel.
Fig. 7. Spatial distribution of forest canopy changes impact within the Pushcha-Vodytsia study area during 2017–2020. The map is presented in levels of change impacts. Negative ones reflect the rate of forest disturbance, and positive ones - vegetation recovery.
5
Conclusions
Over the last decades, the use of remote sensing techniques has allowed ensuring the detection, mapping, and assessment of forest disturbance that has a natural or human-induced origin, and wide temporal and spatial range. Being applied in assessing several forest canopy properties that change with disturbance, these techniques provide informational support for decision-making in sustainable forest management. In this research, a novel framework is put forward for the detection and quantitative assessment of forest disturbance using C-band SAR data. The proposed framework is based on the analysis of LAI change over a certain period that reliably indicates the transformations of a forest canopy. However, due to the significant non-linearity of the expected quantitative assessment response to the ratio of the final and initial LAI values, such an approach does not fully reflect the impact of the changes that have taken place. To restore a nonlinear numerical relationship using expert knowledge formalization, we applied the AHP mathematical apparatus coupled with the Gompertz model as an asymmetrical sigmoid function for the relationship approximation.
The proposed framework has several advantages. The main one is the use of SAR satellite data. It provides the assessment of forest disturbance over large areas in cloudy and smoky conditions, which exclude the use of optical data. Another advantage is the application of LAI as a key biophysical variable characterizing canopy structure. Furthermore, the asymmetrical formalization of expert knowledge using the AHP and the Gompertz model allows not only detecting forest disturbance that occurs but also assessing it in the context of the levels of impact. However, the resulting accuracy strongly depends on the amount and quality of LAI field measurements. Herein, we have explored the proposed framework for the assessment of short-term forest disturbance in the Pushcha-Voditsky forest. We have obtained the map of the spatial distribution of forest canopy change impact within the Pushcha-Vodytsia study area during 2017–2020. Thus, the proposed framework contributes to the decision-making process in forest management and can provide a quick response to abrupt forest disturbances in those atmospheric conditions when optical remote sensing is helpless.
References 1. Ball`ere, M., et al.: SAR data for tropical forest disturbance alerts in french guiana: Benefit over optical imagery. Remote Sens. Environ. 252, 112159 (2021). https:// doi.org/10.1016/j.rse.2020.112159 2. Balshi, M.S., et al.: The role of historical fire disturbance in the carbon dynamics of the pan-boreal region: a process-based analysis. J. Geophys. Res. Biogeosci. 112(G2) (2021). https://doi.org/10.1029/2006JG000380 3. Ben´ıtez, J., Delgado-Galv´ an, X., Guti´errez, J., Izquierdo, J.: Balancing consistency and expert judgment in AHP. Math. Comput. Model. 54(7), 1785–1790 (2011). https://doi.org/10.1016/j.mcm.2010.12.023 4. Boz´ oki, S., Dezs˝ o, L., Poesz, A., Temesi, J.: Inductive learning algorithms for complex systems modeling. Ann. Oper. Res. 211(1), 511–528 (2013). https://doi.org/ 10.1007/s10479-013-1328-1 5. Burrough, P., McDonell, R.: Principles of Geographical Information Systems. Oxford University Press, New York (1998) 6. Clark, D.B.: The role of disturbance in the regeneration of neotropical moist forests. Reprod. Ecol. Trop. For. Plants 7, 291–315 (1990) 7. Durieux, A.M., et al.: Monitoring forest disturbance using change detection on synthetic aperture radar imagery. In: Applications of Machine Learning, vol. 11139, p. 1113916. International Society for Optics and Photonics (2019). https://doi.org/ 10.1117/12.2528945 8. Filipponi, F.: Sentinel-1 GRD preprocessing workflow. In: Multidisciplinary Digital Publishing Institute Proceedings, vol. 18, p. 11 (2019) 9. Fournier, R.A., Hall, R.J. (eds.): Hemispherical Photography in Forest Science: Theory, Methods, Applications. MFE, vol. 28. Springer, Dordrecht (2017). https:// doi.org/10.1007/978-94-024-1098-3 10. Frazer, G.W., Canham, C.D., Lertzman, K.P.: Gap light analyzer (GLA), version 2.0: imaging software to extract canopy structure and gap light transmission indices from true-colour fisheye photographs, users manual and program documentation. Simon Fraser University, Burnaby, British Columbia, and The Institute of Ecosystem Studies, Millbrook, New York (1999)
11. Frison, P.L., et al.: Potential of sentinel-1 data for monitoring temperate mixed forest phenology. Remote Sens. 10(12), 2049 (2018). https://doi.org/10.3390/ rs10122049 12. Frolking, S., Palace, M.W., Clark, D., Chambers, J.Q., Shugart, H., Hurtt, G.C.: Forest disturbance and recovery: a general review in the context of spaceborne remote sensing of impacts on aboveground biomass and canopy structure. J. Geophys. Res. Biogeosci. 114(G2) (2009). https://doi.org/10.1029/2008JG000911 13. Gualtieri, J.A.: The support vector machine (SCM) algorithm for supervised classification of hyperspectral remote sensing data. Kernel Methods Remote Sens. Data Anal. 3, 51–83 (2009). https://doi.org/10.1002/9780470748992.ch3 14. Haralick, R.M., Shanmugam, K.: Textural feature for image classification. IEEE Trans. Syst. Man Cybern. 6, 610–621 (1973). https://doi.org/10.1109/TSMC.1973. 4309314 15. Hermosilla, T., Wulder, M.A., White, J.C., Coops, N.C., Hobart, G.W., Campbell, L.B.: Mass data processing of time series Landsat imagery: pixels to data products for forest monitoring. Int. J. Digit. Earth 9(11), 1035–1054 (2016). https://doi. org/10.1080/17538947.2016.1187673 16. Hirschmugl, M., Deutscher, J., Sobe, C., Bouvet, A., Mermoz, S., Schardt, M.: Use of SAR and optical time series for tropical forest disturbance mapping. Remote Sens. 12(4), 727 (2020). https://doi.org/10.3390/rs12040727 17. Jong, L.S., Pottier, E.: Polarimetric Radar Imaging from Basic to Applications (2009) 18. Juki´c, D., Kralik, G., Scitovski, R.: Least-squares fitting Gompertz curve. J. Comput. Appl. Math. 169(2), 359–375 (2004). https://doi.org/10.1016/j.cam.2003.12. 030 19. Lakyda, P., et al.: Impact of disturbances on the carbon cycle of forest ecosystems in Ukrainian Polissya. Forests 10(4), 337 (2019). https://doi.org/10.3390/f10040337 20. Lei, Y., Lucas, R., Siqueira, P., Schmidt, M., Treuhaft, R.: Detection of forest disturbance with spaceborne repeat-pass SAR interferometry. IEEE Trans. Geosci. Remote Sens. 56(4), 2424–2439 (2017). https://doi.org/10.1109/TGRS. 2017.2780158 21. Myroniuk, V., et al.: Tracking rates of forest disturbance and associated carbon loss in areas of illegal amber mining in Ukraine using Landsat time series. Remote Sens. 12(14), 2235 (2020). https://doi.org/10.3390/rs12142235 22. Ortiz-Urbina, E., Gonz´ alez-Pach´ on, J., Diaz-Balteiro, L.: Decision-making in forestry: a review of the hybridisation of multiple criteria and group decisionmaking methods. Forests 10(5), 375 (2019). https://doi.org/10.3390/f10050375 23. Parker, G.G.: Tamm review: leaf area index (LAI) is both a determinant and a consequence of important processes in vegetation canopies. Forest Ecol. Manage. 477, 118496 (2020). https://doi.org/10.1016/j.foreco.2020.118496 24. Saaty, T.L.: Decision making with the analytic hierarchy process. Int. J. Serv. Sci. 1(1), 83–98 (2008). https://doi.org/10.1504/IJSSCI.2008.017590 25. Satoh, D.: Model selection among growth curve models that have the same number of parameters. Cogent Math. Stat. 6(1), 1660503 (2019). https://doi.org/10.1080/ 25742558.2019.1660503 26. Schelhaas, M.J., Nabuurs, G.J., Schuck, A.: Natural disturbances in the European forests in the 19th and 20th centuries. Glob. Change Biol. 9(11), 1620–1633 (2003). https://doi.org/10.1046/j.1365-2486.2003.00684.x 27. Segura, M., Ray, D., Maroto, C.: Decision support systems for forest management: a comparative analysis and assessment. Comput. Electron. Agric. 101, 55–67 (2014). https://doi.org/10.1016/j.compag.2013.12.005
28. Small, D., Schubert, A.: Guide to ASAR geocoding. ESA-ESRIN Technical Note RSL-ASAR-GC-AD, pp. 1–36 (2008) 29. Stankevich, S.A., Kozlova, A.A., Piestova, I.O., Lubskyi, M.S.: Leaf area index estimation of forest using sentinel-1 C-band SAR data. In: 2017 IEEE Microwaves, Radar and Remote Sensing Symposium (MRRS), pp. 253–256. IEEE (2017). https://doi.org/10.1109/MRRS.2017.8075075 30. Tadono, T., et al.: Generation of the 30 m-mesh global digital surface model by ALOS prism. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 41 (2016). https://doi.org/10.5194/isprs-archives-XLI-B4-157-2016 31. Takaku, J., Tadono, T., Doutsu, M., Ohgushi, F., Kai, H.: Updates of ‘AW3D30’ ALOS global digital surface model with other open access datasets. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 43, 183–189 (2020). https://doi.org/10. 5194/isprs-archives-XLIII-B4-2020-183-2020 32. Thom, D., Seidl, R.: Natural disturbance impacts on ecosystem services and biodiversity in temperate and boreal forests. Biol. Rev. 91(3), 760–781 (2016). https:// doi.org/10.1111/brv.12193 33. Wang, J., Wang, J., Zhou, H., Xiao, Z.: Detecting forest disturbance in Northeast China from GLASS LAI time series data using a dynamic model. Remote Sens. 9(12), 1293 (2017). https://doi.org/10.3390/rs9121293 34. Zhu, Z.: Change detection using Landsat time series: a review of frequencies, preprocessing, algorithms, and applications. ISPRS J. Photogramm. Remote. Sens. 130, 370–384 (2017). https://doi.org/10.1016/j.isprsjprs.2017.06.013
An Intelligent System for Providing Recommendations on the Web Development Learning Iryna Yurchuk(B)
and Mykyta Kutsenko
Taras Shevchenko National University of Kyiv, Kyiv, Ukraine
Abstract. The rapid development of technology and the exponential increase of information available in the modern world is the cause of both new learning opportunities and new challenges in finding reliable sources of knowledge (books, abstracts, media lectures, etc.) which can be useful in a particular subject area and relevant in time. The study of web development is no exception. Web technologies are evolving rapidly as the market and businesses are dictating their demands. There is a need to create a system that would provide action recommendations based on the person and his knowledge level, taking into account multiple time-changing side parameters in the world: IT-market state, available vacancy requirements, salaries and the need for a certain type of specialist. The authors developed an intelligent system for providing person- and IT-market trends-oriented recommendations on web development learning. The system includes data analysis, which involves data kindly provided by more than 60 international IT-companies as part of this research.
Keywords: Intelligent system · Web development learning · Providing recommendations
1
Introduction
Nowadays the Internet is a useful part of human life as well as an informational empire. It can influence a human's internal state [7] as well as the state system of a country [11]. Its foundation is web development with all its components, such as site concept development, page and website design, multimedia creation or collection, programming of functional tools, testing and deployment on hosting, search optimization and site maintenance. An important component of web development planning is the selection of web development tools. One way is to take one of the broad list of ready-to-use web site builders available [10]; the other is to study a certain list of development tools by yourself. This research aims to develop an intelligent system for providing person-oriented and IT-market trends-oriented recommendations on web development learning.
This system can be useful for those who have started studying web development, those who passed a study course before but have not been using it in practice for some time, or those who are starting to practice web development and need some knowledge upgrades. Besides, everyone who wants to assess their level of knowledge can use the system as a test for it.
2
Problem Statement
Let K be a set of pieces of knowledge in a certain domain and T, T = {T1, . . . , Tn}, be a set of tasks that require skills and abilities on K. Assume that there are two maps f and g such that f(p) : T → S and g(t) : T → N, where S and N are sets of real non-negative and natural numbers, respectively, and p (a person) and t (a time) are parameters of the maps. The aim is to obtain the system of recommendations R(p, t) = r(f(T)(p) · g(T)(t)) depending on the parameters p and t, where r is a correspondence between T and the data of references that contain those pieces of knowledge and a person's skills and abilities set.
Fig. 1. Relation diagram: T is a set of tasks, f (p), g(t) and r are maps, t and p are parameters, and R(p, t) is a list of recommendations
For this intelligent system, the terms defined above (see Fig. 1) have the following values (attributes):
– K = {"html", "css", "javascript"};
– T = 40 (the size of the set T);
– the map f(p) is a testing such that for any p and an element of T there is a unique real non-negative value;
– the map g(t) is a rating of T based on data obtained from 60 international IT-companies during more than 10 years;
– the symbol · means element-by-element multiplication of the corresponding values of both sets S and N. It is obvious that this operation is commutative (S · N = N · S).
IS for Providing Recommendations on the Web Development Learning
565
and one problem with multiple solutions method for improving problem-solving skills and critical thinking abilities was demonstrated for Web application development pedagogy. The useful experience of the author to measure students’ attitudes towards web development (for example, confidence, motivation, selfreliance, context, etc.) during learning web development courses was revealed in [6]. In [5] the authors obtained the results for measuring the difficulty of text material pieces for Web page and Web site that can be used in the educational field to assist instructors to prepare appropriate materials for students as well as the applications of readability assessment in web development.
3
An Intelligent System and Its Three Components
In today’s world, users are divided into those who use a computer, tablet and mobile device. That’s why an adaptive user-friendly interface that provides effective user dialogue was created to satisfy all three types of users. Intelligent system works well on computers, tablets, and mobile devices as well and consists of a knowledge base, a make-recommendations mechanism and an intelligent interface. JavaScript programming language was selected for the functional logic implementation. During the user test, the system queries the server with database for a list of questions and provides user with tests. Then, at the end of the testing, user’s side responses are being checked with the correct answers on the server-side and after certain calculations, a list of recommendations is provided. 3.1
3.1 Knowledge Base
The knowledge representation technique is a frame net stored in the database in JSON format. It consists of tasks, answers with their rates, and corresponding references to pieces of knowledge that provide the correct answers and explanations of the material. The main sources of the data forming this knowledge base were [1–3] as a base for html, css and javascript knowledge and [4,8] as additional sources of general material. One of the major national IT job-market platforms provided the raw IT jobs data (vacancies, salaries, requirements, etc.). Using the knowledge base, the assignment of the map r using its list of values is provided, see Fig. 1.
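The structure of one knowledge-base frame can be illustrated as follows. The field names, the sample question and the reference string are hypothetical; the paper only states that tasks, rated answers and references are stored in JSON. The answer rates follow the four-level scale defined later for the testing step.

```python
# Hypothetical example of a single task frame, written as a Python dict
# mirroring the JSON structure of the knowledge base.
task_frame = {
    "id": 17,
    "topic": "javascript",
    "question": "Which keyword declares a block-scoped variable in JavaScript?",
    "answers": [
        {"text": "let",      "rate": 1.0},   # true
        {"text": "const",    "rate": 0.8},   # nearly true (block-scoped, but constant)
        {"text": "var",      "rate": 0.2},   # nearly false (function-scoped)
        {"text": "function", "rate": -1.0},  # false
    ],
    "references": [
        "Haverbeke, Eloquent JavaScript, ch. 2 (illustrative reference)",
    ],
}
```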
3.2 Make-Recommendations Mechanism
The system makes three steps to provide recommendations on web development knowledge improvement: a pre-decision of the testing part, a pre-decision of the data analysis, and their adjustment, called the make-recommendations algorithm.
Pre-decision of Testing Part. At the beginning, the user answers lots of test questions. It helps to understand the level of skills at the following phases of the web-development: site concept development, page layout in html and css, javascript programming. Every task has four types of answers: true, nearly true, nearly false and false. Such a gradation of answers aims to more accurately understand the level of knowledge, skills and abilities. A set S consists of the following elements:
– 1 corresponds to a "true" answer;
– 0.8 corresponds to a "nearly true" answer;
– 0.2 corresponds to a "nearly false" answer;
– −1 corresponds to a "false" answer.
In terms of the denotation in Fig. 1, the testing part is f(p), which depends on a person's skills and abilities, and f(T)(p) = {f(T1)(p), . . . , f(Tn)(p)}, where for any index i the image f(Ti)(p) is equal to one of the four values 1, 0.8, 0.2 or −1. There are two possibilities to take a test: a brief test with 10 questions during 5 min and a full test with 30 questions during 15 min. Pre-decision of Data Analysis. To take into account the requirements of the IT-market, the authors have analyzed different data collected from the year 2010 up to today: programming language ratings, suggested positions, salaries, work experience (basic programming skills, other programming languages, frameworks, libraries and platforms). According to these data, for every Ti, i = 1, . . . , n, the trend up to the next six months is constructed. After that, a rating of {T1, . . . , Tn} is built. In fact, it is a map g(Ti)(t) that depends on a time t as a parameter. The result of this step is a set {g(T1)(t), . . . , g(Tn)(t)} whose elements are natural numbers generating a rating of Ti according to the analyzed data. Make-Recommendations Algorithm. As a result of the previous two steps, at the beginning of the make-recommendations algorithm there are two sets of numbers {f(T1)(p), . . . , f(Tn)(p)} and {g(T1)(t), . . . , g(Tn)(t)}. Then, a list of recommendations R(p, t) = {R1, R2, . . . , Rl}, l ≤ n, is formed using the following formula:
Ri+1 = r(arg min{{f(T1)(p) · g(T1)(t), . . . , f(Tn)(p) · g(Tn)(t)} \ r⁻¹(Ri)}),   (1)
where the map r is a correspondence between the sets T and R given as a list in the knowledge base, R0 = ∅, r⁻¹(R0) = ∅, i = 1, . . . , l and l ≤ n. We have to remark that l can be equal to 0.5n to obtain an overhead of the recommendations list. There is an additional possibility to obtain a list of pieces of knowledge that are highly recommended to learn and a list of subjects in need of refreshing in memory. Let us formulate the main properties of f(p), g(t) and r, see Fig. 1. We remind that T, N, S and R are finite sets: the sets of tasks, rating list, testing results and list of recommendations, respectively.
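A compact Python sketch of formula (1) under the stated conventions is given below. The production system is implemented in JavaScript, so this is only an illustration; the knowledge-base lookup r is mocked as a list of reference strings, and the default list length follows the 0.5n remark above.

```python
def make_recommendations(f_scores, g_ratings, references, limit=None):
    """Build the recommendation list R(p, t) following formula (1).

    f_scores   -- test results f(T_i)(p), values in {1, 0.8, 0.2, -1}
    g_ratings  -- IT-market ratings g(T_i)(t), natural numbers
    references -- mock of r: index of T_i -> reference in the knowledge base
    """
    n = len(f_scores)
    limit = n // 2 if limit is None else limit           # overhead of the list
    products = [f_scores[i] * g_ratings[i] for i in range(n)]
    used, recommendations = set(), []
    while len(recommendations) < limit:
        # arg-min over the tasks not yet covered by earlier recommendations
        candidates = [i for i in range(n) if i not in used]
        if not candidates:
            break
        i_min = min(candidates, key=lambda i: products[i])
        used.add(i_min)
        recommendations.append(references[i_min])
    return recommendations
```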
Let us prove some statements that demonstrate the person- and time- (IT-market trends) orientation of the system. Lemma 1. For any p there exists a value ε, ε > 0, such that R(p, t) ≠ R(p, t + ε). Proof. Without loss of generality, let us consider some value of the parameter p and assume that for t1 and t2 the lists R^1 and R^2 have the following representations: R^1(p, t1) = {R^1_1, . . . , R^1_l1} and R^2(p, t2) = {R^2_1, . . . , R^2_l2}. If l1 ≠ l2, the lemma is proved and ε = |t2 − t1|. Let us consider the case l1 = l2. Then there are some possibilities. Case 1: there are two values i and j such that R^1_i ≠ R^2_j. Then the lemma is proved and ε = |t2 − t1|. Case 2: for any i it is true that R^1_i = R^2_i. By using formula (1) let us obtain a list R^3(p, t3). If for any i it is true that R^1_i = R^2_i = R^3_i, we continue the process and find a value s such that R^1_i = . . . = R^(s−1)_i ≠ R^s_i. Then the lemma is proved and ε = |t1 − ts|. Since the map g(t) is not a constant, such an index s has to exist and g(t) ≠ g(t3), t ≠ t3. Lemma 2. For a fixed t and any pair p1 and p2, p1 ≠ p2, it is true that either R(p1, t) = R(p2, t) or R(p1, t) ≠ R(p2, t). Proof. Let us consider R^1(p1, t) = {R^1_1, . . . , R^1_l1} and R^2(p2, t) = {R^2_1, . . . , R^2_l2}, which are the lists of recommendations for p1 and p2 at a fixed time t. It is obvious that f1(t) = f2(t), where f1(t) and f2(t) correspond to p1 and p2, respectively. According to formula (1), we can obtain the following results: 1) R^1_0 = R^2_0 = ∅; 2) for any i, i = 1, . . . , min{l1, l2}, it is true that either R^1_i = R^2_i or R^1_i ≠ R^2_i; in the case R^1_i = R^2_i, according to formula (1), it is true that g(Ti)(p1) = g(Ti)(p2); 3) if l1 ≠ l2 then there is a value i such that min{l1, l2} < i < max{l1, l2} and either R^1_i = ∅ and R^2_i ≠ ∅, or R^1_i ≠ ∅ and R^2_i = ∅. We have to remark that two persons p1 and p2 are identical for the IT-market at a time t if R(p1, t) = R(p2, t). Let us assume that two persons p1 and p2 at a time t are such that R(p1, t) ≠ R(p2, t). According to Lemma 1, for p1 and p2 there exist values ε1 and ε2, ε1, ε2 > 0, such that R(p1, t) ≠ R(p1, t + ε1) and R(p2, t) ≠ R(p2, t + ε2). If for two persons p1 and p2 at t it is true that R(p1, t) ≠ R(p2, t) and there exists ε1 = ε2 = ε such that R(p1, t + ε) = R(p2, t + ε), then they will very likely be identical. Such a type of identity can be interpreted as a rivalry.
Fig. 2. An interface state diagram consists of six categories of pages: a home page, a page with lists of tests, a selection test entry page, a testing page, a page with the last test question and a test result page
3.3 An Intelligent Interface
An interface state diagram is shown in Fig. 2. It consists of six categories of pages:
– home page contains general information such as introduction, tests description and possibility to take the lists of tests;
– page with lists of tests contains also their descriptions;
– selection test entry page contains its description and a button to start it;
– testing page consists of a question number, the text of a question and a list of answers;
– page with the last test question consists of a question number, the text of a question, a list of answers and a result;
– test result page consists of the percentage of a test result, names of blocks, names of topics (lists of recommendations), a list with information to study, lists of salary changes and lists with things to get a higher position.
We have to remark that all child pages contain a button to go to the home page. On the home page of the system there is a header with a logo and a navigation menu; a user greeting unit; a block describing what the system can do and how it will help the user; a block in which it is possible to pass to testing; and a footer that duplicates the header, see Fig. 3.
Fig. 3. Two pictures of screens: on the left hand there is a user greeting unit and on the right hand there is a block describing the system’s applicability
In Fig. 4 there are examples of pages: two blocks describing brief information about how the test will take place and the beginning of a brief test. On the system's testing page, there is a header in which there are a logo and a navigation menu; a block describing brief information about how the test will
Fig. 4. Two pictures of screens: on the left hand there is a block for describing brief information about how the test will take place and on the right hand there is a block of the beginning of a brief test
take place, the number of questions, and the time for which it can be performed; a test unit that displays questions and results; and a footer that duplicates the header. In Fig. 5, there is an example of a testing page. It contains question number 2, the text of a question, a list of answers and a button to continue testing by going to the next task.
Fig. 5. An example of testing page. It consists of a question number, the text of a question and a list of answers
As a result of the intelligent system's work, the following artifacts are presented: a list of pieces of knowledge that are highly recommended to learn, a list of subjects in need of refreshing in memory and a recommendations list, see Fig. 6.
Fig. 6. A picture of the screen that contains a list of themes that are highly recommended to learn, a list of themes in need of refreshing in memory and a recommendations list
4
Survey on the System’s Quality
Nowadays, the quality of recommendations of the system has been tested using a survey among colleagues, classmates and students studying at web development courses, in a representative sample of 70 people. The respondents were asked to rate the system as person-oriented. We have to remark that the assessment of its IT-market-trends orientation is in progress, since there is a need to observe respondents who, after testing, followed the recommendations, mastered the proposed knowledge and entered the market as job seekers. In Table 1 the results of the survey are presented. According to it, 60 respondents find the proposed recommendations informative for improving their skills. The system interface and the content of the test tasks were highly rated; however, it is desirable to increase their complexity. We have to remark that during the survey the possibility to single out respondents who passed a brief test was not provided. This can be one of the reasons why more than 40% of respondents considered the testing tasks as not complex.

Table 1. The results of the survey on the quality of recommendations (70 respondents)

Survey question                                                  Percent of respondents that answered positively
How clear is the interface for you to use the system?           90
How did you like the design of the system?                      94
How interesting are the questions you passed during the test?   80
How difficult were the questions for you during the test?       66
Are the links provided as recommendation informative?           86

5
Conclusion and Further Research
With the rapid development of the IT job market and the constantly rising demand for high-quality specialists, it is necessary to create systems that can help to assess the level of knowledge and provide relevant advice for further actions, based not only on the skills a person is willing to develop, but also taking into account current market demands and forecasted trends. The intelligent system presented in this research provides a personal assessment of skills and abilities in web development as well as IT-market trends in terms of the available number of vacancies for this type of developer, their skills
and salary offers. A survey on its quality illustrates a high enough interest in this type of system and its usefulness. In further research, the system will be expanded to have a more personalized approach to the survey (gender, age, education, location, etc.) and a more accurate forecast of the areas of improvement, based on a more accurate analysis of the data provided by IT companies. There is also the possibility to extend it for predicting rivalry in the IT market using the system's database. Acknowledgements. The authors are sincerely grateful to the analytics department of the Internet resource DOU.UA (https://dou.ua) for the provided information and interest in this work.
References 1. Anquetil, R.: Fundamental Concepts for Web Development: HTML5, CSS3, JavaScript and Much More! p. 276. Independently Published (2019) 2. Duckett, J.: Web Design with HTML, CSS, JavaScript and jQuery Set, p. 1152. Wiley, New York (2014) 3. Haverbeke, M.: Eloquent JavaScript, p. 472. No Starch Press, San Francisco (2018) 4. Krug, S.: Don’t Make Me Think: A Common Sense Approach to Web Usability, p. 216. New Riders, Berkeley (2013) 5. Lau, T., King, I.: Bilingual web page and site readability assessment. In: Proceedings of the 15th International Conference on World Wide Web, pp. 993–994 (2006). https://doi.org/10.1145/1135777.1135981 6. Liang, Z.: Design of a web development attitudes survey. In: IEEE Conference on Teaching, Assessment, and Learning for Engineering, TALE 2019, pp. 1–4 (2019). https://doi.org/10.1109/TALE48000.2019.9225877 7. Michel, M.C.K., King, M.C.: Cyber influence of human behavior: personal and national security, privacy, and fraud awareness to prevent harm. In: IEEE International Symposium on Technology and Society, ISTAS, pp. 1–7. IEEE (2019). https://doi.org/10.1109/ISTAS48451.2019.8938009 8. Simpson, K.: You Don’t Know JS: Up and Going, p. 88. O’Reilly Media, Sebastopol, California (2015) 9. Tzafilkou, K., Protogeros, N., Chouliara, A.: Experiential learning in web development courses: examining students’ performance, perception and acceptance. Educ. Inf. Technol. 25, 5687–5701 (2020). https://doi.org/10.1007/s10639-020-10211-6 10. Wilson, J.L.: The best web site builders for 2021 (2020). https://www.pcmag.com/ picks/the-best-website-builders 11. Zachary, G.P.: Digital manipulation and the future of electoral democracy in the U.S. In: IEEE Transactions on Technology and Society, vol. 1, pp. 104–112. IEEE (2020). https://doi.org/10.1109/TTS.2020.2992666 12. Zhou, H.G., Li, J., Zhong, J.: Cultivating personal capabilities based on problembased learning: a practice in web development. In: 15th International Conference on Computer Science and Education (ICCSE), pp. 379–382 (2020). https://doi. org/10.1109/ICCSE49874.2020.9201808
Real-Time Sensing, Reasoning and Adaptation for Computer Vision Systems Volodymyr Hrytsyk(B)
and Mariia Nazarkevych
National University “Lvivska politechnika”, Lviv, Ukraine [email protected]
Abstract. Methods for the automatic recognition and classification of biological objects under a microscope are presented in the paper. The problem of object separation in black-and-white and color images is studied. A method of separation of different types of objects (in the visual range of the spectrum), together with comparative results, is shown. An analysis of the quality of segmentation methods is presented, with schemes and tabulated segmentation results. The applicability of pattern recognition methods to computer vision systems for the analysis and recognition of scenes in the visual spectrum is studied. The methods and algorithms can also be used for real-time sensing of black-and-white and color patterns, reasoning and adaptation in computer vision systems. An example of such a system is glasses for people with visual impairments, where a camera mounted in the glasses receives and transmits environment data, and a contact plate with electrical leads transmits the data to the retina via electrical pulses. The authors analyzed several pattern recognition methods that allow environment data to be processed for the brain. This will give visually impaired persons with sub-reality vision better orientation in the environment. The theoretical basis, the algorithms and their comparison for application are presented in the paper.
Keywords: Pattern recognition · Segmentation · Separation
1
Introduction
Over the last 50 years our technology-oriented society has reached a situation in which more and more people and organizations are engaged in processing information and fewer in processing material objects. There is a tendency that by the year 2030 machines will make dozens of data-entry professions disappear [9,13]. Industry 4.0 leads us to another information explosion. Since neither the human brain nor society itself can control this information explosion, we need more advanced information systems that will carry out the lion's share of information processing. Information is a key element of the decision-making
process now and will remain so in the future. In addition, the quantity, variety and complexity of the information that our world generates keep growing, so we need adaptive information input systems – systems in whose operation humans do not have to interfere when external conditions change, because the systems reconfigure themselves.
2
Literature Review
Computer technology is so firmly rooted in our society that today it is impossible to imagine any kind of activity that is not connected, one way or another, with a computer [6,13,15,16]. Summarizing the surveys of research and implementation of intelligent IT in the EU and the US: through FP7 and Horizon 2020, the EU funded the development of sophisticated techniques for understanding audiovisual projections throughout life for typical and non-typical populations. It is believed that robotic input systems integrated into the body, together with the hands and feet, are the near future of computer vision. This will require the development of models and methods for synthesizing techniques for the reception of visual-spectrum data obtained in real time [5,7,10,16,17,30]. Thus, an actual problem in the development of artificial intelligence is the development of principles of computer perception of the outside world through the understanding of video data [1,2,4,9,11,22,24]. In particular, the development of cyber-physical systems for the perception of the external world by people with disabilities in the visual spectrum is relevant today [18,23,28]. This applies both to congenital conditions (e.g., congenital blindness) and to acquired ones (for example, with age or as a result of an accident). Today, implant technologies allow sensors to transmit information to the retina of the eye, so researching methods and ideas for extracting a useful signal in real time is extremely relevant. Considering that the human brain receives more than 80% of its information from the visual spectrum, this section is devoted to one of the most important problems arising in connection with the creation of modern information systems – the automation of the pattern recognition process. In particular, different approaches to choosing a threshold in the segmentation problem are considered in this part. The results of an objective assessment of segmentation methods are shown in the paper.
3 Materials and Methods
3.1 Purposes of an Analysis of Images and Basic Definitions
The basic concepts of the theory of images are objects and relations. Definition 1. Objects in the theory of images are components, configurations, ideal and deformed images, classes of images.
Definition 2. The relations in the theory of images are given in the form of similarity transformations, combinatorial relations, identification rules and deformation mechanisms. Objects are divided by levels: components occupy the lower level, configurations the next one above it, etc. A hierarchical system of images may contain many levels. In the synthesis of images, one advances from the lower to the upper level of the system. The analysis of images is the reverse process: the upper level of the image is selected as the starting point of the analysis and an attempt is made to decompose the object-image into objects belonging to the lower levels (decomposition). In the development of automatic recognition systems, analysis and synthesis are related tasks, because when the system identifies an object it distinguishes its characteristic properties, and sets of these properties form objects. Therefore, the problem of recognition can be formulated as follows. Case 1 (Image reconstruction). This problem is to find the mapping from the algebra of deformed images F_D (the vector of the input image) into the algebra of ideal images F_s (the reference vector). It is assumed that the mapping F_D → F_s restores the ideal image I which, as a result of the influence of the deformation mechanism D, has been converted into the observed object I_D. In our case, deformation is the influence of affine transformations, of noise caused by electromagnetic oscillations (both external and internal), of defects of the optics, etc. For example, one pixel in a modern colour camera is represented by 256^3 combinations of colours, and any change in lighting – whether the sun went behind a cloud (or, in a closed room, the voltage dropped in the light bulb) – or any affine transformation changes it. The reference and the test (input) image will therefore never match exactly, and the boundaries of objects will necessarily change. Case 2 (Image analysis). For a given image I, we need to find the configuration f that generates I. This task involves defining the components and combinatorial relations of the configuration f, so that the desired mapping looks like F → f(R). In the case of time-dependent images such as processes, for example, we have to solve segmentation tasks as well as find the types of signals corresponding to separate segments on the time axis. The obvious and easiest solution of the recognition problem is to apply a series of simple tests to the individual representations of the images to distinguish the characteristics of each class. The set of these tests should distinguish all permissible images from different classes. This way looks very easy, but we need to remember that a general theory for solving the recognition problem (which tests of the global set are better for the pattern) does not exist. A very limited number or a bad selection of tests will not allow obtaining characteristics of the represented images sufficient for assigning them to the corresponding classes. Too many tests, on the other hand, unreasonably complicate the calculations carried out in the process of further analysis. There is no general rule for obtaining benchmarks that help determine the set of such tests. This approach is highly dependent on the experience and technical
intuition of the developer, and therefore often does not provide a satisfactory solution to the problems of pattern recognition in practical activities. Today, the task of automatic selection of rules is on the agenda. 3.2
Main Tasks Concerning the Development of a Recognition Systems
The tasks we need to solve during the construction of an automatic pattern recognition system can usually be attributed to several main areas. The first one is related to the representation of the initial data obtained as measurement results for the object to be identified. This is a sensitivity problem. Each measured value is some characteristic of an image or an object. Suppose, for example, that the images are alphanumeric characters. In this case, a measuring retina can be successfully used in the sensor. If the retina consists of n elements, then the measurement results can be represented as a vector of measurements, or image vector,

x = (x_1, x_2, ..., x_n)   (1)

here each element x_i takes, for example, the value 1 if the image of the character passes through the i-th part of the retina, and the value 0 otherwise. In the following, we will call vectors of images simply images in those cases where this does not lead to a change of meaning. The equivalent entry of (1), x′ = (x_1, x_2, ..., x_n)′, where the prime denotes transposition, will also be used in the text. Vectors of images contain all measurable information about the images. Next come the tasks of classification. So, knowing the goal, we will return to obtaining a qualitative, informative vector. 3.3
Perception of the Spectrum Problem
An important task is the representation of the real world in the perception space of the system. Consider, for example, human eyesight and technical vision. The light range perceived by the human eye is close to a billion. However, in the daytime we do not see stars in the sky, although the absolute contrast between the sky and the stars is no more than ten thousand. The fact is that the contrast sensitivity of the human eye is only 2% [19], so the noticeable absolute contrast does not exceed 50. Individual parts of the billion-wide perception range can only be considered in turn, adapting to each level of illumination. Watching a scene, a person turns the viewpoint from one object to another. If the object is bright, the person automatically applies a filter (squints or the pupil contracts). Observing an object in the shadow, the observer installs an additional filter (protects his eyes from the dazzling sun with his palm). As we know, the contrast (in absolute values) increases, depending on the state of the object, by tens of thousands of times, and in direct sunlight by up to 1,000,000 times. As a result of glare, the illumination of water can reach 10^6 lux, and the equivalent illumination
of the disk of the sun is 10^8 lux (i.e., 100,000,000 lux). The illumination of objects in the shade can be reduced to 100 lux or less. Today, like the human eye, a single input camera cannot simultaneously observe several objects whose illumination differs by tens of thousands of times. In such cases a loss of information in some areas is inevitable, and the task of the designer is to extract as much information as possible from such situations.
Main Approaches to Objects Selection on Complexity Area
For our task, we investigated some methods from several main groups of threshold methods that work effectively in real time: 1) global threshold methods; 2) local threshold methods; 3) methods with several thresholds; 4) methods based on shades of gray and histograms; 5) methods based on entropy; 6) methods based on clustering; 7) methods based on pixel correlation. 3.5
Binarization with Lower Threshold
Binarization with a lower threshold belongs to the group of image segmentation methods based on a global threshold. The essence of the method is that the global brightness threshold is selected as a certain constant and, depending on the ratio of this constant to the local brightness value of each pixel, the binary value of this pixel is set. This method is the simplest and most widespread one; it is effective under standard conditions. We describe it as follows:

a_xy^new = { 0, if (r + g + b)/3 (a_xy) ≤ L;  1, if (r + g + b)/3 (a_xy) > L }   (2)

here a_xy^new is the resulting binary value of the pixel; a_xy is the input value of the pixel brightness; L(const) ∈ [0, 255] is the global lighting level; r, g, b are the original values of the red, green and blue components of the colour of pixel a_xy.
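As an illustration of rule (2), a minimal NumPy sketch is given below; the function name, the array layout and the default threshold value are our assumptions, not part of the original method description.

```python
import numpy as np

def binarize_global(rgb: np.ndarray, L: int = 128) -> np.ndarray:
    """Global lower-threshold binarization, Eq. (2).

    rgb: H x W x 3 array with the r, g, b components of each pixel.
    L:   global lighting level in [0, 255] (assumed default).
    """
    brightness = rgb.astype(np.float32).mean(axis=2)  # (r + g + b) / 3 per pixel
    return (brightness > L).astype(np.uint8)          # 1 above the threshold, 0 otherwise
```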
3.6 Binarization Methods with Local Threshold
The method is based on the calculation of a local illumination threshold. The idea is to adjust the binarization brightness threshold from point to point based on the deviation of the local mean brightness value (calculated for each pixel from the brightness values of the pixel itself and its neighbours) from the local value (calculated for one pixel only) within the given mask [14,20]. That is, the binary representation of the pixel is computed as follows:

a_xy^new = { 0, if B(x, y) ≤ L;  1, if B(x, y) > L }   (3)

where B(x, y) ∈ [0, 255] = (r + g + b)/3 (a_xy); L ∈ [0, 255] = m_{w×w}(x, y) + k · s_{w×w}(x, y) is the local brightness threshold for pixel a_xy in its w×w neighbourhood; m_{w×w}(x, y) ∈ [0, 255] = (1/(w·w)) Σ_{w×w} B(x, y) is the average brightness value in the w×w neighbourhood of pixel (x, y); s_{w×w}(x, y) = sqrt( (1/(w·w)) Σ_{i=1}^{w×w} (B(x, y) − m_{w×w}(x, y))^2 ) is the mean square deviation of the sample in the neighbourhood of the pixel; k(const) = −0.2 for objects that are more likely to be represented in black (namely if B(x, y) ≤ 127), and k = 0.2 for objects that are more likely to be white (B(x, y) > 127); w(const) is the mask size, for example, 15 [14].
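A compact sketch of this local (Niblack-type) rule, Eq. (3), using SciPy's uniform filter for the local mean and deviation; the window size w = 15 and k = ±0.2 follow the values quoted above, while the function and variable names are assumed.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def binarize_local(rgb: np.ndarray, w: int = 15) -> np.ndarray:
    """Local thresholding with L(x, y) = m_wxw(x, y) + k * s_wxw(x, y), Eq. (3)."""
    B = rgb.astype(np.float64).mean(axis=2)            # per-pixel brightness (r + g + b) / 3
    m = uniform_filter(B, size=w)                      # local mean m_wxw
    s = np.sqrt(np.maximum(uniform_filter(B ** 2, size=w) - m ** 2, 0.0))  # local deviation s_wxw
    k = np.where(B <= 127, -0.2, 0.2)                  # darker objects: k = -0.2, lighter: 0.2
    L = m + k * s
    return (B > L).astype(np.uint8)
```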
3.7 Methods with Several Thresholds
In general, when applying a single threshold to an arbitrary image f, with a transformation operation ϕ and a real number t as input, we can choose the set of points S_ϕ such that for all points of the set the brightness of the transformed image ϕf is related to t in the required way, i.e. one of the relations ϕ(f) ≤ t, ϕ(f) < t, ϕ(f) > t holds. To apply multiple thresholds, one can select subsets by fixing configurations for which, after applying the transformation operations, t_1 ≤ t_2 holds, and then choose the subset for which the inequality t_1 ≤ ϕf ≤ t_2 is true. In a partial case, we can search for a set with one threshold (for example, a global threshold), and when a set corresponding to that threshold has been found, pass around the object with a lower/higher threshold (the second threshold can be local or work on an arbitrary other principle). For example:

a_xy^new = { 0, if B(x, y) ≤ L_1;  1, if B(x, y) > L_2;  0, if B(x, y) ≥ L_3 }   (4)

Other variations of the threshold values are also possible.
Methods with Thresholds Preprocessing
This method changes the brightness values of the background pixels:

a_xy^new = { 0, if B(x, y) > L;  f(x, y), if B(x, y) ≤ L }   (5)

3.9
Multilevel Thresholds Transformation
This transformation method creates a non-binary image:

a_xy^new = { 1, if B(x, y) ∈ S_1;  2, if B(x, y) ∈ S_2;  ...;  n, if B(x, y) ∈ S_n;  0, in any other case }   (6)

here S_n is the n-th set of similar brightness values.
3.10
Methods Based on Shades of Gray and Histograms
Methods using histograms are effective compared with other image segmentation methods because they require only one pass through the pixels.

The Method of Minimal Value. The threshold is set at the minimum between two maxima of the histogram, where the maxima are the peaks of the brightness distributions of the background and of the object itself:

t = min(h[b_ij])   (7)

here t is the threshold value; b_ij is the pixel brightness; h[b_ij] is the value of the brightness distribution in the histogram; min(h[b_ij]) is the minimum value of the distribution between the two maxima.

Method of Average Value. The essence of the method is the search for the average value based on the histogram of the image [8]:

t = ( Σ_{i=1}^{n} h[b_ij] ) / n   (8)

here t is the threshold value; n is the number of histogram elements; h[b_ij] is the value of the brightness distribution in the histogram.

Otsu Method. The method [3,10,21,27] uses a histogram of the distribution of the brightness values of the image pixels. The essence of the Otsu method is to set the threshold between the classes of the histogram in such a way that each of them is as "dense" as possible. Expressed mathematically, this reduces to minimizing the intra-class variance, defined as the weighted sum of the variances of the two classes:

σ_w^2(L) = w_1(L) σ_1^2(L) + w_2(L) σ_2^2(L)   (9)

here the weights w_i are the probabilities of the two classes separated by the threshold L, and σ_i^2 are the variances of these classes. Otsu proved that minimizing the intra-class variance is equivalent to maximizing the between-class variance, which can be expressed through the probabilities w_i and the arithmetic means η_i:

σ_b^2(L) = σ^2 − σ_w^2(L) = w_1(L) w_2(L) [η_1(L) − η_2(L)]^2   (10)

First we build the histogram p(l) of the image and determine the occurrence rate N(l) of every brightness level of the image G(x, y). We compute the total brightness N_T of the image pixels:

N_T = Σ_{i=0}^{max(G)} p(i)   (11)
Then for each value of the half-tone (threshold) L = 1, ..., max(G) we perform the following:

w_1(L) = Σ_{i=0}^{L−1} p(i) / N_T = Σ_{i=0}^{L−1} N(i);   w_2(L) = 1 − w_1(L)   (12)

η_1(L) = Σ_{i=0}^{L−1} i · p(i) / (N_T · w_1(L)) = Σ_{i=0}^{L−1} i · N(i) / w_1(L);   η_2(L) = (η_T − η_1(L) · w_1(L)) / w_2(L)   (13)

σ_b^2(L) = σ^2 − σ_w^2(L) = w_1(L) w_2(L) [η_1(L) − η_2(L)]^2   (14)

The wanted threshold is the L at which σ_b^2(L) reaches its maximum:

L = argmax_L σ_b^2(L)   (15)

The binary pixel representation is calculated as follows:

a_xy^new = { 0, if B(x, y) ≤ L;  1, if B(x, y) > L }   (16)
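The search in Eqs. (11)–(15) can be implemented directly from the image histogram; the following sketch (the names and the 256-bin histogram are our assumptions) returns the threshold maximizing the between-class variance.

```python
import numpy as np

def otsu_threshold(B: np.ndarray) -> int:
    """Return L = argmax of the between-class variance sigma_b^2(L), Eq. (15)."""
    hist = np.bincount(B.ravel().astype(np.uint8), minlength=256).astype(np.float64)
    p = hist / hist.sum()                      # normalized histogram
    w1 = np.cumsum(p)                          # class-1 probability w1(L)
    mu = np.cumsum(p * np.arange(256))         # cumulative mean
    mu_T = mu[-1]                              # global mean
    w2 = 1.0 - w1
    valid = (w1 > 0) & (w2 > 0)
    sigma_b2 = np.zeros(256)
    # (mu_T * w1 - mu)^2 / (w1 * w2) is algebraically equal to w1 * w2 * (eta1 - eta2)^2
    sigma_b2[valid] = (mu_T * w1[valid] - mu[valid]) ** 2 / (w1[valid] * w2[valid])
    return int(np.argmax(sigma_b2))
```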
3.11 Triangle Method
The method uses a histogram of the distribution of brightness values in the image. We build a line s on the histogram from the minimal brightness value b_min to the maximal brightness value b_max. The threshold is defined by the element of the histogram whose distance to s is greatest [29]:

L = argmax_{b(i)} d   (17)

where L is the threshold value and d is the distance from histogram value b(i) to the line s. The binarization procedure is realized by the standard formula:

a_xy^new = { 0, if B(x, y) ≤ L;  1, if B(x, y) > L }   (18)
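Following the description above (line s drawn between the extreme non-empty histogram bins), the triangle rule of Eq. (17) may be sketched as follows; the helper names are assumptions.

```python
import numpy as np

def triangle_threshold(B: np.ndarray) -> int:
    """Pick the histogram bin farthest from the line s between b_min and b_max, Eq. (17)."""
    h = np.bincount(B.ravel().astype(np.uint8), minlength=256).astype(np.float64)
    nz = np.flatnonzero(h)
    b_min, b_max = int(nz[0]), int(nz[-1])
    if b_min == b_max:
        return b_min
    # line s through (b_min, h[b_min]) and (b_max, h[b_max]) written as a*x + b*y + c = 0
    a = h[b_max] - h[b_min]
    b = -(b_max - b_min)
    c = -(a * b_min + b * h[b_min])
    i = np.arange(b_min, b_max + 1)
    d = np.abs(a * i + b * h[i] + c) / np.hypot(a, b)  # point-to-line distances
    return int(i[np.argmax(d)])
```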
3.12 The Threshold Based on the Gradient of Brightness Determining Method
Let the pixels of the image be divided into two sets (two classes): pixels belonging to the set of objects and pixels belonging to the set of the background. Then the algorithm for calculating the threshold consists of the following two steps:
1. The brightness gradient module is determined for each pixel:

G(m, n) = max( |G_m(m, n)|, |G_n(m, n)| )   (19)

here G_m(m, n) = f(m + 1, n) − f(m − 1, n) and G_n(m, n) = f(m, n + 1) − f(m, n − 1).

2. The threshold is calculated as:

t = Σ_{m=0}^{M−1} Σ_{n=0}^{N−1} f(m, n) G(m, n) / Σ_{m=0}^{M−1} Σ_{n=0}^{N−1} G(m, n)   (20)

here t is the threshold value.
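A direct NumPy transcription of these two steps, Eqs. (19)–(20); leaving boundary pixels with zero gradient is our simplification.

```python
import numpy as np

def gradient_threshold(f: np.ndarray) -> float:
    """Threshold as the gradient-weighted mean brightness, Eqs. (19)-(20)."""
    f = f.astype(np.float64)
    Gm = np.zeros_like(f)
    Gn = np.zeros_like(f)
    Gm[1:-1, :] = f[2:, :] - f[:-2, :]      # G_m(m, n) = f(m + 1, n) - f(m - 1, n)
    Gn[:, 1:-1] = f[:, 2:] - f[:, :-2]      # G_n(m, n) = f(m, n + 1) - f(m, n - 1)
    G = np.maximum(np.abs(Gm), np.abs(Gn))  # Eq. (19)
    return float((f * G).sum() / G.sum())   # Eq. (20)
```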
3.13 Methods of Binarization Using Color Brightness Entropy (Yen's Method)
This method belongs to the methods that use the entropy of the distribution of colour brightness in the image. Yen's method treats the object in the image and the background on which this object is located as two different sources of visual information, and the brightness value at which the sum of the two entropies reaches its maximum is considered the optimal threshold for image segmentation [26]. To begin, we calculate the histogram p(l) of the image and the occurrence frequency N(l) of each brightness level of the image G(x, y). We also compute the total brightness N_T of the image pixels:

N_T = Σ_{i=0}^{max(G)} p(i)   (21)

We build auxiliary normalized histograms:

P_norm(i) = p(i) / N_T,   P_normC(i) = P_normC(i − 1) + P_norm(i),
P′_norm(i) = P′_norm(i − 1) + P_norm(i)^2,   P″_norm(i) = P″_norm(i + 1) + P_norm(i + 1)^2

We find the entropies of the object and of its background:

C_f(T) = − log( P_normC(i) × (1 − P_normC(i)) ),   C_b(T) = − log( P′_norm(i) × P″_norm(i) )   (22)
We determine the value of i at which the sum of these entropies is maximal:

L = argmax_i ( C_b(T) + C_f(T) )   (23)

We use this value as the brightness threshold and binarize the image:

a_xy^new = { 0, if B(x, y) ≤ L;  1, if B(x, y) > L }   (24)

3.14
Methods Based on a Clusterization Procedure
The essence of cluster analysis is the division of objects into groups with similar properties. In order to reduce the segmentation task to a clustering task, it is enough to define a mapping of image points into some feature space and to introduce a proximity measure in this feature space. As the features of a point in the image one can use the representation of its colour in some colour space; an example of a metric (proximity measure) is the Euclidean distance between vectors in the feature space. The result of clustering is then a quantization of the colour (or its saturation or brightness) of the image. Having specified the feature space, we can use any method of cluster analysis, such as k-means, the EM algorithm or others. The k-means clustering method is an iterative method used to divide an image into k clusters. The distance is taken as the sum of the squares or of the absolute values of the differences between a pixel and the centre of a cluster. The difference is usually based on the colour, brightness, texture and location of the pixel, or on a weighted sum of these factors [5,25].
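As an illustration of this clustering route (not code from the paper), per-pixel RGB features can be clustered with scikit-learn's k-means; the number of clusters k and the feature choice are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_segmentation(rgb: np.ndarray, k: int = 3) -> np.ndarray:
    """Segment an H x W x 3 image by clustering pixel colours into k groups."""
    h, w, _ = rgb.shape
    features = rgb.reshape(-1, 3).astype(np.float64)    # colour of each pixel as a feature vector
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(features)
    return labels.reshape(h, w)                          # per-pixel cluster index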
4
Experiment, Results and Discussions
Comparative results of the experiments are shown in the corresponding tables, together with the error bounds for each method. The results obtained in this work make it possible to choose methods for specific conditions more effectively. The authors used the MSE evaluation as in [12] (Tables 1 and 2). Table 1. Quality. 1-st experiment results Experiment number Binarization with lower threshold method Niblak method
1
2
9275
3
1032
4
2635
5 1916
6 1865
7 2128
8 1089
13581 16124 16388 16856 14841 11073 15014 3002
3406
9 2799
Error 2921 ±248
3724 11600 ±412 3714
3887 ±211
Bernsen
2494
2357
4869
2790
4891
Yen
7049
9618 10518
7915
9765 10141
7360 11302 10295 ±352
Minima
7946
4351
2562
2791
1944
2248
4382
7216
3835 ±339
Average
4261
5563
6783
4598
5999
6360
4272
7322
6450 ±216
Triangle
7005
8723
9541
7625
9726
9952
7281 10713
9461 ±285
Otsu
7670
984
2421
1871
1831
1952
2519
2675 ±231
2619
Errors on the same images, as obtained from the evaluation of the various methods, are shown in the summary experiment-comparison tables. Thus, the paper shows the starting point for an adaptive system of perception of the environment based on the idea of a categorical architecture. All experiments are performed in the visible range of the spectrum, but the idea can be extended in the same way to other ranges, for example to the sound range. The work shows the possibility of choosing the best approach depending on the environmental conditions; the selection criterion is the objective values obtained as a result of the experiments. An important advantage is the ability to avoid calculations that become ever more complicated.

Table 2. Quality. 2-nd experiment results

Experiment number                        | 1       | 2       | 3       | 4       | 5       | Error
Binarization with lower threshold method | 0.00021 | 0.00028 | 0.00010 | 0.00013 | 0.00013 | ±0.00005
Niblak method                            | 0.01016 | 0.02023 | 0.00565 | 0.00999 | 0.01040 | ±0.00005
Bernsen                                  | 0.02317 | 0.04354 | 0.00744 | 0.01406 | 0.01215 | ±0.00005
Yen                                      | 0.01490 | 0.02791 | 0.00907 | 0.01356 | 0.01215 | ±0.00005
Minima                                   | 0.04044 | 0.03942 | 0.01435 | 0.03222 | 0.03137 | ±0.00005
Average                                  | 0.00243 | 0.00545 | 0.00300 | 0.00403 | 0.00382 | ±0.00005
Triangle                                 | 0.01412 | 0.02350 | 0.00520 | 0.00906 | 0.00882 | ±0.00005
Otsu                                     | 0.01503 | 0.02484 | 0.00684 | 0.01283 | 0.01215 | ±0.00005

Therefore, we conclude that the histogram methods of binarization were the best in our study. However, for all methods the proximity of the input image to the reference plays a very important role, as does the presence of side "noise" in it. The first problem can be solved by a longer and multitasking system; the second is relatively easy to solve by applying additional filters to the image. In the future, the authors propose to extend the set of methods in order to increase the range of quality solutions. In addition, the authors consider exploring the application of generalization to the approaches.
5
Conclusions
Methods and algorithms applicable to computer vision systems for the analysis and recognition of scenes in the visual spectrum have been studied in this paper. An example of such systems is glasses for people with visual impairments: the camera mounted in the glasses receives and transmits environment data, and a contact plate with electrical leads transmits the data to the eye retina via electrical pulses. The authors analysed several pattern recognition methods that allow environmental data to be processed for the brain. This approach will give
visually impaired persons with sub-reality vision better orientation in the environment. The theoretical basis of the algorithms and their comparison for application are presented in the paper. The authors examined segmentation methods based on different approaches. The results of the study make it possible to adapt the system to external conditions automatically, without human intervention. The study increased the accuracy of segmentation under specific conditions by creating the possibility of automatic selection of the optimal segmentation method. Comparative results for use in certain conditions are reflected in the tables. Global threshold methods, local threshold methods, methods with several thresholds, methods based on shades of gray and histograms, methods based on entropy, methods based on clustering and methods based on pixel correlation are considered in the paper. Thus, the authors tried both to take into account different possibilities and to find the most effective approaches to adaptive segmentation. The experiments show that, despite the popularity and proliferation of artificial neural networks, the goal of segmentation can be achieved at much lower cost.
References 1. Abbas, M., El-Zoghabi, A., Shoukry, A.: Denmune: Density peak based clustering using mutual nearest neighbors. Pattern Recogn. 109, 11–15 (2021). Article number 107589. https://doi.org/10.1016/j.patcog.2020.107589 2. Bayro-Corrochano, E.: Geometric Algebra Applications Vol. I. Springer, Cham (2019). https://doi.org/10.1007/978-3-319-74830-6 3. Birda, T.: Otsu method, codding (2009). https://www.codeproject.com/Articles/ 38319/Famous-Otsu-Thresholding-in-C 4. Brilakis, I., Haas, C.: Infrastructure Computer Vision, p. 390. ButterworthHeinemann (2020) 5. Cordis, R.: Robots of tomorrow with intelligent visual capabilities. Research*eu Results Mag., no. 62, art. no. 38 (May 2017) 6. Dronyuk, I., Nazarkevych, M.: Development of printed packaging protection technology by means of back-ground nets. In: 2009 10th International Conference-The Experience of Designing and Application of CAD Systems in Microelectronics, vol. 26, pp. 401–403. IEEE (2009) 7. Duda, R., Hart, P., Stork, D.: Pattern Classification, 2nd edn., p. 738. Wiley (1999) 8. Gonzalez, R.: Digital Image Processing, p. 976. Pearson Hall (2008). http://sdeuoc. ac.in/sites/default/files/sde videos 9. Hrytsyk, V.: Future of artificial intelligence: treats and possibility. Inf. Theor. Appl. 24(1), 91–99 (2017). http://www.foibg.com/ijita/vol24/ijita24-01-p07.pdf 10. Hrytsyk, V.: Study methods of image segmentation for intelligent surveillance systems. In: Computational Linguistics and Intelligent Systems, vol. 2, pp. 171–176 (2018). http://ena.lp.edu.ua:8080/xmlui/handle/ntb/42565?show=full 11. Hrytsyk, V., Grondzal, A., Bilenkyj, A.: Augmented reality for people with disabilities, pp. 188–191 (2015). https://doi.org/10.1109/STC-CSIT.2015.7325462 12. Hrytsyk, V., Pelykh, N.: Classification problem of biological objects. Bull. Nat. Univ. “Lvivska Politechnika” Comput. Sci. Inf. Technol. 650, 100–103 (2009) 13. Kaku, M.: Hyperspace: A Scientific Odyssey Through Parallel Universes, Time Warps, and the Tenth Dimension, p. 384 (2016)
14. Korzynska, A., Roszkowiak, L., Lopez, C.e.a.: Validation of various adaptive threshold methods of segmentation applied to follicular lymphoma digital images stained with 3,3’ - Diaminobenzidine and Haematoxylin. Diagn. Pathol. 8(1), 1–21 (2013). https://doi.org/10.1186/1746-1596-8-48 15. Krak, I., Barmak, O., Manziuk, E.: Using visual analytics to develop human and machine-centric models: a review of approaches and proposed information technology. Comput. Intell., 1–26 (2020). https://doi.org/10.1111/coin.12289 16. Luque, A., Carrasco, A., Mart´ın, A., Heras, A.: The impact of class imbalance in classification performance metrics based on the binary confusion matrix. Pattern Recogn. 91, 216–231 (2019). https://doi.org/10.1016/j.patcog.2019.02.023 17. Madala, H., Ivakhnenko, A.: Clusterization and recognition, Chap. 5. In: Inductive Learning Algorithms for Complex Systems Modeling, p. 380. CRC Press (1994) 18. Nazarkevych, M., Logoyda, M., Troyan, O., Vozniy, Y., Shpak, Z.: The Ateb-Gabor filter for fingerprinting. In: Shakhovska, N., Medykovskyy, M.O. (eds.) CSIT 2019. AISC, vol. 1080, pp. 247–255. Springer, Cham (2020). https://doi.org/10.1007/ 978-3-030-33695-0 18 19. Nazarkevych, M., Lotoshynska, N., Klyujnyk, I., Voznyi, Y., Forostyna, S., Maslanych, I.: Complexity evaluation of the Ateb-Gabor filtration algorithm in biometric security systems. In: 2019 IEEE 2nd Ukraine Conference on Electrical and Computer Engineering (UKRCON), vol. 26, pp. 961–964 (2019). https://doi. org/10.1109/UKRCON.2019.8879945 20. Niblack, W.: An Introduction to Digital Image Processing, vol. 26, p. 215. Strandberg Publishing Company (1985) 21. Otsu, N.: A threshold selection method from gray-level histograms. IEEE Trans. Syst. Man Cybern. 9, 62–66 (1979). https://doi.org/10.1109/TSMC.1979.4310076 22. Pun, W., Linxui, X., Zilin, H.: Imputation method used in missing traffic. Artif. Intell. Algorithms Appl. 11, 662–675 (2019). https://doi.org/10.1007/978-981-155577-0 53 23. Russ, J.: The Image Processing Handbook, p. 832 (2006). https://doi.org/10.1201/ 9780203881095 24. Saha, J., Mukherjee, J.: CNAK: cluster number assisted k-means. Pattern Recogn. 110, 11–15 (2021). Article number 107625. https://doi.org/10.1016/j.patcog.2020. 107625 25. Sauvola, J., Pietikainen, M.: Adaptive document image binarization. Pattern Recogn. 33(2), 225–236 (2000). https://doi.org/10.1016/S0031-3203(99)00055-2 26. Sezgin, M., Sankur, B.: Survey over image thresholding techniques and quantitative performance evaluation. J. Electr. Imaging 13(1), 146–165 (2004). https://doi.org/ 10.1117/1.1631315 27. Trier, O.D., Jain, A.K.: Goal-directed evaluation of binarization methods. IEEE Trans. Pattern Anal. Mach. Intel. 26, 1191–1201 (1995). https://doi.org/10.1109/ 34.476511 28. Vala, H., Baxi, A.: A review on Otsu image segmentation algorithm. Int. J. Adv. Res. Comput. Eng. Technol. (IJARCET) 2(2), 387–389 (2013) 29. Zack, G., Rogers, W., Latt, S.: Automatic measurement of sister chromatid exchange frequency. J. Histochem. Cytochem. 25(7), 741–753 (1977). https://doi. org/10.1177/25.7.70454 30. Zhang, Y., He, Z.: Agnostic attribute segmentation of dynamic scenes with limited spatio-temporal resolution. Pattern Recogn. 91(1), 261–271 (2019). https://doi. org/10.1016/j.patcog.2019.02.026
Computational Intelligence and Inductive Modeling
Spectrophotometric Method for Coagulant Determining in a Stream Based on an Artificial Neural Network
Andrii Safonyk(B), Maksym Mishchanchuk, and Ivanna Hrytsiuk
National University of Water and Environmental Engineering, Rivne, Ukraine {a.p.safonyk,mishchanchuk ak17,i.m.hrytsiuk}@nuwm.edu.ua
Abstract. Spectrophotometric analysis based on artificial neural network (ANN), partial least squares regression (PLS) and principal component regression (PCR) models is proposed to determine the coagulant (Fe) concentration during the electrocoagulation process. An experimental laboratory installation was created to study photometric measurement processes with a device that analyzes the color and intensity of light in real time. The color sensor determines the RGB color parameters, which are translated into the HSL color space by an ANN. Software for determining the concentration of iron in the coagulant using artificial intelligence was developed; it is a web application that displays the color parameters of the coagulant and the determined iron concentration and saves the history of all measurements in a database. The ANN was trained using different teaching methods and an optimizer was selected for the process; the resulting root mean square error (RMSE) is 6.91%.
Keywords: Artificial neural network · RGB · HSL · TensorFlow · Coagulant · Intelligent information system · Photocolorimeter method
1
Introduction
Clean water is important in the lives of people, animals, plants and other living organisms. Water is the most important component of all living things, and its pollution is one of the global environmental problems of humanity. Water potential is the basis of social, environmental and economic development. The development of cities brings new problems: with the development of industry around the world, rivers, lakes and other water bodies are being polluted, and many water bodies have in general been transformed into canals. Increasing the degree of purification of polluted water and improving treatment facilities is an important measure to prevent pollution of our ecosystem. One of the methods of wastewater treatment is the electrocoagulation method. Electrocoagulation is an electrochemical method of wastewater treatment; the essence of the electrocoagulation process for water purification and treatment is the sedimentation of colloidal systems under the constant influence of an electric current. During the
procedure, the technical salts present in the purified liquid are subjected to electrolytic dissociation, and the ions selectively interact with harmful impurities and precipitate in the form of gels. Electrocoagulation methods have several advantages, namely simple equipment, easy operation and a short reaction time. One of the main parameters of this process is the concentration of iron in the coagulant that will be used for wastewater treatment. Determining the coagulant content is a complex, time-consuming and expensive process, and such studies usually require laboratory equipment. One of the methods for determining the concentration of the useful substance (coagulant) is the photocolorimetric method, which is indeed used in laboratory studies. This method determines the intensity of the color of the test substance through the degree of absorption of polychromatic light in the visible region of the spectrum. However, the process itself is quite cumbersome and takes place in "manual" mode, which limits its use. As such, this method is not applicable to the determination of coagulant in real time, and therefore it is virtually impossible to create automated control systems for the treatment of process water by the electrocoagulation method on its basis. Recently, a new technique called digital colorimetry, based on images of measured red, green and blue (RGB) values, has been using color intensities instead of spectrophotometry. Each color channel takes a value from 0 to 255, which allows more than 16 million different colors to be represented. This technique can be interpreted as colorimetry by reflected light: the light that reaches each pixel of the image sensor is light reflected from objects that passes through three different filters (RGB filters) and is then read by color analysis software. Thus, this technique is very suitable for colorimetric reactions. In addition, artificial neural networks are a new technique that can improve efficiency. An ANN is a computerized analogue of the biological nervous system, and ANNs are an important class of pattern recognizers useful for a wide variety of applications. The architecture of the ANN model must evolve and relate input to output data through learning on data sets. At present, there are virtually no devices for determining the concentration of iron in the coagulant in real time, so the actual task is to develop an experimental automated system for the electrochemical production of coagulant based on photocolorimetric analysis, taking into account various perturbations.
2
Problem Statement
Given that the change in iron concentration in water during the electrocoagulation process changes the color and color intensity of the solution, the aim of this research is to develop an automated information system for real-time color and light intensity analysis taking into account perturbations such as conductivity, pH and ambient temperature.
3
Literature Review
Studying the works of authors working on related topics, a number of issues that have already been researched were found. Thus, in [14] the author analyzes water samples by the colorimetric method using a mobile phone: the screen of the phone serves as a light source and the front camera serves as a detector. The reflected saturations of white, red, green and blue were used for principal component analysis to classify several compounds and their concentrations in water. Article [3] describes the development of a portable, cost-effective device for determining concentration, controlled by an information system based on the iPhone. The research in [7] presents the design of a lab-chip for the colorimetric detection of impurities in water samples; the pressure, flow and concentration distribution in the microchannel system are modeled in COMSOL. In [10] a simple, sensitive, specific and tested colorimetric method for the quantitative evaluation of a component in extracted water was developed. Article [9] covers the construction of a simple, inexpensive single-wavelength spectrophotometer using LEDs as a light source and a detector, which detects and measures the content of heavy metals in water. In [12] a digital colorimeter was built that determines the iron content, with a software application that registers the components of the color scheme, storing the values of red, green and blue and calculating the values of hue, brightness and transparency using standard color theory, and in [6] a budget portable device for colorimetric analysis on site was developed. The authors of [4] conduct their research using a webcam that determines the iron content in water by digital image colorimetry; the main method of this study is least squares regression, used to obtain a quadratic equation based on the red, green and blue hues, color saturation and brightness, which allowed the most accurate analysis to be selected. Various methods and approaches for the spectrophotometric determination of iron in substances and components have been developed in [2,5,8,11,13]. Given the above and the fact that a change in the iron concentration in water during the electrocoagulation process changes the color and color intensity of the solution, the urgent task is to supplement the experimental laboratory installation for photometric research with devices that analyze the color and light intensity in real time. Thus, the aim of the work is to develop a hardware-software complex for determining the concentration of iron in a liquid by its chromaticity.
4
Materials and Methods
To develop an automated information technology for the analysis of color and light intensity, a 3D model and a block diagram of a laboratory installation for the study of electrocoagulation processes with an additional unit for photocolorimetric analysis were developed; they are shown in Fig. 1 and Fig. 2. Consider the developed laboratory installation (Fig. 1), consisting of a laboratory power supply (1), an electrocoagulator (2), pumps (3) and (4), a measuring cell (5) and a control and registration unit (6).
Fig. 1. Illustration of laboratory installation of automated system of electrochemical production of coagulant based on photocolorimetric analysis
Fig. 2. Structure of laboratory installation of automated system of electrochemical production of coagulant on the basis of photocolorimetric analysis
According to the developed structure (Fig. 2) of the laboratory installation, the electrocoagulator (3) is of flow type and is a container made of nonconductive material with metal plates inside, alternately of different polarity, forming cathodes and anodes. The electrocoagulator is powered by a laboratory power supply (1). Water is supplied by the pump (2), evenly distributed between the coagulator chambers and is discharged by the pump (4) to the measuring cell (7). The temperature of the test medium is monitored using an electronic thermometer (5), which is duplicated with laboratory mercury. The measuring cell consists of a flowing transparent measuring container (8), a light source (6) and a sensing element (9). The measuring information as well as the control signals are switched in the control and registration unit (10).
The device works as follows. Iron-containing coagulant is pumped through the cell (1) at a constant flow, ensuring the continuity of the process of obtaining a sample for research. Light from a white light source (2) passes through the substance in the cell and enters a light-sensitive sensor (3), which decomposes this light into the spectra of three colors (red, green, blue) and forms a digital signal that characterizes the color of the liquid in the cell. This digital signal is fed to the microprocessor data processing unit (4), where it is processed and analyzed. To determine the color of the substance, a color sensor based on the TCS230 chip was added to the system (Fig. 3). This sensor has three spectral filters for determining the color of the product: red, blue and green. Color determination is performed by switching the signal level from high to low and vice versa; the switching is controlled by the GPIO pins of the Raspberry Pi microcomputer. After switching the filter, the PWM signal of the sensor, in which the output light frequency is encoded, is read from the pin. The frequency readings from the color sensor are converted into the values of the red, green and blue RGB color space. Then these values are converted into the HSL color space and the concentration of iron in the coagulant is determined using a neural network. Figure 3 shows an extended measuring system for a more accurate determination of the coagulant. In addition to the color sensor, three analog sensors were connected to the Raspberry Pi: a temperature, a pH and a conductivity sensor. The pH value is determined by pH = aV + b, where pH is the desired pH value, V is the voltage obtained from the sensor, and a and b are values obtained during the calibration of the pH sensor. Conductivity is determined by the following algorithm: first the voltage value is read from the sensor, then the "raw" conductivity value is obtained according to the dependence

rawEc = (1000 · V / 820.0) / 200.0

The value obtained is multiplied by kValue, which is initially equal to 1 and changes with each subsequent measurement depending on the condition rawEc > 2: if the condition is true, kValueHigh is chosen, if false, kValueLow; kValueHigh and kValueLow are determined during calibration.
Fig. 3. Breadboard of the extended system for coagulant determination
After selecting the value of kValue and taking into account the ambient temperature, the conductivity value is determined by the following dependency:

ec = rawEc · kValue / (1 + 0.0185 (t − 25))

where ec is the desired conductivity value, rawEc and kValue are as described above, and t is the temperature obtained from the temperature sensor. To find the two values of kValue, a calibration similar to the calibration of the pH sensor was performed: the sensor was immersed in solutions with different conductivities and kValue was computed from

kValue = 820.0 · 200.0 · ec · (1.0 + 0.0185 · (t − 25)) / 1000 / V

where kValue is the desired value, ec is the conductivity of the reference solution, t is the temperature and V is the voltage value. Since the Raspberry Pi has no analog ports, an ADC (ADS1115) was used to connect the analog sensors. The ADC communicates with the Raspberry Pi over the I2C interface and has four channels for connecting analog sensors; accordingly, the I2C interface provides the voltage values at the analog ADC inputs. The voltage values are converted into the measured quantities using the dependencies above, whose parameters are the voltage value and constants characteristic of each sensor obtained during the calibration process.
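Putting the quoted dependencies together, the conductivity reading could be computed as in the following sketch; the function signature and the idea that kValueLow/kValueHigh arrive as calibration constants are assumptions consistent with the text.

```python
def conductivity(voltage: float, temp_c: float,
                 k_value_low: float, k_value_high: float) -> float:
    """EC from the ADC voltage with temperature compensation to 25 degrees C."""
    raw_ec = (1000.0 * voltage / 820.0) / 200.0              # "raw" conductivity
    k_value = k_value_high if raw_ec > 2 else k_value_low    # calibration branch
    return raw_ec * k_value / (1.0 + 0.0185 * (temp_c - 25.0))
```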
The developed laboratory installation made it possible to carry out experimental research and also to obtain additional data on the change of the water color with the iron concentration.
5
Experiment, Results and Discussion
According to the experimental plan, the study lasted 60 min, and every 3 min the power supply current of the coagulator, the temperature and the intensities of the red, green and blue color components were recorded. Thus, 20 samples of the iron solution in water were taken for the laboratory determination of the concentration by standardized methods. Table 1 shows the results of the experimental study of the process, namely the determined concentrations of total and trivalent iron, the current and the color of the substance at different times and with changing voltage.

Table 1. The results of the experimental research

Sample # | Time, min | Voltage, V | Current strength, A | R   | G   | B   | Temperature, °C | Total iron, mg/dm3 | Fe3+, mg/dm3
1        | 3         | 5          | 0.8                 | 204 | 207 | 200 | 20              | 0.8                | 0.33
2        | 6         | 5          | 0.8                 | 209 | 211 | 196 | 20              | 1                  | 0.38
3        | 9         | 5          | 0.85                | 213 | 215 | 193 | 20              | 1.1                | 0.4
4        | 12        | 5          | 0.85                | 212 | 205 | 137 | 21              | 2.6                | 1.6
5        | 15        | 6          | 1.3                 | 211 | 194 | 76  | 21              | 3.7                | 2.9
6        | 18        | 6          | 1.3                 | 215 | 183 | 39  | 22              | 4.1                | 3
7        | 21        | 6          | 1.3                 | 220 | 172 | 0   | 22              | 4.5                | 3.1
8        | 24        | 6          | 1.3                 | 210 | 161 | 0   | 22              | 5.4                | 4.3
9        | 27        | 7          | 1.6                 | 210 | 155 | 0   | 22              | 6.3                | 5.4
10       | 30        | 7          | 1.65                | 213 | 151 | 0   | 22              | 6.7                | 5.5
11       | 33        | 7          | 1.65                | 215 | 147 | 0   | 22              | 6.9                | 5.8
12       | 36        | 7          | 1.65                | 204 | 132 | 0   | 23              | 8.2                | 6.4
13       | 39        | 8          | 1.85                | 193 | 116 | 0   | 23              | 9.1                | 7.1
14       | 42        | 8          | 1.9                 | 191 | 113 | 0   | 23              | 9.4                | 7.4
15       | 45        | 8          | 1.9                 | 189 | 111 | 0   | 23              | 9.6                | 7.6
16       | 48        | 8          | 1.95                | 189 | 105 | 0   | 23              | 10.7               | 8.1
17       | 51        | 9          | 2.05                | 188 | 100 | 0   | 23              | 11.8               | 8.9
18       | 54        | 9          | 2.05                | 183 | 87  | 0   | 23              | 12.2               | 9.2
19       | 57        | 9          | 2.1                 | 178 | 74  | 1   | 23              | 12.6               | 9.6
20       | 60        | 9          | 2.1                 | 168 | 81  | 3   | 23              | 13.4               | 10.3
Fig. 4. Substance color change at change of voltage
The experimental research was conducted for 60 min. From the results of the experiment we see that, as the voltage increases from 5 V to 9 V, the current increases from 0.8 A to 2.1 A according to Table 1, and the color of the output substance changes according to Fig. 4: at the lowest voltage we observe the lightest, almost transparent water, and at the highest voltage turbid, red water. During the research, 20 samples were taken, which are shown in Fig. 4 and on the basis of which the total iron and trivalent iron Fe3+ were determined. The results of the experiment made it possible to visualize the change of the RGB spectrum in each of the 20 experiments. The obtained dependencies between color and concentration were translated into the HSL spectrum, on the basis of which a neural network was developed to determine the concentration of the useful substance in water. The neural network developed to determine the concentration makes it possible to combine the three color parameters into one dependence, as well as to take other parameters into account when determining the concentration. RGB values were used as input for network training (Fig. 5). This network has three input nodes responsible for red, green and blue, two hidden layers of 9 nodes each, and one output node – the hue value of the HSL color space. The number of nodes in the hidden layers, 9, is due to the quality of the output value: it was experimentally determined that this number of nodes yields an output close to the reference. After the learning process, the network is intended to predict unknown data. Using the hue value produced by the previous neural network, a second neural network was developed to determine the concentration of iron. As the input
Fig. 5. General structure of ANN with RGB inputs, HSL output and 2 hidden layers
parameter for which was the hue value, and the result of the neural network is the concentration of iron in the coagulant. The developed neural network has a multilayer perceptron architecture with 4 layers (Fig. 6). The first layer is the input layer, to which the hue value is applied; the next two are hidden layers that perform the calculations, which are passed to the fourth, output layer. Using the experimental data, the neural network was trained with several optimizers [1]. The errors of the neural networks trained with the different optimizers can be seen in Table 2.

Table 2. Research of optimizers

Optimizer   | SGD  | RMSprop | Adam | Adadelta | Adamax | Ftrl
Mistake (%) | 6.91 | 8.28    | 9.33 | 373.22   | 9.05   | 370.43

As can be seen from Table 2, the best results were shown by the neural network trained with the SGD optimizer. The obtained neural network showed a standard deviation of 6.91% on the test dataset, which was based on data obtained from the experiment. The learning process of the network can be seen in Fig. 7. Since this network showed the best results, it was chosen for the development of the application which determines the substance concentration. The whole process of obtaining the iron concentration by color is as follows (see Fig. 8): first the color sensor is polled, from which the RGB color indicators
Fig. 6. Scheme of the developed neural network
Fig. 7. The mistake value during training iterations
Fig. 8. Scheme of process obtaining the coagulant concentration by RGB value
are obtained. The next step is to convert RGB into the hue value of the HSL color scheme using a neural network. Then the neural network for determining the iron concentration from the hue parameter performs the calculation of the useful element concentration. The last step is to save the received and calculated data. To provide easy access to the measurement results of the color sensor and the iron concentration determined by the developed neural network, a web application was developed; a Raspberry Pi 4 microcomputer is used as its server. By connecting this microcomputer to a network, one can use the device address as a control panel for the device. The web application can enable and disable the process of measuring the coagulant values. After starting the system, it is possible to monitor the color and the concentration of iron in the coagulant. The panel displays the last measured color values and the determined iron concentration; the last ten measured parameters are also displayed on graphs (Fig. 9). The values on the panel are displayed in real time (Fig. 10).
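A hedged Keras sketch of the two-stage pipeline described above (RGB to hue, then hue to iron concentration): the layer widths of the first network and the SGD optimizer follow the text, while the hidden sizes of the second network, the activations and all training details are assumptions.

```python
import numpy as np
import tensorflow as tf

def build_rgb_to_hue() -> tf.keras.Model:
    """Three inputs (R, G, B), two hidden layers of 9 nodes, one hue output."""
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(3,)),
        tf.keras.layers.Dense(9, activation="relu"),
        tf.keras.layers.Dense(9, activation="relu"),
        tf.keras.layers.Dense(1),
    ])

def build_hue_to_concentration() -> tf.keras.Model:
    """Four-layer perceptron: hue input, two hidden layers, concentration output."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(1,)),
        tf.keras.layers.Dense(8, activation="relu"),   # hidden widths assumed
        tf.keras.layers.Dense(8, activation="relu"),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01), loss="mse")
    return model

rgb_to_hue = build_rgb_to_hue()
hue_to_fe = build_hue_to_concentration()

def predict_concentration(rgb_reading: np.ndarray) -> float:
    """Full chain for one RGB reading taken from the colour sensor."""
    hue = rgb_to_hue.predict(rgb_reading.reshape(1, 3), verbose=0)
    return float(hue_to_fe.predict(hue, verbose=0)[0, 0])
```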
Fig. 9. Interface of program
Fig. 10. History of measurements
6
Conclusions
The paper develops an automated information system for the analysis of color and light intensity in real time, taking into account perturbations such as conductivity, pH and ambient temperature. An experimental laboratory installation was created to study photometric measurement processes with a device that analyzes the color and intensity of light in real time. The color sensor determines the color parameters R, G, B, which are translated into the HSL color space by an artificial neural network. Software for determining the concentration of iron in the coagulant using artificial intelligence was developed; it is a web application for displaying the color parameters of the coagulant and the determined iron concentration, as well as for saving the history of all measurements in a database. To solve this problem, the software was built on a Raspberry Pi 4 microcomputer with a connected TCS230 color sensor. The developed automated information system for determining the concentration of iron in an iron-containing coagulant on the basis of the photocolorimetric method allows human participation in the measurement process to be reduced to a minimum and ensures the continuity of the measurement of the coagulant concentration.
References 1. Safonyk, A., Mishchanchuk, M., Lytvynenko, V.: Intelligent information system for the determination of iron in coagulants based on a neural network. Intell. Inf. Technol. Syst. Inf. Secur. 2021 2853, 142–150 (2021) 2. Alberti, G., Emma, G., Colleoni, R., Nurchi, V.M., Pesavento, M., Biesuz, R.: Simple solid-phase spectrophotometric method for free iron (iii) determination. Arab. J. Chem. 12(4), 573–579 (2019)
3. Heidari-Bafroui, H., Ribeiro, B., Charbaji, A., Anagnostopoulos, C., Faghri, M.: Portable infrared lightbox for improving the detection limits of paper-based phosphate devices. Measurement 173 (2021). https://doi.org/10.1016/j.measurement. 2020.108607 4. Barros, J.A., Oliveira, F.M.D., Santos, G.D.O., Wisniewski, C., Luccas, P.O.: Digital image analysis for the colorimetric determination of aluminum, total iron, nitrite and soluble phosphorus in waters. Anal. Lett. 50(2), 414–430 (2016) 5. Zarei, K., Atabati, M., Malekshabani, Z.: Simultaneous spectrophotometric determination of iron, nickel and cobalt in micellar media by using direct orthogonal signal correction-partial least squares method. Analytica Chimica Acta 556(1), 247–254 (2006) 6. Firdaus, M.L., Alwi, W., Trinoveldi, F., Rahayu, I., Rahmidar, L., Warsito, K.: Determination of chromium and iron using digital image-based colorimetry. Procedia Environ. Sci. 20, 298–304 (2014) 7. Suliman, M.S., Yasin, S., Ali, M.S.: Development of colorimetric analysis for determination the concentration of oil in produce water. Int. J. Eng. Inf. Syst. 1(5), 9–13 (2017) 8. Ni, Y., Huang, C., Kokot, S.: Simultaneous determination of iron and aluminium by differential kinetic spectrophotometric method and chemometrics. Analytica Chimica Acta 599(2), 209–218 (2007) 9. Masawat, P., Harfield, A., Srihirun, N., Namwong, A.: Green determination of total iron in water by digital image colorimetry. Anal. Lett. 50(1), 173–185 (2016) 10. Place, B.: Activity analysis of iron in water using a simple led spectrophotometer. J. Chem. Educ. 29(6), 677–680 (2013) 11. e Silva, A.F.D.O., de Castro, W.V., de Andrade, F.P.: Development of spectrophotometric method for iron determination in fortified wheat and maize flours. Food Chem. 242, 205–210 (2018) 12. Sreenivasareddy, A.: Determination of iron content in water. Governors State University OPUS Open Portal to University Scholarship (2017) 13. Ribas, T.C., Mesquita, R.B., Moniz, T., Rangel, M., Rangel, A.O.: Greener and wide applicability range flow-based spectrophotometric method for iron determination in fresh and marine water. Talanta 216 (2020) 14. Iqbal, Z., Bjorklund, R.B.: Colorimetric analysis of water and sand samples performed on a mobile phone. Talanta 84(4), 24–39 (2011). https://doi.org/10.1016/ j.talanta.2011.03.016
Comparative Analysis of Normalizing Techniques Based on the Use of Classification Quality Criteria Oleksandr Mishkov1(B) , Kostiantyn Zorin1 , Denys Kovtoniuk1 , Vladyslav Dereko1 , and Igor Morgun2 1
Military Diplomatic Academy named after Yevheniy Bereznyak, Kyiv, Ukraine {alex 1369,k.zorin,anna kovtonyuk,vladyslavdereko}@ukr.net 2 National Academy of Security Service of Ukraine, Kyiv, Ukraine [email protected]
Abstract. The paper presents a comparative analysis of various types of normalization techniques. The accuracy of data classification which was carried out after data normalizing was used as the main criterion for evaluating the quality of the appropriate normalizing method. Four various types of datasets downloaded from the UCI Machine Learning Repository were used as the experimental data during the simulation process. Various normalization techniques available from package clusterSim of R software were applied to the experimental data. The quality of the data normalizing procedure was evaluated based on the use of data classification by the calculation of the accuracy of the objects distribution into classes. The neural network multilayer perceptron was used as the classifier at this step. The simulation results have shown that the data normalizing stage significantly influences the classification accuracy and selection of the normalization method depends on the type of data and, consequently, the selection of the normalizing technique should be carried out in each of the cases separately. Keywords: Data mining · Normalizing techniques · Classification quality criteria · Data pre-processing · Multilayer perseptron · Classification accuracy
1
Introduction
The current state of data science is characterized, on the one hand, by a sharp increase in the amount of data that must be processed in order to extract useful information and, on the other hand, by the creation of effective systems for forecasting, classification, clustering, decision making, etc. In most cases, the experimental data are "raw" and incomplete and have widely different ranges of attribute values; for this reason, they need to be preprocessed at the first step of solving the corresponding problem. In most cases, the preprocessing procedure
assumes missing-value handling and data normalization based on previously performed statistical (exploratory) analysis. Various techniques exist nowadays for implementing these steps. Thus, missing-value handling can be performed using various types of regression models, and the accuracy and effectiveness of this procedure depend on the type of the regression dependence used. In [11,14,15,19], the authors present the results of research concerning the solution of this problem. The main difficulty in these cases consists in selecting a suitable regression model for the investigated dataset. The next very important step of data preprocessing is data normalization, i.e., transforming the values of the attributes into the same range with the same norm (ideally a unit one). Successfully performing this step allows comparing the investigated objects over the whole set of attributes, with each attribute having the same weight. Many normalizing techniques exist nowadays; however, there is no universal best normalizing technique suitable for all types of data. Moreover, the selection of the normalizing method with respect to the type of the investigated data significantly influences the effectiveness of the subsequent data processing steps. Of course, an objective selection of the appropriate normalizing method can be done only on the basis of a quantitative quality criterion that takes into account the goal of the data mining or machine learning procedure. The analysis presented hereinbefore indicates the importance and topicality of the data normalizing step with respect to the type of the used dataset. In this manuscript, we present the results of research on the objective selection of the normalizing method taking into account the type of the studied data, where the data classification quality criterion (accuracy) is used as the main criterion to evaluate the normalizing quality.
2
Problem Statement
Figure 1 shows the step-wise procedure for choosing the optimal normalizing method taking into account the type of the investigated data. As can be seen, the implementation of this procedure assumes the following stages:

– formation of a set of datasets which should be normalized during the simulation;
– formation of a set of available normalizing methods;
– choice and setup of the data classification technique;
– formation of the classification quality criteria;
– normalization of the data and classification of the normalized data;
– calculation of the classification quality criteria;
– analysis of the results and choice of the appropriate normalizing method which corresponds to the maximum value of the data classification accuracy.

These tasks are solved within the framework of our research.
Fig. 1. Step-wise procedure of the appropriate normalizing technique selection
3
Literature Review
A lot of works are devoted to solving the problem of preprocessing data of various nature. Thus, in [22] the authors considered the problem of data exploration and feature selection for increasing the accuracy of machine learning techniques. The data preprocessing step, in this case, included noise removal and data normalization. Then, three feature selection techniques were applied to the normalized data in order to form the optimal dataset: Correlation Matrix with Heatmap, Feature Importance, and Recursive Feature Elimination with Cross-Validation. At the next step, the classification procedure was carried out using K-Nearest Neighbor (KNN), Random Forest (RF) and Support Vector Machine (SVM) classifiers. In [21] the authors introduced the Ordered Quantile normalization technique, a one-to-one transformation designed to consistently and effectively transform a vector of arbitrary distribution into a vector with a normal distribution. The authors compared the proposed technique with other well-known normalization methods and showed that it operates consistently and effectively regardless of the underlying distribution. The authors also explored the use of repeated cross-validation to identify the best normalizing transformation when the true underlying distribution is unknown. The results of research comparing various techniques for normalizing RNA sequencing data are considered in [9]. The authors simulated count data in two groups (treated vs. untreated) at seven fold-change levels using control samples from human HepaRG cells and normalized the data using seven normalization techniques. The Upper Quartile method performed best with regard to maintaining fold-change levels as detected by a limma contrast between the treated and untreated groups. A technique for gene expression profile extraction based on the complex use of clustering and classification techniques is presented in [8]. The paper [7] presents the results of research on data processing based on the use of various types of density-based
clustering algorithms. However, these works did not consider various normalization methods or compare their advantages and shortcomings on different types of data. In [16–18], the authors considered techniques for processing acoustic emission signals; however, their research was focused mainly on signal filtering and the subsequent extraction of informative attributes by appropriate data transformation methods. The results of research concerning the application of normalizing techniques to medical datasets are considered in [10,20,23]. The particularities of this type of data are a high percentage of uncertainties and significantly different ranges of attribute variation. However, in these papers the authors applied particular normalization methods without comparing them for an objective choice of the optimal technique with respect to the type of the used data. Thus, taking into account the aforesaid, we conclude that the problem of the objective choice of the optimal normalizing technique considering the type of the investigated data has no unequivocal solution nowadays. In this manuscript, we present our version of a solution to this problem based on the sequential enumeration of the available normalizing methods and the subsequent selection of the optimal one in terms of the maximum value of the data classification accuracy. The goal of the research is a comparative analysis of various data normalizing techniques using different types of datasets in order to choose the optimal method in terms of a quantitative quality criterion of the investigated data classification.
4
Materials and Methods
The normalizing methods that are available in the clusterSim package [2] of the R software [13] were used within the framework of our research. Table 1 presents the list of the used methods. The neural network multilayer perceptron was used as the classifier during the simulation process. The optimal network parameters (structure and weight values) in terms of the global minimum of the error surface were evaluated by cross-validation using the function train() of the caret package (R software) [1]. The optimal network structure and the optimal synaptic weights corresponded to the maximum value of the classification accuracy

Acc = \frac{TP + TN}{TP + TN + FP + FN},   (1)
where TP, TN, FP and FN are the true positive, true negative, false positive and false negative values predicted by the classifier in comparison with the corresponding cases in the test dataset. In this case, 70% and 30% of the objects were used as train and test data respectively. Algorithm 1 presents the step-wise procedure which was used within the framework of our research.
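As a small illustration of criterion (1), the accuracy can be computed directly from the four confusion-matrix counts. The sketch below is a generic Python helper with made-up example counts, not code from the paper:

def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Classification accuracy (1): share of correctly classified objects."""
    return (tp + tn) / (tp + tn + fp + fn)

# hypothetical counts used only to show the call
print(accuracy(42, 40, 5, 3))  # 0.9111...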
Table 1. The list of the used normalizing methods
Index  Type of normalization
n1     Standardization: (x − mean)/sd
n2     Positional standardization: (x − median)/mad
n3     Unitization: (x − mean)/range
n4     Unitization with zero minimum: (x − min)/range
n5     Normalization in range <−1, 1>: (x − mean)/max(abs(x − mean))
n5a    Positional normalization in range <−1, 1>: (x − median)/max(abs(x − median))
n6     Quotient transformation: x/sd
n6a    Positional quotient transformation: x/mad
n7     Quotient transformation: x/range
n8     Quotient transformation: x/max
n9     Quotient transformation: x/mean
n10    Quotient transformation: x/sum
n11    Quotient transformation: x/sqrt(SSQ)
n12    Normalization: (x − mean)/sqrt(sum((x − mean)^2))
n12a   Positional normalization: (x − median)/sqrt(sum((x − median)^2))
n13    Normalization with zero being the central point: (x − midrange)/(range/2)
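For illustration, a few of the transformations listed in Table 1 can be written directly as vectorized functions. The sketch below is a Python/NumPy analogue of the clusterSim formulas (the paper itself uses the R package), and the function names n1, n4, n11 and n12 simply mirror the indices in Table 1:

import numpy as np

def n1(x):   # standardization: (x - mean) / sd
    return (x - x.mean()) / x.std()

def n4(x):   # unitization with zero minimum: (x - min) / range
    return (x - x.min()) / (x.max() - x.min())

def n11(x):  # quotient transformation: x / sqrt(SSQ)
    return x / np.sqrt(np.sum(x ** 2))

def n12(x):  # normalization: (x - mean) / sqrt(sum((x - mean)^2))
    return (x - x.mean()) / np.sqrt(np.sum((x - x.mean()) ** 2))

x = np.array([2.0, 4.0, 6.0, 8.0])
print(n1(x), n4(x), n11(x), n12(x), sep="\n")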
Algorithm 1: Step-wise procedure of the optimal normalizing method selection based on the use of a data classification technique

Initialization:
set: the list of the available normalizing methods; the ranges of the neural network parameters variation (decay, the weight-decay parameter, and size, the number of neurons in the hidden layer); the iteration counter c = 1 and cmax = the number of normalizing methods;
create the empty dataframe res;
while c ≤ cmax do
  choose the normalizing method;
  normalize the data;
  divide the dataset randomly into train (70%) and test (30%) subsets;
  determine the optimal network structure and parameters using the train dataset;
  apply the learned network to the test data and calculate the accuracy value;
  c = c + 1;
end
Return the dataframe res of accuracy values for the corresponding normalizing techniques.
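A compact Python analogue of Algorithm 1 is sketched below with scikit-learn; the original study used the clusterSim and caret packages of R, so the MLP settings, the restricted set of normalizations and the use of the Iris data here are illustrative assumptions only:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

# a few normalizing methods applied column-wise (indices follow Table 1)
METHODS = {
    "n0": lambda x: x,                                       # no normalization
    "n1": lambda x: (x - x.mean(0)) / x.std(0),              # standardization
    "n4": lambda x: (x - x.min(0)) / (x.max(0) - x.min(0)),  # unitization, zero min
    "n11": lambda x: x / np.sqrt((x ** 2).sum(0)),           # x / sqrt(SSQ)
}

X, y = load_iris(return_X_y=True)
results = {}
for name, normalize in METHODS.items():
    Xn = normalize(X.astype(float))
    X_tr, X_te, y_tr, y_te = train_test_split(
        Xn, y, test_size=0.3, random_state=1, stratify=y)
    clf = MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000,
                        random_state=1).fit(X_tr, y_tr)
    results[name] = accuracy_score(y_te, clf.predict(X_te))

best = max(results, key=results.get)
print(results, "best:", best)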
5
Experiment, Results and Discussion
5.1
Experimental Datasets
Four different types of datasets from [5] were used as the experimental data during the simulation procedure:

– Iris Plants Database [12]. Contains 150 objects (50 in each of three classes: Setosa, Versicolour, Virginica) and four numeric attributes (sepal length in cm, sepal width in cm, petal length in cm and petal width in cm).
– Seeds dataset [4]. The examined group comprised kernels belonging to three different varieties of wheat, Kama, Rosa and Canadian, 70 elements each, randomly selected for the experiment. Each of the investigated objects is described by seven attributes: area, perimeter, compactness, length of kernel, width of kernel, asymmetry coefficient, length of kernel groove.
– Wine recognition data [6]. Contains three kinds of wine (three classes). Each of the investigated objects is characterized by 13 attributes: alcohol, malic acid, ash, alkalinity of ash, magnesium, total phenols, flavonoids, nonflavonoid phenols, proanthocyanins, color intensity, hue, OD280/OD315 of diluted wines, proline.
– Glass Identification Database [3]. Contains seven types of glass (7 classes) and seven attributes: refractive index, sodium, magnesium, aluminum, silicon, potassium, calcium.

Figure 2 shows the character of the objects distribution in the classes for the Iris and Seeds datasets. We used two principal components in this case. As can be seen, the objects can be divided into clusters adequately.
Fig. 2. The character of the objects distribution in the classes for Iris and Seeds datasets
Figure 3 shows the same results for Wine and Glass datasets using parallel coordinate plots (in these cases two principal components contained low values of explained variance). As it can be seen, the Wine and Glass datasets are more complex and the division of the objects into classes in these cases is a more difficult task.
Fig. 3. The character of the objects distribution in the classes for Wine and Glass datasets
5.2
Simulation and Discussion
Figure 4 shows the character of the investigated data distribution using boxplot diagrams. As it can be seen from Fig. 4, the ranges of appropriate attribute variation for all datasets are very different. This fact complicates the qualitative data classification or other machine learning procedure. Figure 5 represents the results of Algorithm 1 step-by-step procedure implementation. In this case, n0 method corresponds to the non-normalized data. The analysis of the obtained results allows concluding that normalization methods that correspond to maximum value of the classification accuracy are
Fig. 4. Boxplots of non-normalized datasets
Fig. 5. Results of the simulation concerning selection of the optimal normalizing method based on the classification accuracy criterion
different for various datasets. Thus, the normalization methods n1, n6a, n8, n9 and n11 are the optimal ones for the Iris dataset; in this case, we obtained 100% classification accuracy on the test dataset. The normalizing technique n11 is optimal for the Seeds data, where the highest (almost maximal) classification accuracy was obtained. The methods n3a and n5a are the optimal ones for the complex Wine data; normalization using these methods corresponds to the maximum value of the test data classification accuracy. In the case of the Glass dataset, the n1 and n5 normalization methods are the optimal ones; however, the classification accuracy values are lower in this case than for the other datasets, and it is necessary to examine the attributes using appropriate statistical analysis techniques. Figure 6 shows the box plots of the data normalized using the optimal normalization techniques. An analysis of the obtained box plots confirms the
Fig. 6. Boxplots of normalized datasets
hereinbefore stated conclusions about the high quality of the data processing (normalizing) for the Iris, Seeds and Wine datasets, since the values of all attributes are adequately distributed within the same range. In the case of the Glass dataset, the data contain many outliers, and this fact can influence the classification accuracy. However, to our mind, the proposed technique allows selecting an optimal normalization method taking into account the investigated dataset.
6
Conclusions
In this manuscript, we have presented the results of research concerning the selection of an optimal data normalization technique in terms of the maximum value
of the investigated data classification accuracy. Four different types of datasets were used as the experimental data during the simulation procedure: Iris Plants, Seeds, Wine and Glass. The created charts of the investigated data distributions have shown very different ranges of attribute variation. The various normalization techniques available in the clusterSim package of the R software were used within the framework of our research. The neural network multilayer perceptron was used as the classifier during the simulation process. The optimal network parameters in terms of the global minimum of the error surface were evaluated by cross-validation using the function train() of the caret package (R software). The results of the simulation have shown that the normalization methods which correspond to the maximum value of the classification accuracy are different for various datasets. Moreover, for three datasets we obtained the maximum (or almost the maximum) value of the classification accuracy on the test data. In the case of the Glass dataset, we obtained a somewhat lower classification accuracy. This fact can be explained by the existence of many outliers in the data, which can influence the classification accuracy. However, in any case, the proposed technique allows choosing the optimal normalization technique objectively.
References 1. Caret package. https://topepo.github.io/caret/ 2. Clustersim package. http://keii.ue.wroc.pl/clusterSim/ 3. Glass identification database. https://archive.ics.uci.edu/ml/datasets/glass+ identification 4. Seeds dataset. https://archive.ics.uci.edu/ml/datasets/seeds 5. Uci - machine learning repository. https://archive.ics.uci.edu/ml/datasets.php 6. Wine recognition data. https://archive.ics.uci.edu/ml/datasets/wine 7. Babichev, S., Durnyak, B., Zhydetskyy, V., Pikh, I., Senkivskyy, V.: Application of optics density-based clustering algorithm using inductive methods of complex system analysis. In: IEEE 2019 14th International Scientific and Technical Conference on Computer Sciences and Information Technologies, CSIT 2019 - Proceedings, pp. 169–172 (2019). https://doi.org/10.1109/STC-CSIT.2019.8929869 ˇ 8. Babichev, S., Skvor, J.: Technique of gene expression profiles extraction based on the complex use of clustering and classification methods. Diagnostics 10(8), 584 (2020). https://doi.org/10.3390/diagnostics10080584 9. Bushel, P., Ferguson, S., Ramaiahgari, S., Paules, R., Auerbach, S.: Comparison of normalization methods for analysis of TempO-Seq targeted RNA sequencing data. Front. Genet. 11, 594 (2020). https://doi.org/10.3389/fgene.2020.00594 10. Carmona-Rodr´ıguez, L., Mart´ınez-Rey, D., Mira, E., Ma˜ nes, S.: SOD3 boosts T cell infiltration by normalizing the tumor endothelium and inducing laminin-a4. OncoImmunology 9(1), 1794163 (2020). https://doi.org/10.1080/2162402X.2020. 1794163 11. De Silva, A., De Livera, A., Lee, K., Moreno-Betancur, M., Simpson, J.: Multiple imputation methods for handling missing values in longitudinal studies with sampling weights: comparison of methods implemented in Stata. Biometrical J. 63(2), 354–371 (2021). https://doi.org/10.1002/bimj.201900360
12. Fisher, R.: The use of multiple measurements in taxonomic problems. Ann. Eugenics 7(2), 179–188 (1936) 13. Ihaka, R., Gentleman, R.: R: a language for data analysis and graphics. J. Comput. Graph. Stat. 5(3), 299–314 (1996) 14. Johnson, T., Isaac, N., Paviolo, A., Gonz´ alez-Su´ arez, M.: Handling missing values in trait data. Glob. Ecol. Biogeogr. 30(1), 51–62 (2021). https://doi.org/10.1111/ geb.13185 15. Kim, K.H., Kim, K.J.: Missing-data handling methods for lifelogs-based wellness index estimation: comparative analysis with panel data. JMIR Med. Inform. 8(12), e20597 (2020). https://doi.org/10.2196/20597 16. Marasanov, V., Sharko, A., Sharko, A., Stepanchikov, D.: Modeling of energy spectrum of acoustic-emission signals in dynamic deformation processes of medium with microstructure. In: 2019 IEEE 39th International Conference on Electronics and Nanotechnology, ELNANO 2019 - Proceedings, pp. 718–723 (2019). https://doi. org/10.1109/ELNANO.2019.8783809 17. Marasanov, V., Stepanchikov, D., Sharko, A., Sharko, A.: Technique of system operator determination based on acoustic emission method. Adv. Intell. Syst. Comput. 1246, 3–22 (2021). https://doi.org/10.1007/978-3-030-54215-3 1 18. Marasanov, V., Sharko, A., Sharko, A.: Energy spectrum of acoustic emission signals in coupled continuous media. J. Nano- Electron. Phys. 11(3), 03027 (2019). https://doi.org/10.21272/jnep.11(3).03028 19. Ngueilbaye, A., Wang, H., Mahamat, D., Junaidu, S.: Modulo 9 model-based learning for missing data imputation. Appl. Soft Comput. 103, 107167 (2021). https:// doi.org/10.1016/j.asoc.2021.107167 20. Northoff, G., Mushiake, H.: Why context matters? Divisive normalization and canonical microcircuits in psychiatric disorders. Neurosci. Res. 156, 130–140 (2020). https://doi.org/10.1016/j.neures.2019.10.002 21. Peterson, R., Cavanaugh, J.: Ordered quantile normalization: a semiparametric transformation built for the cross-validation era. J. Appl. Stat. 47(13–15), 2312– 2327 (2020). https://doi.org/10.1080/02664763.2019.1630372 22. Sharma, S., Sood, M.: Exploring feature selection technique in detecting sybil accounts in a social network. Adv. Intell. Syst. Comput. 1166, 695–708 (2020). https://doi.org/10.1007/978-981-15-5148-2 61 23. Turkheimer, F., Selvaggi, P., Mehta, M., et al.: Normalizing the abnormal: do antipsychotic drugs push the cortex into an unsustainable metabolic envelope? Schizophrenia Bull. 46(3), 484–495 (2020). https://doi.org/10.1093/schbul/sbz119
Robust Recurrent Credibilistic Modification of the Gustafson - Kessel Algorithm Yevgeniy Bodyanskiy1 , Alina Shafronenko1 , Iryna Klymova1 , and Vladyslav Polyvoda2(B) 1
Kharkiv National University of Radio Electronics, Kharkiv, Ukraine [email protected], [email protected], [email protected] 2 Kherson State Maritime Academy, Kherson, Ukraine
Abstract. The task of fuzzy clustering of data is a very interesting and important problem that is often found in many applications related to data mining and exploratory data analysis. For solving these problems, the traditional methods require that every observation vector fed from the data belong to only one cluster. A more natural situation is when an observation vector can belong to more than one cluster with various levels of membership. In this situation, more effective are fuzzy clustering methods that are synthesized to allow the mutual overlapping of the classes which are formed in the process of analyzing the data. Nowadays, the most widespread are algorithms of probabilistic fuzzy clustering. At the same time, this approach has significant disadvantages associated with strict "probabilistic" constraints on the level of membership and an increased sensitivity to abnormal observations, which are often present in the initial data sets. Therefore, as an alternative to probabilistic fuzzy clustering methods, a recurrent modification of the credibilistic fuzzy clustering method is proposed, based on the credibility approach and the Gustafson - Kessel algorithm for fuzzy clustering. Keywords: Computational and artificial intelligence · Fuzzy neural networks · Machine learning · Self-organizing neural network · Credibilistic fuzzy clustering · Modification of Gustafson - Kessel algorithm
1
Introduction
One of the main areas of computational intelligence is fuzzy clustering of big data. For solving the problem of clustering data described by feature vectors, probabilistic and possibilistic fuzzy clustering methods have been successfully used [6,11]; however, these methods suffer from strict constraints on the membership levels, from an increased sensitivity to abnormal observations, which are often present in the initial data sets, and from the problem of clusters "merging" into a single class. Therefore, as an alternative to the probabilistic and possibilistic approaches, a credibilistic approach [10] based on credibility theory [8] has been proposed. In the process of fuzzy clustering, not only the centroids of every cluster and the membership levels of the observation vectors are computed, but also the credibility level of the obtained results. It should be noted that most fuzzy clustering algorithms, and all credibilistic procedures, form clusters of a hyperspherical form. This leads to the fact that, during information processing, significantly more clusters can be formed than their actual number. This difficulty can be overcome by using fuzzy clustering methods that form classes whose shapes are not spherical. One such procedure is the Gustafson-Kessel algorithm [3], which forms clusters of hyperellipsoidal form with an arbitrary orientation of the axes in the feature space. This algorithm is more efficient than the classic fuzzy C-means; however, it is likewise based on a probabilistic approach, i.e. it is unstable to abnormal outliers in the initial data sets. In this case, it seems appropriate to introduce a procedure of the Gustafson-Kessel type that is based on credibilistic assumptions, i.e. one having robust properties. In addition, most of the popular fuzzy clustering algorithms are designed to process information in batch mode, when the entire array of data to be processed is given in advance. It is clear that this approach is ineffective for Big Data Mining and Data Stream Mining tasks, when data are fed for processing sequentially, possibly in online mode. In such problems, recurrent fuzzy clustering procedures come to the fore, in which each observation that has been processed is not used in the future, i.e. is actually forgotten. In this regard, this article proposes a modified fuzzy clustering procedure based on the Gustafson-Kessel algorithm, i.e. actually on the Mahalanobis distance, which allows the formation of clusters of a hyperellipsoidal form with an arbitrary orientation of the axes, is robust to abnormal outliers in the data, unlike probabilistic algorithms, and allows processing information in online mode, which permits processing large amounts of data contained, for example, in the "cloud" without storing them.
2
Recurrent Method of Credibilistic Fuzzy Clustering
The problem of fuzzy clustering based on goal functions is solved by credibilistic fuzzy clustering algorithms through the minimization of a suitable goal function.
The initial information for fuzzy clustering in batch mode is an array of n-dimensional observation vectors X = \{x_1, x_2, \dots, x_N\} \subset R^n, x_\tau \in X, \tau = 1, 2, \dots, N; in online mode the observations are fed sequentially. These data must be partitioned into m overlapping classes (1 < m < N) with some membership levels \mu_q(\tau) and are standardized to the hypercube [-1, 1]^n. The method of credibilistic fuzzy clustering is associated with minimizing the objective function

E(Cred_q(\tau), c_q) = \sum_{\tau=1}^{N} \sum_{q=1}^{m} Cred_q^{\beta}(\tau)\, D^2(x(\tau), c_q)   (1)

with the conditions

0 \le Cred_q(\tau) \le 1 \;\; \forall q, \tau, \qquad \sup_q Cred_q(\tau) \ge 0.5 \;\; \forall \tau, \qquad Cred_q(\tau) + \sup_{l \ne q} Cred_l(\tau) = 1,   (2)

where Cred_q(\tau) is the credibility level of the observation. The membership function of credibilistic fuzzy clustering has the form

\mu_q(\tau) = \varphi_q(D(x(\tau), c_q)),   (3)
where D(x(\tau), c_q) is the Euclidean distance between the observation vector x(\tau) and the centroid of the q-th cluster, and \varphi_q(D(x(\tau), c_q)) decreases monotonically on the interval [0, \infty], \varphi_q(0) = 1, \varphi_q(\infty) \to 0. Solving the problem of credibilistic fuzzy clustering, we obtain the solution in batch form:

\mu_q(\tau) = \frac{1}{1 + D^2(x(\tau), c_q)},
\mu_q^{*}(\tau) = \mu_q(\tau)\,(\sup_l \mu_l(\tau))^{-1},
Cred_q(\tau) = \big(\mu_q^{*}(\tau) + 1 - \sup_{l \ne q} \mu_l^{*}(\tau)\big) \cdot 0.5,   (4)
c_q = \frac{\sum_{\tau=1}^{N} Cred_q^{\beta}(\tau) x(\tau)}{\sum_{\tau=1}^{N} Cred_q^{\beta}(\tau)},

and, finally, in online mode, we can rewrite equation (4) in the form

\sigma_q^2(\tau+1) = \Big( \sum_{l=1,\, l \ne q}^{m} D^{\frac{2}{1-\beta}}(x(\tau+1), c_l(\tau)) \Big)^{-1},
\mu_q(\tau+1) = \frac{1}{1 + \frac{\big(D^2(x(\tau+1), c_q(\tau))\big)^{\beta-1}}{\sigma_q^2(\tau+1)}},
\mu_q^{*}(\tau+1) = \frac{\mu_q(\tau+1)}{\sup_l \mu_l(\tau+1)},   (5)
Cred_q(\tau+1) = \frac{1}{2}\big(\mu_q^{*}(\tau+1) + 1 - \sup_{l \ne q} \mu_l^{*}(\tau+1)\big),
c_q(\tau+1) = c_q(\tau) + r(\tau+1)\, Cred_q^{\beta}(\tau+1)\,(x(\tau+1) - c_q(\tau)).

Assuming the value of the fuzzifier \beta = 2, we arrive at the solution

\sigma_q^2(\tau+1) = \Big( \sum_{l=1,\, l \ne q}^{m} \lVert x(\tau+1) - c_l(\tau) \rVert^{-2} \Big)^{-1},
\mu_q(\tau+1) = \Big( 1 + \frac{\lVert x(\tau+1) - c_q(\tau) \rVert^{2}}{\sigma_q^2(\tau+1)} \Big)^{-1},
\mu_q^{*}(\tau+1) = \mu_q(\tau+1)\,(\sup_l \mu_l(\tau+1))^{-1},   (6)
Cred_q(\tau+1) = \big(\mu_q^{*}(\tau+1) + 1 - \sup_{l \ne q} \mu_l^{*}(\tau+1)\big) \cdot 0.5,
c_q(\tau+1) = c_q(\tau) + r(\tau+1)\, Cred_q^{2}(\tau+1)\,(x(\tau+1) - c_q(\tau)).
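A minimal NumPy sketch of one online step (6) for \beta = 2 is given below; the fixed learning rate r and the random initial centroids are illustrative assumptions only:

import numpy as np

def credibilistic_step(x, c, r=0.1):
    """One online update (6): x is a new observation, c is the (m x n)
    array of current centroids; returns memberships, credibilities and
    the updated centroids."""
    m = c.shape[0]
    d2 = np.sum((x - c) ** 2, axis=1)               # squared distances to centroids
    inv = 1.0 / d2
    sigma2 = 1.0 / (inv.sum() - inv)                # sigma_q^2 = (sum_{l!=q} d_l^{-2})^{-1}
    mu = 1.0 / (1.0 + d2 / sigma2)                  # membership levels
    mu_star = mu / mu.max()
    sup_other = np.array([np.delete(mu_star, q).max() for q in range(m)])
    cred = 0.5 * (mu_star + 1.0 - sup_other)        # credibility levels
    c_new = c + r * (cred[:, None] ** 2) * (x - c)  # centroid update with Cred^2
    return mu, cred, c_new

rng = np.random.default_rng(0)
c = rng.normal(size=(3, 2))                         # three clusters in 2-D
mu, cred, c = credibilistic_step(rng.normal(size=2), c)
print(mu, cred, sep="\n")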
3
Recurrent Credibilistic Modification of the Gustafson Kessel Algorithm (RCM GK)
In the process of fuzzy clustering using the considered algorithms, the classes that are formed have the shape of hyperspheres, which does not always correspond to real conditions, where the clusters can have an arbitrary shape. More adequate and convenient are clusters of hyperellipsoidal form with an arbitrary orientation of the axes in the feature space. Such clusters can be formed using the Gustafson-Kessel algorithm [3] and its modifications [2,5,7], where the distance has the form

D_{A_q}^2(x(\tau), c_q) = \lVert x(\tau) - c_q \rVert_{A_q}^2 = (x(\tau) - c_q)^T A_q (x(\tau) - c_q),   (7)

where

A_q = (\det S_q)^{\frac{1}{n}} S_q^{-1}, \qquad S_q = \sum_{\tau=1}^{N} \mu_q^{\beta}(\tau)(x(\tau) - c_q)(x(\tau) - c_q)^T.   (8)

Minimization with the metric (7), (8) leads to the known result

\mu_q(\tau) = \frac{ \big(D_{A_q}^2(x(\tau), c_q)\big)^{\frac{1}{1-\beta}} }{ \sum_{l=1}^{m} \big(D_{A_l}^2(x(\tau), c_l)\big)^{\frac{1}{1-\beta}} }, \qquad c_q = \frac{\sum_{\tau=1}^{N} \mu_q^{\beta}(\tau) x(\tau)}{\sum_{\tau=1}^{N} \mu_q^{\beta}(\tau)},   (9)

and, assuming the value of the fuzzifier \beta = 2, we arrive at the solution

\mu_q(\tau) = \frac{1}{1 + \frac{D_{A_q}^2(x(\tau), c_q)}{\sigma_q^2(\tau)}}, \qquad \sigma_q^2(\tau) = \Big( \sum_{l=1,\, l \ne q}^{m} D_{A_l}^{-2}(x(\tau), c_l) \Big)^{-1}, \qquad c_q = \frac{\sum_{\tau=1}^{N} \mu_q^{2}(\tau) x(\tau)}{\sum_{\tau=1}^{N} \mu_q^{2}(\tau)}.   (10)
Thus, relations (8)-(10) are essentially a procedure of probabilistic fuzzy clustering, but the classes that are formed have the form of hyperellipsoids with an arbitrary orientation of the axes. In order to introduce a recurrent modification of the Gustafson-Kessel algorithm, we can use the Sherman-Morrison matrix inversion formula [12] and the matrix determinant lemma [4], which leads to the online procedure

\mu_q(\tau+1) = \frac{ \big(D_{A_q(\tau)}^2(x(\tau+1), c_q(\tau))\big)^{\frac{1}{1-\beta}} }{ \sum_{l=1}^{m} \big(D_{A_l(\tau)}^2(x(\tau+1), c_l(\tau))\big)^{\frac{1}{1-\beta}} },
S_q(\tau+1) = S_q(\tau) + \mu_q^{\beta}(\tau+1)(x(\tau+1) - c_q(\tau))(x(\tau+1) - c_q(\tau))^T,
S_q^{-1}(\tau+1) = S_q^{-1}(\tau) - \frac{ \mu_q^{\beta}(\tau+1) S_q^{-1}(\tau)(x(\tau+1) - c_q(\tau))(x(\tau+1) - c_q(\tau))^T S_q^{-1}(\tau) }{ 1 + \mu_q^{\beta}(\tau+1)(x(\tau+1) - c_q(\tau))^T S_q^{-1}(\tau)(x(\tau+1) - c_q(\tau)) },   (11)
\det S_q(\tau+1) = \det S_q(\tau)\,\big( 1 + \mu_q^{\beta}(\tau+1)(x(\tau+1) - c_q(\tau))^T S_q^{-1}(\tau)(x(\tau+1) - c_q(\tau)) \big),
A_q(\tau+1) = (\det S_q(\tau+1))^{\frac{1}{n}} S_q^{-1}(\tau+1),
c_q(\tau+1) = c_q(\tau) + r(\tau+1)\,\mu_q^{\beta}(\tau+1) A_q(\tau+1)(x(\tau+1) - c_q(\tau)).

It is easy to modify the Gustafson-Kessel method for the case of possibilistic fuzzy clustering. In this case, the objective function takes the form

E(\mu_q(\tau), c_q, \eta_q) = \sum_{\tau=1}^{N} \sum_{q=1}^{m} \mu_q^{\beta}(\tau) D_{A_q}^2(x(\tau), c_q) + \sum_{q=1}^{m} \eta_q \sum_{\tau=1}^{N} (1 - \mu_q(\tau))^{\beta},   (12)

and in the batch form

\mu_q(\tau) = \Big( 1 + \Big( \frac{D_{A_q}^2(x(\tau), c_q)}{\eta_q} \Big)^{\frac{1}{\beta-1}} \Big)^{-1},
c_q = \frac{\sum_{\tau=1}^{N} \mu_q^{\beta}(\tau) x(\tau)}{\sum_{\tau=1}^{N} \mu_q^{\beta}(\tau)},
S_q = \sum_{\tau=1}^{N} \mu_q^{\beta}(\tau)(x(\tau) - c_q)(x(\tau) - c_q)^T,   (13)
A_q = (\det S_q)^{\frac{1}{n}} S_q^{-1},
\eta_q = \frac{\sum_{\tau=1}^{N} \mu_q^{\beta}(\tau)(x(\tau) - c_q)^T A_q (x(\tau) - c_q)}{\sum_{\tau=1}^{N} \mu_q^{\beta}(\tau)}.

Algorithm (13) can be rewritten in the recurrent form

\mu_q(\tau+1) = \Big( 1 + \Big( \frac{D_{A_q(\tau)}^2(x(\tau+1), c_q(\tau))}{\eta_q(\tau)} \Big)^{\frac{1}{\beta-1}} \Big)^{-1},
S_q(\tau+1),\; S_q^{-1}(\tau+1),\; \det S_q(\tau+1)\; \text{and}\; A_q(\tau+1)\; \text{updated as in (11)},   (14)
c_q(\tau+1) = c_q(\tau) + r(\tau+1)\,\mu_q^{\beta}(\tau+1) A_q(\tau+1)(x(\tau+1) - c_q(\tau)),
\eta_q(\tau+1) = \frac{\sum_{p=1}^{\tau+1} \mu_q^{\beta}(p)(x(p) - c_q(\tau+1))^T A_q(\tau+1)(x(p) - c_q(\tau+1))}{\sum_{p=1}^{\tau+1} \mu_q^{\beta}(p)}.

Despite some cumbersomeness of algorithm (14), its implementation is not much more complicated than the recurrent procedure of possibilistic fuzzy clustering. As for the credibilistic variant of the Gustafson-Kessel algorithm, instead of the objective function (1) its modification should be used in the form

E(Cred_q(\tau), c_q) = \sum_{\tau=1}^{N} \sum_{q=1}^{m} Cred_q^{\beta}(\tau) D_{A_q}^2(x(\tau), c_q)   (15)

with constraints (2)-(4); then we can write

S_q = \sum_{\tau=1}^{N} \mu_q^{\beta}(\tau)(x(\tau) - c_q)(x(\tau) - c_q)^T,
A_q = (\det S_q)^{\frac{1}{n}} S_q^{-1},
\mu_q(\tau) = \frac{1}{1 + D_{A_q}^2(x(\tau), c_q)},
\mu_q^{*}(\tau) = \frac{\mu_q(\tau)}{\sup_l \mu_l(\tau)},   (16)
Cred_q(\tau) = \frac{1}{2}\big( \mu_q^{*}(\tau) + 1 - \sup_{l \ne q} \mu_l^{*}(\tau) \big),
c_q = \frac{\sum_{\tau=1}^{N} Cred_q^{\beta}(\tau) x(\tau)}{\sum_{\tau=1}^{N} Cred_q^{\beta}(\tau)}.

It is easy to see that relation (16) is a generalization of algorithm (4) in the case of the metric (7). Finally, we can introduce a recurrent credibilistic modification of the Gustafson-Kessel algorithm:

\mu_q(\tau+1) = \big( 1 + D_{A_q(\tau)}^2(x(\tau+1), c_q(\tau)) \big)^{-1},
\mu_q^{*}(\tau+1) = \frac{\mu_q(\tau+1)}{\sup_l \mu_l(\tau+1)},
Cred_q(\tau+1) = \frac{1}{2}\big( \mu_q^{*}(\tau+1) + 1 - \sup_{l \ne q} \mu_l^{*}(\tau+1) \big),   (17)
S_q(\tau+1),\; S_q^{-1}(\tau+1),\; \det S_q(\tau+1)\; \text{and}\; A_q(\tau+1)\; \text{updated as in (11)},
c_q(\tau+1) = c_q(\tau) + r(\tau+1)\,\mu_q^{\beta}(\tau+1) A_q(\tau+1)(x(\tau+1) - c_q(\tau)).

It is easy to see that procedure (17) is a generalization of the credibilistic clustering procedure (5) and of the recurrent modification of the Gustafson-Kessel algorithm (11).
4
Experiments
The effectiveness and efficiency of the proposed method were estimated in our experiments. Seven well-known clustering algorithms, namely the Fuzzy c-means method (FCM) [1], the Possibilistic c-means method (PCM) [6], the Gustafson-Kessel algorithm (GK), the Credibilistic clustering method (CCM) [9], the Adaptive algorithm for probabilistic fuzzy clustering (APrFC), the Adaptive algorithm for possibilistic fuzzy clustering (APosFC) and Adaptive Credibilistic Fuzzy Clustering (ACrFC) [10], were employed in these experiments for comparison with the proposed Recurrent Credibilistic Modification of the Gustafson - Kessel Algorithm (RCM GK). These algorithms have different advantages and low computational complexity.
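As an implementation note for the proposed RCM GK, the computational core shared by the recurrent procedures (11), (14) and (17) is the rank-one update of S_q, its inverse and its determinant. A minimal NumPy sketch of this step is given below; the function name, the toy data and the weight value are illustrative assumptions, and w stands for the membership weight (for example \mu_q^{\beta}(\tau+1)) used by the particular variant:

import numpy as np

def rank_one_update(S, S_inv, det_S, c, x, w):
    """Sherman-Morrison / determinant-lemma update used in (11), (14), (17).
    S, S_inv, det_S describe the current weighted scatter matrix of one cluster,
    c is its centroid, x the new observation, w the membership weight."""
    e = (x - c).reshape(-1, 1)                       # column vector x - c
    S_new = S + w * (e @ e.T)
    denom = 1.0 + w * (e.T @ S_inv @ e).item()
    S_inv_new = S_inv - (w * S_inv @ e @ e.T @ S_inv) / denom
    det_new = det_S * denom                          # matrix determinant lemma
    n = S.shape[0]
    A_new = det_new ** (1.0 / n) * S_inv_new         # induced Mahalanobis metric A_q
    return S_new, S_inv_new, det_new, A_new

# toy check against direct inversion
rng = np.random.default_rng(1)
S = np.eye(2); S_inv = np.eye(2); det_S = 1.0
S, S_inv, det_S, A = rank_one_update(S, S_inv, det_S,
                                     c=np.zeros(2), x=rng.normal(size=2), w=0.5)
print(np.allclose(S_inv, np.linalg.inv(S)), np.isclose(det_S, np.linalg.det(S)))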
The experimental research was conducted on samples of three data sets: Abalone, Wine and Gas. The description of these data sets is shown in Table 1.

Table 1. Data set description: data set, data number, attributes number, cluster number

Data set  Data number  Attributes number  Cluster number
Abalone   4177         8                  3
Wine      178          13                 3
Gas       296          2                  6
The mean error of the cluster centroids obtained by the proposed RCM GK was compared with that of the other well-known methods; the obtained results are demonstrated in Table 2 and Table 3 (see also Table 4 and Table 5).

Table 2. Comparison of the mean error of the cluster centroids of the proposed RCM GK with the FCM, PCM, CCM and GK

Data set  FCM   PCM   CCM   GK     RCM GK
Abalone   2.61  0.10  1.54  0.13   0.05
Gas       2.69  1.73  0.21  0.13   0.049
Wine      2.71  2.86  0.33  0.183  0.037
Table 3. Comparison of the mean error of the cluster centroids of the proposed RCM GK with the APrFC, APosFC and ACrFC

Data set  APrFC  APosFC  ACrFC  RCM GK
Abalone   0.12   0.11    0.07   0.05
Gas       0.17   0.13    0.06   0.049
Wine      0.20   0.18    0.04   0.037
In the Recurrent Credibilistic Modification of the Gustafson - Kessel Algorithm, owing to the assignment of both fuzzy and credibilistic membership levels to all samples, the impact is evident in the obtained clustering results and in the accurate determination of the cluster centroids. To estimate the practicability of these methods, we compared the running time of clustering on different data sets.
Table 4. Comparison of execution times (in seconds) of eight algorithms on tested data sets

Data set  FCM   PCM   CCM   GK    APrFC  APosFC  ACrFC  RCM GK
Abalone   1.62  0.28  0.25  0.27  0.12   0.11    0.17   0.15
Gas       0.43  0.28  0.25  0.22  0.17   0.18    0.16   0.15
Wine      0.22  0.21  0.22  0.24  0.13   0.18    0.14   0.16
Figure 1 presents the comparison of the running times of these algorithms. As can be seen from the diagram (Fig. 1), the proposed method solves the problem faster than the known algorithms. The adaptive algorithms, such as the Adaptive algorithm for probabilistic fuzzy clustering, the Adaptive algorithm for possibilistic fuzzy clustering and Adaptive Credibilistic Fuzzy Clustering, are in some cases better than the Credibilistic Modification of the Gustafson - Kessel Algorithm due to their adaptive functions (Fig. 2).
Fig. 1. Comparison of running time of eight algorithms on tested data sets.
5
Conclusions
In the paper, the task of credibilistic fuzzy clustering of a stream of observations is considered. The proposed modification of the Gustafson-Kessel algorithm is based on the
Table 5. Comparison of number of iterations and execution times (in seconds) of eight algorithms on Abalone data set

                      FCM   PCM   CCM   GK    APrFC  APosFC  ACrFC  RCM GK
Number of iterations  40    99    75    100   45     78      55     76
Running time          4,51  3,43  1,22  1,41  1,58   1,49    1,15   1,63
Fig. 2. Comparison of number of iterations of eight algorithms on tested Abalone data set.
credibility approach to fuzzy clustering and permits forming overlapping classes of hyperellipsoidal shape with an arbitrary orientation of the axes in the feature space. The considered procedures are quite simple in numerical implementation and are designed to solve clustering problems within the framework of Data Stream Mining and Big Data Mining.
References 1. Bezdek, J.: A convergence theorem for the fuzzy ISODATA clustering algorithms. IEEE Trans. Pattern Anal. Mach. Intell. 2(1), 1–8 (1980). https://doi.org/10.1109/ TPAMI.1980.4766964 2. Filho, M., Koki, L., Aguiar, R.: Pattern classification on complex system using modified Gustafson-Kessel algorithm. In: Proceedings of the 11th Conference on European Society for Fuzzy Logic and Technology (EUSFLAT), pp. 714–720 (2019) 3. Gustafson, E., Kessel, W.: Fuzzy clustering with a fuzzy covariance matrix. In: Proceedings of IEEE Conference on Decision and Control, pp. 761–766 (1979) 4. Harville, D.: Matrix Algebra from a Statistician’s Perspective. Springer, New York (1997). https://doi.org/10.4018/978-1-5225-6989-3
5. Krishnapuram, R., Jongwoo, K.: A note on the Gustafson-Kessel and adaptive fuzzy clustering algorithms. IEEE Trans. Fuzzy Syst. 4, 453–461 (1999) 6. Krishnapuram, R., Keller, J.: A possibilistic approach to clustering. IEEE Trans. Fuzzy Syst. 1, 98–110 (1993). https://doi.org/10.1109/91.227387 7. Lesot, M., Kruse, R.: Gustafson-kessel-like clustering algorithm based on typicality degrees. In: Uncertainty and Intelligent Information Systems, pp. 117–130 (2008) 8. Liu, B.: A survey of credibility theory. Fuzzy Optim. Decis. Making 4, 387–408 (2006). https://doi.org/10.1007/s10700-006-0016-x 9. Sampath, S., Kumar, R.: Fuzzy clustering using credibilistic critical values. Int. J. Comput. Intell. Inform. 3, 213–231 (2013) 10. Shafronenko, A., Bodyanskiy, Y., et al.: Online credibilistic fuzzy clustering of data using membership functions of special type. In: Proceedings of The Third International Workshop on Computer Modeling and Intelligent Systems (CMIS2020) (2020). http://ceur-ws.org/Vol-2608/paper56.pdf 11. Shafronenko, A., Bodyanskiy, Y., Rudenko, D.: Online neuro fuzzy clustering of data with omissions and outliers based on completion strategy. In: Proceedings of The Second International Workshop on Computer Modeling and Intelligent Systems (CMIS-2019), pp. 18–27 (2019) 12. Sherman, J., Morrison, W.: Adjustment of an inverse matrix corresponding to a change in one element of a given matrix. Ann. Math. Stat. 21(1), 124–127 (1950)
Tunable Activation Functions for Deep Neural Networks Bohdan Bilonoh(B) , Yevgeniy Bodyanskiy , Bohdan Kolchygin , and Sergii Mashtalir Kharkiv National University of Radio Electronics, Kharkiv 61166, Ukraine {bohdan.bilonoh,yevgeniy.bodyanskiy,bohdan.kolchygin, sergii.mashtalir}@nure.ua
Abstract. The performance of artificial neural networks significantly depends on the choice of the nonlinear activation function of the neuron. Usually this choice comes down to an empirical selection from a list of universal functions that have shown satisfactory results on most tasks. However, this approach does not lead to optimal training in terms of model convergence over a certain number of epochs. We propose a tunable polynomial activation function for the artificial neuron. The parameters of this function can be adjusted during the learning procedure along with the synaptic weights. The proposed function can take the form of the universal ones due to its polynomial properties. The adjustable form of the tunable polynomial function leads to faster convergence of the model and more accurate training due to the possibility of using a smaller training step, which has been shown experimentally. The improved convergence allows applying the tunable activation function to various deep learning problems. Keywords: Artificial neural network · Activation function · Deep learning
1
Introduction
Artificial neural networks (ANN) are now widely used to solve data analysis problems due to their universal approximation properties and ability to learn. In general, the task of training is to find the synaptic weights that are optimal in some sense. These properties allow solving problems of computer vision, natural language processing and control of systems with nonlinear objects. The multilayer perceptron and its modifications are the most widespread; their main unit is the elementary Rosenblatt perceptron with the so-called sigmoidal activation function. In addition to the classic σ-function [6], the activation functions Tanh, SoftSign, Saltin [4,15], special polynomial functions with fixed parameters [2] and other so-called squashing functions that satisfy the conditions of Cybenko's theorem [6] are also widely used.
Deep neural networks were constructed on the basis of multilayer perceptrons [9,11,18,25]. The increased number of layers has enhanced their universal approximation properties. Compared to shallow neural networks, deep ones can significantly improve the quality of information processing, although their implementation is associated with some computational difficulties, such as the "vanishing" or "explosion" of gradients, which paralyze learning in a way specific to sigmoid activation functions. A family of so-called linear rectifiers is used in deep networks as activation functions to overcome the problems related to sigmoids [29]. Typical representatives are: LReLU – leaky rectified linear unit; PReLU – parametric rectified linear unit; RReLU – randomized leaky rectified linear unit; NReLU – noisy rectified linear unit; ELU – exponential linear unit; and the most popular ReLU – rectified linear unit [5,9,11–13,29]. Typically these piecewise-linear functions, which have fairly simple derivatives, overcome the problem of vanishing and exploding gradients. However, to achieve the required approximation properties, the number of layers in deep neural networks must be significantly increased [22]. It is possible to improve the process of learning deep networks by changing not only the synaptic weights of each neuron but also the activation function itself, i.e. its parameters and shape. In [1,10,23,27,28] a number of approaches and solutions to this problem were introduced, with the synaptic weights and the function parameters being adjusted independently using completely different procedures. It is worth noting that, from a computational point of view, tuning the activation functions is much more difficult than training the synaptic weights. In this regard, it is advisable to consider a tunable activation function which is adjusted together with the synaptic weights within a single learning algorithm. It is important that this function include known activation functions, such as a linear rectifier and squashing functions, as a partial case.
2
Related Works
In deep neural networks one of the most popular activation functions is

a_i = \mathrm{AdReLU}(z_i) = \begin{cases} \alpha z_i, & \text{if } z_i > 0, \\ \beta z_i, & \text{otherwise,} \end{cases}   (1)
where α and β are the hyperparameters. Usually, α is fixed at a value of 1 and β is set purely experimentally. It is clear that for β = 0 we obtain the rectified linear unit (ReLU) activation function [24] (Fig. 1). The authors of [8] demonstrated that a linear rectifier helps to propagate gradients due to the absence of an upper bound of the function; the linear derivative is also computed quickly. However, the zero value of the function for negative values of the argument leads to the "death" of neurons. It was proved in [21] that the probability of obtaining a dead fully connected network with the ReLU activation function and randomly initialized weights increases with the number of hidden layers. The search for functions more resistant to the zero gradient is still ongoing because of the death of neurons. For example, [14] demonstrated that
Fig. 1. AdReLU activation function with α = 1; β = 0 (ReLU) and α = 1.1; β = −0.1
stochastic regularization in the activation function helps to train the network more effectively. The parameter β in (1) is of great importance in recent research on activation functions. For example, by turning the hyperparameter β into a trainable parameter (PReLU), the authors of [12] demonstrated the superiority of customizable activation functions due to their regularization properties at the precursor function level.
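A direct NumPy implementation of the piecewise-linear activation (1) is sketched below; the chosen values of α and β are only illustrative assumptions and not taken from the paper:

import numpy as np

def ad_relu(z, alpha=1.0, beta=0.1):
    """AdReLU from Eq. (1): alpha*z for z > 0, beta*z otherwise.
    beta = 0 gives ReLU; a small positive beta gives a leaky rectifier."""
    return np.where(z > 0, alpha * z, beta * z)

z = np.linspace(-2, 2, 5)
print(ad_relu(z))            # [-0.2 -0.1  0.   1.   2. ]
print(ad_relu(z, beta=0.0))  # plain ReLU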
3
Architecture of Neuron with Tunable Activation Function
A traditional artificial neuron of any neural network (shallow or deep) performs a nonlinear mapping of the form

a_i = \psi\Big( \sum_{j=0}^{N} w_{ij} x_{ij} \Big) = \psi(w_i^T x_i) = \psi(z_i),   (2)

where a_i is the output signal of the i-th neuron; x_i = (1, x_{i1}, \dots, x_{ij}(t), \dots, x_{iN}(t))^T \in R^{N+1} is the input vector of the i-th neuron; w_i = (w_{i0}, \dots, w_{ij}, \dots, w_{iN}) \in R^{N+1} is the vector of synaptic weights that are adjusted during the training process (w_{i0} \equiv b_i is the bias signal); z_i = w_i^T x_i is the weighted sum of the inputs; \psi(\cdot) is the nonlinear activation function of the neuron, which is usually chosen empirically. The choice of the activation function for neurons in hidden layers is reduced to an empirical choice from a list of universal functions that have shown satisfactory
results on most problems. For example, one of the most popular functions in shallow neural networks which satisfies the conditions of Cybenko's theorem is

a_i = \psi(z_i) = \tanh(z_i) = \frac{e^{2z_i} - 1}{e^{2z_i} + 1}   (3)

and its derivative

\psi'(z_i) = 1 - a_i^2.   (4)

It is clear that the value of a_i^2 increases as the number of hidden layers increases and the derivative tends to 0. This leads to the learning process stalling and is called the vanishing gradient problem.

3.1
Tunable Polynomial Activation Function
A traditional polynomial function has the form

f(x) = a_n x^n + a_{n-1} x^{n-1} + \dots + a_2 x^2 + a_1 x + a_0,   (5)
where a_0, \dots, a_n are constants and x is the input of the function f. The polynomial function plays an important role because it can approximate any continuous function on a closed interval of the real line. Based on the approximation properties of polynomial functions, we propose an activation function based on [5] in which the constants are replaced by adjustable parameters tuned during the learning procedure:

a_i = \begin{cases} \alpha_{i1} z_i + \alpha_{i2} z_i^2 + \alpha_{i3} z_i^3, & \text{if } z_i \ge 0, \\ \beta_{i1} z_i + \beta_{i2} z_i^2 + \beta_{i3} z_i^3, & \text{otherwise,} \end{cases}   (6)

where \alpha_{ij} and \beta_{ij} are the parameters that are adjusted together with the synaptic weights. Because of the split form, n = 3 is enough for the current problem. The negative and positive parts of the function domain have different polynomial mappings, which in turn allows the function to take different forms. It is easy to see that for \beta_{ij} = 0 the function coincides with ReLU, for \beta_{i1} = \beta_{i2} = \beta_{i3} with PReLU, and for \alpha_{i1} = -\beta_{i1} = 1, \alpha_{i2} = \beta_{i2} = 0, \alpha_{i3} = -\beta_{i3} = -0.2, -1 < z_i < 1 it is nothing but a kind of squashing function (3) (Fig. 2). Figure 3 shows the architecture of a neuron with a tunable polynomial activation function whose parameters \alpha_{i1}, \alpha_{i2}, \alpha_{i3}, \beta_{i1}, \beta_{i2}, \beta_{i3} are updated in the learning process. Here y_j is the external reference signal and e_j = y_j - \hat{y}_j = y_j - \psi(a_j) is the learning error.

3.2
Optimization of the Tunable Polynomial Activation Function
Each activation function of a neural network has to be continuously differentiable due to the backpropagation learning algorithm. The tunable polynomial activation function can be trained using the backpropagation algorithm together with the other
Fig. 2. tanh activation function and polynomial with αi1 = −βi1 = 1; αi2 = βi2 = 0; αi3 = −βi3 = −0.2.
layers. Updating the parameters \alpha_{ij} and \beta_{ij} can be done using the chain rule of differentiation. This requires finding the partial derivatives \partial F/\partial z_i, \partial F/\partial \alpha_{ij}, \partial F/\partial \beta_{ij}:

\frac{\partial F}{\partial z_i} = \begin{cases} \alpha_{i1} + 2\alpha_{i2} z_i + 3\alpha_{i3} z_i^2, & \text{if } z_i \ge 0, \\ \beta_{i1} + 2\beta_{i2} z_i + 3\beta_{i3} z_i^2, & \text{otherwise,} \end{cases}   (7)

\frac{\partial F}{\partial \alpha_{ij}} = \frac{\partial F}{\partial \beta_{ij}} = z_i^j.   (8)
As we can see, the partial derivatives with respect to \alpha_{ij} and \beta_{ij} contain powers of z_i, which indicates that the error signal will be propagated backward with a higher magnitude. This property allows using a lower learning rate for the learning procedure. It should be noted that while the classical artificial neuron contains n + 1 adjustable synaptic weights, the proposed one has n + 7 parameters.
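A possible PyTorch implementation of the tunable polynomial activation (6) is sketched below; automatic differentiation then provides the derivatives (7)-(8) without coding them explicitly. The class name SLP follows the label used in the tables, the initialization mirrors the coefficients suggested in Sect. 4.1, and everything else (parameter sharing per layer, shapes, usage) is an assumption of this sketch rather than the authors' code:

import torch
import torch.nn as nn

class SLP(nn.Module):
    """Tunable polynomial activation (6) with trainable alpha/beta coefficients;
    here one set of six parameters is shared per layer (a per-neuron variant
    would store vectors of length `features` instead)."""
    def __init__(self):
        super().__init__()
        # alpha_1..alpha_3 for z >= 0, beta_1..beta_3 for z < 0
        self.alpha = nn.Parameter(torch.tensor([1.0, 0.0, 0.0]))
        self.beta = nn.Parameter(torch.tensor([0.3, 0.0, 0.0]))

    def forward(self, z):
        powers = torch.stack([z, z ** 2, z ** 3], dim=-1)   # (..., 3)
        pos = (powers * self.alpha).sum(-1)
        neg = (powers * self.beta).sum(-1)
        return torch.where(z >= 0, pos, neg)

# usage: the activation parameters are trained together with the synaptic weights
layer = nn.Sequential(nn.Linear(4, 8), SLP(), nn.Linear(8, 2))
out = layer(torch.randn(5, 4))
out.sum().backward()                      # gradients flow into alpha and beta
print([p.grad is not None for p in layer.parameters()])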
4
Experiments
We first investigated the capabilities of a neuron with a tunable polynomial activation function with respect to the initial initialization. Then we performed an evaluation on two standard classification benchmarks (MNIST [20] and CIFAR10 [17]). The polynomial activation function was compared with the most frequently used activation functions, both parametric and static.
Fig. 3. Neuron with tunable polynomial activation function.
4.1
Initial Initialization of Tunable Polynomial Parameters
The initial initialization experiment was performed with the LeNet-5 convolutional neural network architecture [19] on the MNIST dataset. The tanh activation functions were replaced by tunable polynomials. Figure 4 demonstrates several different initializations of the tunable activation function. It is easy to notice that the results are almost the same for all types of initialization, which shows the insensitivity of this activation function to the initial parameters. The initialization of the coefficients \alpha_{i1} = 1; \beta_{i1} = 0.3; \alpha_{i2} = \alpha_{i3} = \beta_{i2} = \beta_{i3} = 0 or \alpha_{i1} = 1; \alpha_{i2} = \alpha_{i3} = \beta_{i1} = \beta_{i2} = \beta_{i3} = 0 can accelerate convergence in general.

4.2
Performance on MNIST
We evaluated the quality of the LeNet-5 architecture with a variety of activation functions and the two most popular optimizers, Adam [16] and SGD [3]. A variable learning rate (1e-2, 1e-3, 1e-4, 1e-5) was also used. The optimal number of learning steps is one of the key ideas behind tunable activation functions. Table 1 shows the average accuracy on the test data after one training epoch for different functions, optimizers and learning rate values. The average is calculated over ten training cycles of one model. Table 1 shows that the tunable polynomial activation function is better in terms of learning efficiency and classification accuracy on the MNIST data. Interestingly, the model with the investigated function learns better when the value of the learning rate is smaller. In the general case, a smaller learning rate leads to a worse result due to the vanishing gradient problem; the tunable function is able to prevent this problem.
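The learning-rate sweep behind Table 1 can be reproduced schematically as below; the tiny fully connected network, the synthetic stand-in data and the single epoch are assumptions made only to keep the sketch self-contained (the paper trained LeNet-5 on the real MNIST images):

import torch
import torch.nn as nn

def run_one_epoch(activation: nn.Module, lr: float) -> float:
    torch.manual_seed(0)
    model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 64),
                          activation, nn.Linear(64, 10))
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    # synthetic stand-in for MNIST batches (real experiment: the MNIST dataset)
    x = torch.randn(512, 1, 28, 28)
    y = torch.randint(0, 10, (512,))
    for i in range(0, 512, 128):
        opt.zero_grad()
        loss = loss_fn(model(x[i:i + 128]), y[i:i + 128])
        loss.backward()
        opt.step()
    with torch.no_grad():
        return (model(x).argmax(1) == y).float().mean().item()

for lr in (1e-2, 1e-3, 1e-4, 1e-5):
    print(lr, run_one_epoch(nn.ReLU(), lr))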
Fig. 4. Plot of the epoch accuracy function for a neural network that was trained with various initializers for the parameters of the tunable polynomial function.

Table 1. The average accuracy on the test sample from the MNIST dataset for ten LeNet-5 training cycles after the first epochs.

        Learning rate
        1e-2   1e-3   1e-4   1e-5
ReLU    0.982  0.942  0.850  0.301
GELU    0.981  0.954  0.840  0.401
PReLU   0.979  0.946  0.861  0.391
SLP     0.978  0.971  0.902  0.445

4.3
Performance on CIFAR10
We decided to test our tunable activation function on a more complex task and a deeper model, based on the results obtained on the MNIST dataset. CIFAR10 is a well-known experimental dataset that contains color images divided into 10 categories. The VGG [26] architecture was chosen as the model; it consists of a stack of convolutional and subsampling layers followed by fully connected layers at the end (Fig. 5). Network training was performed using the Adam optimization algorithm with a learning rate of 1e-3 and a batch size of 128. The synaptic weights were initialized by the method described in [7], and the parameters of the tunable activation function were initialized based on the results of Sect. 4.1. Table 2 shows the accuracy of networks with different activation functions on the test data set after 1, 5 and 10 training epochs. As we can see, the model with a tunable polynomial activation function had a higher
accuracy after first epoch than other models. This trend remains throughout the training which indicates the productive learning of the neural network.
Fig. 5. VGG neural network architecture (a) and VGG-block structure (b).
5
Conclusion
We have introduced a new artificial neural network activation function which is a generalization of a wide range of activation functions used in deep learning. The use of the tunable polynomial avoids the unwanted effects of gradient explosion and vanishing that occur during the learning process. Training the parameters of the tunable function together with the synaptic weights allows using the proposed function in any architecture of a shallow or deep artificial neural network.

Table 2. The average accuracy on the test sample from the CIFAR10 dataset for ten VGG training cycles after 1, 5 and 10 epochs.

        Number of epoch
        1      5      10
ReLU    0.347  0.684  0.791
GELU    0.451  0.727  0.805
PReLU   0.398  0.724  0.811
SLP     0.465  0.738  0.823
It has been experimentally demonstrated that a tunable polynomial activation function leads to faster model convergence and more accurate learning due
632
B. Bilonoh et al.
to the possibility of using a smaller learning rate. Based on this fact, transfer learning can also be improved, because the rate reduction allows the training process to be stopped later than with the classic activation functions. It has also been shown that different initializations of the proposed function parameters lead to almost identical model accuracy, but parameters close to PReLU give a better start in some problems. A more detailed study of the forms of tunable activation functions and their influence on the shape of the loss function can deepen the understanding of the behavior of artificial neural networks. More complex media data, such as video, can be a challenge for future research on tunable activation functions and their influence on the convergence of the learning procedure.
Markov-Chain-Based Agents for k-Armed Bandit Problem
Vladyslav Sarnatskyi and Igor Baklan
National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”, Kyiv, Ukraine [email protected], [email protected]
Abstract. In this paper, we present our findings on applying a Markov chain generative model to model the actions of an agent in the Markov decision process framework. We outline a problem with current solutions to reinforcement learning problems that utilize the agent-environment framework. This problem arises from the necessity of analyzing each environment state (for example, for q-value estimation in q-learning and deep q-learning methods), which can be computationally heavy. We propose a simple method of "skipping" the analysis of intermediate states, for which optimal actions are determined from the analysis of some previous state and modeled by a Markov chain. We observed a problem with this approach: it limits the agent's exploratory behavior by setting the Markov chain's probabilities close to either 0 or 1. It is shown that the proposed solution, based on L1 regularization of the transition probabilities, can successfully handle this problem. We tested our approach on a simple environment, the k-armed bandit problem, and showed that it outperforms the commonly used gradient bandit algorithm.
Keywords: Reinforcement learning · Artificial intelligence · Markov chain

1 Introduction
One of the most common ways of solving reinforcement learning problems is to use the agent-environment framework. According to it, such problems are formulated as interactions between an agent and an environment (there are also multi-agent tasks, but we do not focus on them). More precisely, the environment at a given time step t is described by a state S_t ∈ S. The agent can observe this state fully or partially and produce its action A_t ∈ A with probability Pr{A_t = a} ≐ π_t(a). In response, the environment rewards the agent with a reward R_t ∈ R ⊂ ℝ. The goal of the agent is to maximize its cumulative reward Σ_{t=1}^{T} R_t. This framework is also called a Markov decision process (MDP).
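As a minimal illustration of this interaction loop, the sketch below plays one episode and accumulates the reward; the environment object and the policy function are hypothetical placeholders introduced only to make the notation concrete.

import numpy as np

def run_episode(env, policy, horizon=1000):
    # Play one episode and return the cumulative reward, i.e. the sum of R_t for t = 1..T.
    state = env.reset()                          # S_1
    total_reward = 0.0
    for t in range(horizon):
        probs = policy(state)                    # pi_t(a) = Pr{A_t = a}
        action = np.random.choice(len(probs), p=probs)
        state, reward, done = env.step(action)   # S_{t+1}, R_t
        total_reward += reward
        if done:
            break
    return total_reward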
Despite impressive performance, this approach has its drawbacks, the most significant of which is the required computational power. For example, AlphaGo used 48 CPUs and 8 GPUs for its training, which lasted for 3 days. Considering the fact that almost every machine learning problem requires multiple iterations of hyperparameter tuning, replicating the AlphaGo training process or applying this technique to a different domain is feasible only on expensive high-end computer clusters. This drawback arises from the core of the MDP: the agent has to evaluate each environment state in order to choose the next action. This evaluation is often done using deep neural networks, which are known to be computationally heavy. So, in order to reduce the amount of required computation per training process, one can reduce the number of computations per state evaluation (use a shallower neural network, or a different kind of regression model), or reduce the number of state evaluations. The first approach can significantly harm overall performance, as it may increase the bias and variance of the q-value estimation. In contrast, the second one can be potentially suitable for a set of environments. Super Mario Bros., a Nintendo Entertainment System video game [12], can be an example of such an environment. In this game, the agent controls the main character and has 12 available actions [4]. The goal of the game is to move as far to the right side of each level as possible. In Fig. 1 we can see states S_1, S_2, ..., S_5 when A_i = right, i ∈ {1..4}.
Fig. 1. States S1 , S2 , . . . , S5 of SuperMarioBros-v0 environment, when Ai = right, i ∈ {1..4}
When a long series of actions is needed in order to obtain the optimal future reward and can be predicted beforehand, evaluations of intermediate states (S_2, S_3, S_4 in this example) can be skipped. Thus, actions can be modeled using a sequence-generating model, and the simplest of them is the Markov chain.
2 Related Works
The MDP framework was successfully applied to a variety of tasks, from optimal control problems such as pole balancing [2] to playing the Atari 2600 game 'Breakout' [5] and surpassing human-level performance in the board games of chess [11] and Go [10]. Several approaches were utilized in order to reduce the computational time of reinforcement learning algorithms. Thus, it was shown that efficient parallelization can have a significant impact on the computational efficiency of these
algorithms [1,3,6,13]. These approaches consider multi-CPU and/or multi-GPU frameworks for Q-value estimation and policy gradient computation. Another approach is to use a previously trained agent as a teacher for the current agent to give it some sort of "kick start" [9]. Similarly, a smaller and more efficient model can be trained to a performance comparable to that of a large model following the "policy distillation" approach [8]. In [7] it was shown that the usage of multiple expert teachers and pretraining on several environments can speed up training on a new environment. There are several works that consider solving the k-armed bandit problem by applying Markov chain models. Thus, the authors of [17] showed an approach to solving the k-armed bandit problem by modeling the reward distribution with a continuous Markov chain. They also established an upper bound for the proposed learning algorithm. In [16] it was shown that switching between two algorithms (PMC and SVE) modeled by a Markov chain can increase the cumulative reward compared to using either of them.
3 Markov-Chain-Based Agent
As Markov chains generate infinite sequences, we extended them in the following way. Let A be the set of Markov chain states, A_i ∈ A the generated state at time step i, L the length of the generated sequence, v ∈ ℝ^{|A|} the initial state distribution, w ∈ ℝ^{|A|} the final state distribution, and M the transition matrix. Then, the sequence generating process can be defined as:

Pr(A_1 = a_i) = v_i,  i ∈ {1, ..., |A|}
Pr(A_{k+1} = a_i | A_k = a_j) = M_{j,i}
Pr(L = k | A_k = a_i) = w_i,  i ∈ {1, ..., |A|}    (1)

The key difference of this model compared to the classical Markov chain is the final state distribution, which is the probability of ending sequence generation at the current state. With further modifications this model can be applied to a Markov decision process:

Ψ(S_t, ..., S_1) = (v^{(t)}, w^{(t)}, M^{(t)})
Pr(A_{t+1} = a_i) = v_i^{(t)},  i ∈ {1, ..., |A|}
Pr(A_{t+k+1} = a_i | A_{t+k} = a_j) = M_{j,i}^{(t)}
Pr(L = k | A_{t+k} = a_i) = w_i^{(t)},  i ∈ {1, ..., |A|}    (2)

where S_i is the observed environment state at time step i; v^{(t)}, w^{(t)}, M^{(t)} are the initial state distribution, the final state distribution and the transition matrix at time step t, respectively; Ψ is the function that estimates the extended Markov chain parameters from the history of previously observed states. Note that some previous states can be ignored in the actual computation of Ψ, despite their presence in its formulation. Given this definition, the agent's action decision process can be described by the algorithm below (Algorithm 1). Thus, in order to maximize the cumulative reward, a search should be performed for the function Ψ.
Algorithm 1: Agent's action decision process
  t ← 1;
  while current state is not terminal do
    Obtain v^(t), w^(t), M^(t) from Ψ(S_t, ..., S_1);
    Generate action sequence Â;
    for a ∈ Â do
      t ← t + 1;
      Take action a;
      Observe state S_t;
    end
  end
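A direct Python sketch of the sequence generation defined by (1)-(2) and of the loop in Algorithm 1 is given below; the environment interface and the parameter-estimating function psi are hypothetical placeholders used only for illustration.

import numpy as np

def generate_sequence(v, w, M, rng, max_len=1000):
    # Sample an action sequence from the extended Markov chain (v, w, M).
    n = len(v)
    seq = [rng.choice(n, p=v)]                   # Pr(A_1 = a_i) = v_i
    for _ in range(max_len - 1):
        current = seq[-1]
        if rng.random() < w[current]:            # Pr(L = k | A_k = a_i) = w_i
            break
        seq.append(rng.choice(n, p=M[current]))  # Pr(A_{k+1} = a_i | A_k = a_j) = M_{j,i}
    return seq

def run_agent(env, psi, rng):
    # Algorithm 1: obtain (v, w, M) from psi, then execute the generated sequence.
    history = [env.reset()]
    while not env.is_terminal():
        v, w, M = psi(history)
        for action in generate_sequence(v, w, M, rng):
            history.append(env.step(action))
    return history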
4 Markov Chain Gradient Bandit Algorithm

4.1 K-Armed Bandit
Let us consider the easiest example of a reinforcement learning problem, called the k-armed bandit problem [14]. If we formulate it from the perspective of agent-environment interaction, the state S_t is represented by k sets of normal distribution parameters: μ_{1,t}, μ_{2,t}, ..., μ_{k,t}, σ_{1,t}, σ_{2,t}, ..., σ_{k,t}; the action set has k members: A = {a_1, a_2, ..., a_k}; the reward function can be described as R_t(s, a) = r, r ~ N(μ_{a,t}, σ_{a,t}). Obviously, the state is not visible to the agent, as otherwise it would be trivial to maximize the cumulative reward: the agent could at each time step choose the action A_t = argmax_a μ_{a,t}. As one may notice, the environment state's parameters are indexed by the time step t, which means that they can be different at different time steps. If they are allowed to be different, such a problem is called a non-stationary k-armed bandit, and a stationary one otherwise. A non-stationary bandit is often modeled as:

μ_{i,t+1} = μ_{i,t} + μ̂,  μ̂ ~ N(0, σ),  i ∈ {1, ..., k}    (3)

where σ controls the degree of non-stationarity: large values of σ correspond to high magnitudes of changes to the reward for a given action across time. If σ = 0, the model is equivalent to a stationary bandit.
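A sketch of such a non-stationary environment could look as follows; the class name and the unit reward variance for every arm are simplifying assumptions made for illustration.

import numpy as np

class NonStationaryBandit:
    # k-armed bandit whose mean rewards drift by N(0, sigma) after every step, as in Eq. (3).
    def __init__(self, k, sigma, rng=None):
        self.rng = rng or np.random.default_rng()
        self.mu = self.rng.normal(0.0, 1.0, size=k)   # initial mean rewards
        self.sigma = sigma

    def pull(self, action):
        reward = self.rng.normal(self.mu[action], 1.0)
        self.mu += self.rng.normal(0.0, self.sigma, size=self.mu.shape)  # mu_{i,t+1} = mu_{i,t} + N(0, sigma)
        return reward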
4.2 Gradient Bandit Algorithm
As described in [15], one of the possible ways to solve the k-armed bandit problem is the gradient bandit algorithm. According to it, the agent has its 'preference' for action a at time step t: H_t(a) ∈ ℝ. Based on these values, the probability of action a at time step t is defined as the softmax-weighted 'preference':

π_t(a) ≐ e^{H_t(a)} / Σ_{b∈A} e^{H_t(b)}    (4)
In order to find H_t(a), the agent keeps track of the averaged reward R̄_t:

R̄_t = R̄_{t-1} + α(R_t − R̄_{t-1})    (5)

(the average reward can be formulated as an arithmetic mean, but that has significant drawbacks when dealing with the non-stationary problem). Using the averaged reward, H_t(a) can be found using gradient ascent with the following update rule:

H_t(a) ≐ H_{t-1}(a) + λ ∂E[R_t] / ∂H_{t-1}(a)    (6)

or:

H_t(a) ≐ H_{t-1}(a) − λ(R_t − R̄_t)π_t(a),  a ≠ A_t
H_t(A_t) ≐ H_{t-1}(A_t) + λ(R_t − R̄_t)(1 − π_t(A_t)),  a = A_t    (7)
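A compact sketch of this algorithm is shown below; the step sizes and the bandit interface (a pull method returning a reward, as in the environment sketched earlier) are assumptions made for illustration.

import numpy as np

def gradient_bandit(bandit, k, steps, lam=0.1, alpha=0.1, rng=None):
    # Gradient bandit algorithm: softmax over preferences H with an averaged-reward baseline.
    rng = rng or np.random.default_rng()
    H = np.zeros(k)          # action preferences H_t(a)
    R_bar = 0.0              # exponentially averaged reward, Eq. (5)
    rewards = []
    for _ in range(steps):
        pi = np.exp(H - H.max())
        pi /= pi.sum()                      # Eq. (4)
        a = rng.choice(k, p=pi)
        r = bandit.pull(a)
        R_bar += alpha * (r - R_bar)        # Eq. (5)
        H -= lam * (r - R_bar) * pi         # Eq. (7) for a != A_t
        H[a] += lam * (r - R_bar)           # together with the previous line gives (1 - pi(A_t)) for A_t
        rewards.append(r)
    return np.array(rewards)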
4.3 Markov Chain Gradient Bandit Algorithm
In order to apply the same idea to the Markov chain decision process, we define the probability of action a at time step t (according to Algorithm 1) as:

π_t(a_i) ≐ v_i^{(1)},  t = 1
π_t(a_i) ≐ Σ_{j=1}^{|A|} π_{t-1}(a_j) M_{j,i}^{(t)},  t > 1    (8)

The agent has probability logits v̂^{(t)}, ŵ^{(t)}, M̂^{(t)} corresponding to v^{(t)}, w^{(t)}, M^{(t)}, where:

v_i^{(t)} = e^{v̂_i^{(t)}} / Σ_{j=1}^{|A|} e^{v̂_j^{(t)}}
w_i^{(t)} = e^{ŵ_i^{(t)}} / (Σ_{k=1}^{|A|} e^{M̂_{i,k}^{(t)}} + e^{ŵ_i^{(t)}})
M_{i,j}^{(t)} = e^{M̂_{i,j}^{(t)}} / (Σ_{k=1}^{|A|} e^{M̂_{i,k}^{(t)}} + e^{ŵ_i^{(t)}})    (9)

In order to find these probability logits, as in the gradient bandit algorithm, the agent keeps track of the averaged reward as shown in (5). Given these definitions, the problem of finding v̂^{(t)}, ŵ^{(t)}, M̂^{(t)} can be solved by gradient ascent:

L = (R_t − R̄_t) · v_{A_{t−L+1}}^{(t)} · w_{A_t}^{(t)} · Π_{i=t−L+1}^{t} M_{A_i,A_{i+1}}^{(t)}
v̂^{(t+1)} = v̂^{(t)} + λ ∂L/∂v̂^{(t)}
ŵ^{(t+1)} = ŵ^{(t)} + λ ∂L/∂ŵ^{(t)}
M̂^{(t+1)} = M̂^{(t)} + λ ∂L/∂M̂^{(t)}    (10)

As the k-armed bandit problem has no observable state, Ψ is not used and the probability logits can be optimized directly.
As shown in Fig. 5 and Fig. 6, the MCGB algorithm has a tendency to generate transition matrices with probabilities close to 0 or 1, thus limiting its exploratory behaviour. This may not be a problem for stationary environments, but for non-stationary environments an agent with highly exploitative behaviour will not be adaptable enough. To address this issue, we propose to regularize the probability logits by adding their weighted L1 norms, as shown in (11):

L = (R_t − R̄_t) · v_{A_{t−L+1}}^{(t)} · w_{A_t}^{(t)} · Π_{i=t−L+1}^{t} M_{A_i,A_{i+1}}^{(t)} + η(‖v̂^{(t)}‖_1 + ‖ŵ^{(t)}‖_1 + ‖M̂^{(t)}‖_1)
v̂^{(t+1)} = v̂^{(t)} + λ ∂L/∂v̂^{(t)}
ŵ^{(t+1)} = ŵ^{(t)} + λ ∂L/∂ŵ^{(t)}
M̂^{(t+1)} = M̂^{(t)} + λ ∂L/∂M̂^{(t)}    (11)
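The sketch below evaluates the parameterization (9) and the objective (11) for a given window of actions; the gradient step itself is omitted (it could be obtained, for instance, with an automatic differentiation library), and the function names are our own illustrative choices.

import numpy as np

def chain_probabilities(v_hat, w_hat, M_hat):
    # Convert logits to (v, w, M) as in Eq. (9): each row of M and its stop
    # probability w_i are normalized jointly; v is an ordinary softmax.
    v = np.exp(v_hat - v_hat.max())
    v /= v.sum()
    expM = np.exp(M_hat)
    denom = expM.sum(axis=1) + np.exp(w_hat)
    M = expM / denom[:, None]
    w = np.exp(w_hat) / denom
    return v, w, M

def mcgb_objective(v_hat, w_hat, M_hat, actions, reward, avg_reward, eta):
    # Objective L of Eq. (11), written as given, for the last window of actions.
    v, w, M = chain_probabilities(v_hat, w_hat, M_hat)
    likelihood = v[actions[0]] * w[actions[-1]]
    for a, b in zip(actions[:-1], actions[1:]):   # product over consecutive transitions in the window
        likelihood *= M[a, b]
    penalty = eta * (np.abs(v_hat).sum() + np.abs(w_hat).sum() + np.abs(M_hat).sum())
    return (reward - avg_reward) * likelihood + penalty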
5 Results and Discussion
We performed a series of tests on both stationary and non-stationary k-armed bandit environments with k ∈ {5, 10, 25, 100}, σ ∈ {0, 0.01, 0.05}, η ∈ {0, 0.1, 1}. Each test consisted of 100 episodes of 10^4 time steps each, with different reward probabilities across runs. The results can be seen in Fig. 2 and Fig. 3. For the case of k ∈ {5, 10}, σ = 0.01 we performed additional, longer experiments with 10^5 time steps. The results of these experiments are shown in Fig. 4. As we can see, the MCGB algorithm outperforms the GB algorithm for the highly non-stationary k-armed bandit, for which setting η > 0 is crucial. If η is set correctly, MCGB achieves both a higher average reward and a higher probability of taking the optimal action than GB. However, the initial convergence rate of MCGB is lower (easily seen in Fig. 2 for k = 100). The actual Markov chains found during testing are of special interest. For example, if a distinct optimal action is present, the found Markov chain generates a long sequence of optimal actions (notice the high probability of the transitions "Start" → 7 and 7 → 7 in the upper-right part of Fig. 5). When there are multiple sub-optimal solutions with an insignificant difference in the reward for taking them, the found Markov chain can generate a sequence containing a subset of these actions, which remains a problem for future research. An example of such a result is shown in Fig. 6. In order to demonstrate the effect of regularization, we trained an agent on the same environment, but with η = 0.1 and η = 1. The results are shown in the lower half of Fig. 5. As one can see, with the increase of η the probability of the transition from "Start" to the state corresponding to the optimal action (7 in this case) decreases. This means that with higher probability the agent will take an action other than the optimal one, which will lead to higher exploration.
Fig. 2. Change of average reward with time (higher is better). Horizontal axis—number of iterations.
6 Conclusions
The conducted research concerning the usage of Markov-chain-based intellectual agents as a solution to the problem of the high computational cost of common approaches to reinforcement learning problems has confirmed the assumption about its applicability to the k-armed bandit problem.
Fig. 3. Change of average probability of taking optimal action (higher is better). “Staircase” effect is present as the result of applied median filtering. Horizontal axis—number of iterations.
We formulated the concept of a Markov-chain-based agent as part of the Markov decision process framework and adapted it to the k-armed bandit problem. The resulting Markov chain gradient bandit algorithm (MCGB) was compared to the commonly used gradient bandit algorithm (GB); the comparison showed that MCGB achieves an average reward comparable to that of GB.
Fig. 4. Change of average reward with time (higher is better). Horizontal axis—number of iterations.
Fig. 5. Action reward distribution (upper-left) and found solution: η = 0—upper-right, η = 0.1—lower-left, η = 1—lower-right. Transitions from state ‘start’ correspond to v, to state ‘end’—to w.
Fig. 6. Action reward distribution (left) and found solution (right). Transitions from state ‘start’ correspond to v, to state ‘end’—to w. Notice a cycle between actions 7 and 22.
MCGB outperforms GB on the non-stationary k-armed bandit in the long run when regularized correctly with a proper value of η, regardless of the number of available actions (tested values {5, 10, 25, 100}), but it converges more slowly at the start of optimization. We inspected the actual solutions found in the process of training and found evidence that the proposed regularization method successfully controls the trade-off between exploration and exploitation. Precise control of this trade-off is important for adapting to changes in the environment and for avoiding local minima during the optimization of the Markov chain transition matrix. To summarize, the found solution to reinforcement learning tasks shows good results when applied to the k-armed bandit problem, and its adaptation to a more general set of environments with a visible state looks promising.
References 1. Babaeizadeh, M., Frosio, I., Tyree, S., Clemons, J., Kautz, J.: Reinforcement learning through asynchronous advantage actor-critic on a GPU (2017) 2. Barto, A.G., Sutton, R.S., Anderson, C.W.: Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Trans. Syst. Man Cybern. 5, 834–846 (1983). https://doi.org/10.1109/tsmc.1983.6313077 3. Clemente, A.V., Castejón, H.N., Chandra, A.: Efficient parallel methods for deep reinforcement learning (2017) 4. Kauten, C.: Super Mario Bros for OpenAI Gym. GitHub (2018). https://github.com/Kautenja/gym-super-mario-bros
5. Mnih, V., et al.: Playing Atari with deep reinforcement learning (2013) 6. Nair, A., et al.: Massively parallel methods for deep reinforcement learning (2015) 7. Parisotto, E., Ba, J.L., Salakhutdinov, R.: Actor-mimic: Deep multitask and transfer reinforcement learning (2016) 8. Rusu, A.A., et al.: Policy distillation (2016) 9. Schmitt, S., et al.: Kickstarting deep reinforcement learning (2018) 10. Silver, D., et al.: Mastering the game of go with deep neural networks and tree search. Nature 529(7587), 484–489 (2016). https://doi.org/10.1038/nature16961 11. Silver, D., et al.: Mastering chess and shogi by self-play with a general reinforcement learning algorithm (2017) 12. Staff, N.: NES games (2010). https://web.archive.org/web/20101221005931/. http://www.nintendo.com/consumer/gameslist/manuals/nes games.pdf 13. Stooke, A., Abbeel, P.: Accelerated methods for deep reinforcement learning (2019) 14. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction, pp. 25–27. MIT Press, Cambridge (2018) 15. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction, pp. 37–40. MIT Press, Cambridge (2018) 16. Wang, G.: A multi-armed bandit MCMC, with applications in sampling from doubly intractable posterior (2019) 17. Zhou, X., Xiong, Y., Chen, N., Gao, X.: Regime switching bandits (2021)
Predicting Customer Churn Using Machine Learning in IT Startups
Viktor Morozov, Olga Mezentseva, Anna Kolomiiets, and Maksym Proskurin
Taras Shevchenko National University of Kyiv, Bogdan Gavrilishin Street, 24, Kyiv 04116, Ukraine
[email protected]

Abstract. This work is devoted to the issues of improving the development of startup projects with the use of modern methods of artificial intelligence. As a rule, such projects are based on innovations, and their implementation requires registration as independent enterprises. Operating in market conditions, most IT companies are forced to develop new innovative ideas and present them in the form of startups. At the same time, small and medium-sized businesses interact with many external, independent potential customers. Such users of information systems may subsequently become clients of these enterprises. The growth and development of such SaaS enterprises relies heavily on the average customer, which is the topic of this article, with the goal of reducing customer churn. The direction of such research is addressed in many of the rules of thumb for SaaS metrics. Since in these conditions it is important to take into account the completeness of the functional interaction of SaaS customers, the authors propose a hypothesis on the possibility of using intelligent methods for predicting customer churn with deep learning neural networks. At the same time, the needs of the stakeholders of such projects should be taken into account, whose satisfaction occurs when interacting with an innovative IT product. To describe the interactions, the authors consider mathematical models and also propose modeling methods. To conduct the training, Python functionality is used, with the processing of user activity datasets. The paper concludes with a discussion and evaluation of the obtained results.

Keywords: Startup · Projects · Innovation · IT product · Deep neural network · Modeling · Decision making · Interaction
1 Introduction
Small businesses, such as "startups", create favorable conditions for the development of innovation. A startup (from the English "start-up" – to launch) is an innovative project, which uses a scalable business model, taking into account the level of uncertainty when creating new services or a new IT product [17].
Countries that have managed to establish a constant process of generating new knowledge and innovative ideas and turning them into innovative products are the most effective today, and they play a leading role in the global economy. The experience of the United States, which brings 85% of innovative products to the market, is indicative; for Japan this figure is 75%, for Germany 55%, for Israel more than 50%, etc. The share of innovations in the total volume of products produced in Ukraine, unfortunately, does not exceed 2%. In the 2017 Global Innovation Index ranking of countries with the most innovative economies, Ukraine ranked 42nd, dropping one position compared to the previous year [11]. Based on an analysis of the number of startups in 137 countries around the world, the Startup Ranking service has developed a rating in which Ukraine ranked 42nd (215 startups) in 2018. The first place in the ranking is occupied by the United States with 45,004 startups, the second place by India with 5,203 startups, and the third place by the United Kingdom with 4,702 startups [13].

There are no official statistics on startup development in Ukraine. Analyzing the Ukrainian startup market [6], we can conclude that about 400 new such projects appear annually. However, the total number of such projects that have not yet gone bankrupt and are still on the market is about 900. At the same time, about 150 startups are large projects, and only about 30 of them continue to develop and thus survive. In general, startup projects are very risky, and their ability to survive in the market is very low. Recent research from Small Biz Trends shows that about 90% of new startups fail [12], and CB Insights specialists have compiled a chart of twenty reasons that lead to the termination of startups [14].

At the same time, SaaS (Software as a Service) is becoming very popular among IT startups, because this software distribution model is more accessible, both for creation and for use by the end user. It is known from open sources that in 2018 the total volume of global investments in innovative SaaS projects amounted to about $134.44 billion, and the forecast is that it will grow to $220.21 billion with an annual growth rate of 13.1% until 2022 [5]. Considering B2B startups [4], it can be noted that they have long sales life cycles. This is due to the fact that corporate solutions are used for clients, and these, in turn, depend on many external and internal factors with numerous interactions with a significant circle of persons. To do this more efficiently, one needs to analyze large amounts of data in order to optimize the interaction with potential customers and the product strategy, but analytics is one of the weakest points of startups [18]. Most startup business owners put a lot of effort into attracting new customers and increasing their sales. But they tend to forget some of the most important parameters, MRR (monthly recurring revenue) and Retention, whose neglect leads to the loss of customers over time, which is also called Churn (outflow) of users.
At first, most startups experience difficulties with churn. Its indicator can be 15% or more, although a churn rate below 3% is considered good for SaaS services. It is extremely important to analyze the outflow of customers in a startup, because this will not only allow you to retain existing customers (attracting a new client costs five times more than retaining an existing one [3]), but also to better adapt the product to the market and audience requirements, that is, to create a better product strategy for the project team. If everything is done correctly, the outflow rate should decrease and stabilize. Moving on to the consideration of customer churn flows themselves and their forecasting, it should be noted that this is one of the classic problems in data mining. Telecommunications service providers have long analyzed client usage patterns to predict customer churn. In many areas of production (for example, in the Toyota company), and especially in the banking sector, company management conducts regular analytical studies aimed at predicting the level of customer satisfaction, which in turn determines product update programs. The software-as-a-service (SaaS) model allows software vendors to collect data about customer usage that is not available to traditional software vendors. Although the SaaS and cloud computing market in general is growing rapidly, as far as we know the problems of the B2B business model in the development of SaaS startups are not fully understood from a scientific and practical point of view, and little has been done to apply well-known churn analysis methods in the B2B SaaS industry. Startups in Ukraine have their own special history of formation and development, despite the significant world experience in this area. This is primarily due to the processes of globalization and the rapid development of IT technologies on the one hand, and to the specifics of the economy on the other. The low speed of testing IT products, insufficient development of the technological base, and a high level of competition in the industry cause the main problems of domestic IT startups, namely the slow pace of their creation and entry into the market with ready-made products.
2 Analysis of Methods for Predicting Interactions in Startups
Modern practice of IT business development has long identified development priorities based on a project approach. This is especially true of innovative projects based on startups. The issue of effective use of project management methods and tools in the implementation of innovations using IT was presented in [22,27]. For example, in [22], a description of the flexible methodology for managing innovative projects is given. At the same time, the issues of using lean manufacturing methods when creating IT products are touched upon. It also offers methods for assessing the effectiveness of project activities, methods for promoting IT products. However, the development methods of these products are not considered. Modern forecasting methods in project management are associated
with proactive management. Such models and methods are considered in [19,21,25,26]. At the same time, technologies for tracking the results of customer interaction with the IT system are not considered there, nor are methods for planning product development work based on the results of the analysis of such interactions. Methods for assessing customer loyalty are also not considered. Let us move on to the problem of customer churn, which can be considered a classic one from a data mining point of view. While companies working directly with customers (B2C), such as those in the telecommunications or banking sector, have developed many approaches and have been using customer churn forecasting for many years, business-to-business (B2B) has received much less attention in the scientific literature. The definition of churn may be different for each organization, taking into account the three factors that separate the client from the company. In his work [20], Euler proposed using decision trees to determine the types of clients in telecommunication projects. Euler used the capabilities of the KDD MiningMart data preprocessing system to obtain predictive characteristics that were not present in the source data. Coussement and Van den Poel used support vector machines to improve the performance of predicting churn for newspaper subscriptions [28]. The results of this work show that the interaction between customers and the supplier is important for churn analysis. In their work [33], Coussement and Van den Poel further studied the interactions between customers and providers; in doing so, they added emotions extracted from e-mails to their model. In [6], the forecast parameters of customer churn were considered, and it was found that decision trees are superior in accuracy to regression methods and neural networks. To expand the analysis, numerous studies have explored various machine learning algorithms and their potential for churn modeling. Since predicting whether a client will be lost is essentially a binary classification problem, several models have been tested, such as logistic regression [23,32], decision trees [29,31], random forests, support vector machines, and neural networks [23]. Undoubtedly, the use of the project approach requires a comprehensive consideration of the processes of developing information systems. At the same time, one should pay attention to the rate of development (growth) of IT companies building their business on the basis of SaaS models [29] and B2B models [4] and using startup technologies. Of course, conducting this analysis, one can point to a fairly wide range of SaaS research. However, most of the work focuses on a few applications. For example, previous work on the study of subscriptions is based on data obtained in the fields of mobile communications, Internet provision and credit services [24,30]. The analysis of the reviewed sources [22,24–26] shows that for the management of the creation of modern information systems (IT products) based on commercial B2B models there are currently no specific models that allow analyzing and predicting the results of customer interaction with the IT system. In addition, such methods should be able to react with the lowest cost
to various deviations in such interactions, taking into account different groups of consumers. It is also necessary to determine reasonable directions for the development of such products with the greatest efficiency and maximum customer loyalty. The absence of such methods significantly reduces the efficiency and viability of startup projects. In turn, the use of predictive models that account for customer churn will make it possible, when developing such complex products, to plan specific actions for developing the system's functionality based on predicate logic and machine learning. This approach will support refining the system with data on the dynamics of interaction, which in turn will ensure that changes are introduced into the system in the shortest possible time and with maximum future benefit.
3 Setting the Objective
This article aims to consider the results of the analysis of the use of models and methods for predicting customer churn using machine learning, which can be applied to B2B SaaS start-up companies. This will help startup founders and their teams better test their ideas, as well as more effectively create, bring to market and scale IT products that can not only increase the efficiency of the enterprise due to the competence of personnel, but also generally strengthen Ukraine’s position in the global innovation market. This article uses customer outflow analysis in telecommunications as a basis for studying the SaaS industry. Although there are differences between these two industries, they actually have a lot in common.
4 Materials and Methods

4.1 Characteristics of the SaaS (Software as a Service) Model
Software as a service (SaaS) is a model for delivering software to consumers in which the provider develops a web application, hosts it and manages it (independently or through third parties) so that customers can use it via the Internet. Customers pay not for ownership of the programs per se, but for their use (a subscription) [15]. Simply put, SaaS is a deployment model in which customers subscribe to a service rather than purchase a license to own a software product. Subscriptions are sold for a period of time (for example, a monthly or annual subscription fee), similarly to mobile network and utility services. The SaaS model offers many benefits for customers (Fig. 1). The low upfront cost makes high-end products affordable for companies that cannot afford to buy the product outright. System administration is done by the vendor, which eliminates costs and staff on the customer's side and reduces the time and cost of training and support. A recurring revenue stream from subscription-based services is attractive to vendors, but the vendor's reliance on subscription renewal makes the business much more sensitive to customer satisfaction.
Fig. 1. Comparison of SaaS with other distribution models
When a customer purchases a traditional software license, the product is installed on the client's servers, and the vendor does not see how the customer uses the product. SaaS vendors host their applications on servers managed by the vendor (or by third-party companies such as Amazon Web Services). This allows vendors to collect valuable data about how customers use their product, including information about who uses the product, when, for how long, and how often. Many industries, such as telecommunications and banking, already use various technologies to predict customer churn and satisfaction. SaaS vendors have the ability to anticipate similar characteristics of their customers. SaaS differs from existing software delivery models in the following five parameters:
1) access method: access to the software or services included in a SaaS product requires network (Internet) access;
2) data storage: the client interacts directly with data stored exclusively on the service's servers;
3) code storage: the code that determines the operation of the software and its results is executed on the server side;
4) system compatibility: in terms of hardware architecture and operating systems, SaaS services are generally undemanding;
5) hardware architecture: the servers designed to store client data and provide SaaS services work only in a cloud computing environment.
Obviously, client computers only need an Internet browser to use a SaaS application.
SaaS has proven to be one of the most effective models that computer technology has produced in its history, and many companies are trying to use such systems. As a result, the SaaS market continues its upward trend. The sector is expected to reach $623 billion by 2023 with an annual growth rate of 18% [16]. SaaS is now the largest segment of the global cloud services market. Having understood the term "SaaS" product, one can correctly understand the value of B2B SaaS (B2B SaaS accounts for 79.2% of the market, B2C for 20.8% [8]). B2B is intended for companies that sell goods or services to other companies. So, it is clear that B2B SaaS refers to companies that provide software (web applications, extensions) to other enterprises as a service. Their products help businesses operate with highly automated technologies, and their main goal is to reduce personnel costs. Because of this value, a large number of companies use SaaS software to optimize their sales, marketing, and customer service in order to increase productivity and generate more revenue. As a separate type of business technology, the use of SaaS is growing in the B2B sector. Here it is enhanced by technologies such as machine learning and artificial intelligence, which are used to further improve automation. In addition, there are special SaaS tools for small businesses that aim to provide such businesses with competitive advantages over larger companies:
• 80% of businesses already use at least one SaaS application [16];
• 73% of enterprise IT companies plan to migrate all their systems to SaaS by 2020 [7];
• based on Blissfully forecasts, SaaS spending for companies of all sizes will double by 2020 [9];
• there are almost 7,000 SaaS companies in marketing alone.
Statistics from the Synergy Research Group [10] show that the enterprise SaaS market already brings software vendors revenue of more than $23 billion in the first quarter and is projected to reach $100 billion per year. The global SaaS services market continues to grow by almost 30% per year. Among competitors, Microsoft leads the SaaS market and currently has 17% of the total market, showing annual growth of 34% (Fig. 2). Today, Salesforce is a prime example of a highly developed SaaS-based business, with a market share of 12% and annual growth of 21%. Salesforce has been based on the SaaS model since its first day. According to the Synergy Research Group, Salesforce has the majority market share when it comes to CRM. In 2005, Salesforce generated approximately $172 million in revenue [12]. Salesforce's SaaS business has grown rapidly over the past seven years, with Salesforce generating approximately $2.27 billion in revenue in 2012 [9]. All Salesforce services are based on a subscription with regular payments.
4.2 Advantages and Disadvantages of Using the SaaS Approach
Compared to traditional software installation options for business solutions, SaaS technology offers a whole range of possible advantages, including:
Fig. 2. SaaS market [10]
• Low initial cost. Unlike other types of software, SaaS services are usually paid for by regular subscription payments and do not require license fees. The consequence of this is a reduction in initial costs. The IT infrastructure on which the software runs is managed by the SaaS provider, which in turn reduces technical and software costs.
• Quick setup and instant deployment. Using the service involves an already installed and configured SaaS application in the cloud. This minimizes overall time delays compared to the often extremely lengthy deployment of traditional software.
• Easy to update. Hardware and software updates for SaaS services are handled centrally by vendors. This is done by deploying updates to applications already hosted in the cloud, which reduces the workload and responsibility in general and completely removes them from the consumer.
• Availability. An Internet connection is the only requirement for accessing SaaS services. The access range is very wide: you can get the service from anywhere in the world. This gives SaaS a huge advantage over traditional business software.
• Scalability. SaaS services offer a wide variety of prices and terms of service, and there is a lot of flexibility to change subscriptions if necessary. This is very important when growing a business, when you need a sharp increase in user accounts.
• If the IT budget is limited, SaaS and cloud computing help you use it most efficiently, including the use of the latest technologies and professional support.
However, in addition to the above advantages, there are also possible disadvantages that should be considered before choosing this business model. The SaaS model sometimes reveals some shortcomings, including the following:
• Reduced control. Proprietary software gives users and their computers a higher degree of control than hosted solutions, in which control is exercised by a third party. Updating or changing software features cannot be delayed by any participant in the process.
• Data security issues. As with all cloud and hosted servers, access control and the confidentiality of stored information remain a concern for SaaS services.
• Limited range of applications. The SaaS hosting model is still not typical for many applications, although it is becoming more and more popular.
• Offline problems. Without the Internet, the SaaS model does not work, and customers lose access to their software or data.
• Performance. The SaaS model may well run at a lower speed than local or server applications, because it depends on a third party.
4.3 Key SaaS Metrics: Churn Rate ("Customer Outflow Rate")
The change in the revenue generation model from one-time license fees to regular payments for services led to the creation of a new business description for modeling estimated revenue and the customer life cycle as a whole. Such a well-defined and sufficiently rigorous forecasting model is crucial for the future of a SaaS business, allowing it to: 1) predict customer base growth; 2) forecast revenue; 3) project future cash flows. For a SaaS business, there is a certain number of important metrics, or variables, that describe the main revenue forecasting mechanisms; the SaaS subscription model uses these variables. There are other metrics for measuring aspects of a SaaS business, but the variables shown in Table 1 are the basis needed to project revenue. In this paper, we focus mostly on the customer churn rate. Customer churn itself is a very simple indicator: the number of customers lost over a certain period of time divided by the total number of customers at the beginning of the period. Depending on your field, this may mean that customers have deleted their account, canceled their subscription, did not renew their contract, did not buy again, or simply decided to switch to your competitor. If you had 100 customers at the beginning of the month and 5 of them canceled their subscription within the month, this means a 5% customer churn (5/100). These are simple calculations, but how we determine these two numbers greatly affects the result. There are also certain external factors, dictated by the management decisions of the business, that can obscure the relationship between the resulting indicators. Here is the simple formula for customer churn:

ChurnRate = (number of customers lost in the period) / (total number of customers at the beginning of the period)    (1)
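Formula (1) maps directly onto a one-line helper; the example below reproduces the 100-customer case from the text.

def churn_rate(customers_lost, customers_at_start):
    # Churn rate as in Eq. (1): share of customers lost during the period.
    return customers_lost / customers_at_start

print(churn_rate(5, 100))   # 0.05, i.e. the 5% monthly churn from the example above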
Table 1. Basic terms and variables

Term                                         | Variable
Churn rate                                   | a
Subscription price (per month)               | P
Monthly recurring revenue                    | MRR
Customer lifetime                            | L
Customer lifetime value                      | CLTV
Acquisition rate (new subscribers per month) | S
A more advanced formula:

ChurnRate = a = C_cancels / C_t    (2)
where a is the customer churn coefficient, C is the number of customers, t is the term (in months), and C_cancels is the number of canceled subscriptions for the period. Churn in the initial stages is a direct reflection of the value of the product and the importance of the features offered to customers. If the SaaS business is a startup, the product must be constantly optimized in order to reduce the speed of user churn. When the IT project is excellent and optimized, the cancellation rate should drop towards zero every month. In addition, customer churn directly affects financial performance (MRR/LTV/CAC): it affects recurring revenue, customer lifetime value and customer acquisition cost. To see the relationship between these variables, we will use approximate values that are common for a SaaS subscription:
1) subscription price: $49.95 per month;
2) customer churn rate: a = 0.175;
3) acquisition rate: 1,000 to 2,000 new subscribers per month;
4) total subscriber base: 10,000.
Monthly recurring revenue (MRR): if customers leave, revenue falls accordingly. For a SaaS business, monthly recurring revenue (MRR) is not only a function of the company's earning potential, but also an indicator that reflects the trend of long-term viability. Let us say the subscription price for the SaaS model is P = $49.95, so the MRR for a single subscriber is $49.95. Then we obtain the MRR for the company as a whole. Assuming the total subscriber base of 10,000 subscribers at the above monthly subscription price, it is calculated as:

MRR = TotalSubscriptions · P    (3)

Using our sample values:

MRR = 10,000 · $49.95 = $499,500    (4)
If money is spent on acquiring customers and they leave before these expenses have had time to pay off, then there is a serious financial shortfall. Churn increases your average CAC. If you find a way to reduce the outflow at every opportunity, you will be able to recover your CAC faster.

CLTV = P / a    (5)

Using our sample values:

CLTV = $49.95 / 0.175 = $285.43    (6)

One of the intermediate results is obtained by simply calculating the average subscription duration in months and multiplying it by the monthly subscription price:

CLTV = P · L,  L = 1 / a = 1 / 0.175 = 5.71 months,  CLTV = $49.95 · 5.71 = $285.43    (7)
1 1 = = 5, 71months, CLT V = $49, 95 · 5, 71 = $285, 43 a 0, 175
(7) Customer acquisition costs (CAC): if you spend money on acquiring customers, and they retreat before you return these costs, then you have a severe shortage. If money is spent on acquiring customers and they retreat before you pay back those costs, then you have a serious shortage. Churn increases your average CAC. If you reduce outflows at every opportunity, then you can quickly protect your CAC from your customers. One of the most effective ways to grow SaaS is through “net negative MRR churn”. Negative churn provides additional revenue that the business receives from existing customers period after period, and as a result exceeds the revenue that disappeared as a result of the cancellation and downgrade. The lower the user flow, the easier it is to achieve a net negative MRR outflow (Fig. 3). Attention was drawn to this striking difference in income. Businesses with negative outflows are almost 3 times larger than those with a standard outflow of 2.5% (note that this indicator is considered very good). And a business that users do not leave at all has only 60% more revenue than the same one with an outflow of 2.5%. Obviously, negative outflow is the most powerful growth accelerator. When calculating the churn rate, an analysis is performed in order to achieve a deeper understanding of customer behavior and why they leave the product subscription. If you change the situation, you can get improved customer retention rates to strengthen the business in the long run. Customer Retention is a company’s ability to maintain a long-term relationship with a customer. A high rate means that our customers are happy to return for a new purchase and recommend you to others. Customer Retention rate formula (CRr): CRr =
Customers at end of period − Customers acquired during period · 100 Customers at the start of the period (8)
Fig. 3. Negative outflow ratio of recurring monthly income (Negative MRR Churn)
Retention curve (Fig. 4): if the curve becomes parallel to the X-axis, then you have product-market fit (the product meets the needs of a specific market and has value for the consumer or market segment). This criterion can provide high-quality information about the product, which is critical for choosing the direction of movement in the early stages of development.
• The curve shows the percentage of customers active over time for each group (cohort) of customers who share a common characteristic, for example, the month of purchase;
• If the curve becomes flat at a certain value, this is the percentage of customers retained in this cohort;
• The more suitable a product is for the market, the lower the initial outflow and the higher the percentage of customers who will try the product and continue to use it in the long run.
As the product improves, new groups will have a higher cohort retention rate. Therefore, to track progress, it is important to have a "triangle" cohort retention chart (Fig. 5). We plotted data points on a line graph, as in the example below. If product-market fit improves, the cohort retention curves should become flatter over time (Fig. 6). The physical meaning of this criterion is very simple: if a product can turn some new users into a regular audience, it means that the target market finds the product useful and is ready to use it regularly to solve its problem.
Fig. 4. Sample retention curve
Fig. 5. Sample triangle diagram for saving a cohort
A more pragmatic reason to aim for a plateau in retention is that it is a prerequisite for controlled and predictable growth. If the number of new users turns into a regular audience and stays with you “forever”, then this provides a foundation for growth. The active account, broken down by month of user arrival, will look like in the graph below. Each cohort adds a new layer to the future stable monthly audience. If new attracted users leave sooner or later (churn), then at some point you will reach the limit of growth. At an early stage, you can grow without a retention plateau, but at some point you will not be able to compensate for the departure of old users with new ones. Customer churn can be the result of low customer satisfaction, aggressive competitive strategies, rejection of new products, etc.
Fig. 6. Example of a multi-cohort retention curve diagram
Churn models are designed to detect early signals of customer churn and identify those who are highly likely to stop using the services. For machine learning methods, it is necessary to collect training data, the main task of which is to describe the client in as much detail as possible. The following indicators can be used for this purpose:
• socio-demographic data;
• transactional data (the number and amount of transactions for the period, grouped by different criteria, etc.);
• product and segmentation data (changes in the number of contracts, belonging to the bank's internal segmentation group, etc.).
The target variable, showing the probability of churn, obviously depends directly on changes in the client's transaction activity, as well as on the client's category and subscription type. After identifying a group of customers with an increased risk of churn, customer retention measures are applied by providing profitable promotions, offers, and so on. Among the machine learning methods analyzed, the optimal one for predicting customer churn will be the one that most accurately captures the relationships between the data in the sample characterizing the type of business. The experiments were conducted using methods implemented in the Python programming language. The effectiveness of the methods is measured by classification accuracy, i.e., the proportion of objects that actually belong to a class relative to all objects assigned to that class.
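A sketch of such a comparison with scikit-learn is shown below; the file name churn.csv, the target column name and the particular hyperparameters are placeholders, and the "reference vector method" of Table 2 is rendered here as a support vector machine, which is our reading of the term.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier

df = pd.read_csv("churn.csv")                    # placeholder path for the subscription dataset
X = pd.get_dummies(df.drop(columns=["Churn"]))   # one-hot encode categorical features
y = df["Churn"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, stratify=y, random_state=0)

models = {
    "Logistic regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "Neural network": make_pipeline(StandardScaler(), MLPClassifier(max_iter=1000)),
    "Decision tree": DecisionTreeClassifier(),
    "Random forest": RandomForestClassifier(n_estimators=200),
    "Support vector machine": make_pipeline(StandardScaler(), SVC()),
    "Naive Bayes": GaussianNB(),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    # Report accuracy on the training and test samples, as in Table 2.
    print(name, round(model.score(X_train, y_train), 2), round(model.score(X_test, y_test), 2))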
As a result (see Table 2), we found that linear regression, logistic regression and the Bayesian classifier have a low but stable evaluation quality. The decision tree and the support vector method overfitted, achieving a 100% result on the training sample but a much lower result on the test sample. The best result was obtained with the random forest: this model has a stable and high quality rating, but it is quite expensive to implement in the IT infrastructure.

Table 2. Results of comparative analysis of machine learning methods for predicting customer churn

Method                        | Training sample | Test sample
Linear regression             | 0.76            | 0.75
Logistic regression           | 0.73            | 0.71
Neural networks               | 0.85            | 0.82
Decision tree                 | 1.00            | 0.76
Random forest                 | 0.88            | 0.87
Support vector method         | 1.00            | 0.63
Naive Bayesian classification | 0.71            | 0.69
On the other hand, this analysis applies to large companies that operate with a heterogeneous package of services and large capital. From our point of view, among the methods considered, we should also analyze the approach to comparing machine learning methods used in similar businesses operating on the SaaS model, but in the field of B2B. The dataset includes information about: • Customers who disabled their subscription during the last quarter of Churn. • The type and number of services that each client subscribes to. • Information indicators about the client’s account: the time, how long he was a client, the type of contract with him, the payment method, the established monthly payments and total expenses. In the original dataset, we have 7,044 examples of B2B SaaS subscriptions and 21 variables (characteristics of companies that purchase services using the SaaS method). We have a very unbalanced data set. After the first visualization at-tempt (Fig. 7) the following trends are observed. About 20% of customers are businesses that have been around for less than 3 years, and they are much more likely to have an outflow of customers compared to a long-term business. Answering the question of how long customers stay in the company, at first glance it seems that the longer the company remains a customer, the more likely it is that it will remain loyal in the future. Obviously, buyers with low monthly payments ( 0 then 22: P := P (T2c ); 23: pr2 (T2 ) := {P }; 24: T.add(pr2 (T2 ));
The polymorphic function is_equivalent checks the equivalence of two crisp or fuzzy properties p_i(T1), p_j(T2) or methods f_i(T1), f_j(T2) and returns 1 if they are equivalent, or 0 otherwise. It can be implemented using [31, Algorithm 1] for checking the equivalence of fuzzy quantitative properties, [31, Algorithm 2] for checking the equivalence of fuzzy qualitative properties, and [31, Algorithm 3] for checking the equivalence of fuzzy methods. Algorithm 1 is most useful for the dynamic creation of new fuzzy inhomogeneous classes of objects when the fuzzy homogeneous classes of objects T1/M(T1) and T2/M(T2) are similar but not equivalent, or when one of them is a subclass of the other. In such cases, it creates a new fuzzy single-core inhomogeneous class of objects T12/M(T12), which defines a heterogeneous collection of fuzzy
objects, i.e., such a collection simultaneously contains objects of fuzzy types t1 and t2, which are defined by the classes T1/M(T1) and T2/M(T2) respectively.

25: for all fi/µ(fi) ∈ F(T1) do
26:   equivalent := false;
27:   for all fj/µ(fj) ∈ F(T2) do
28:     if is_equivalent(fi/µ(fi), fj/µ(fj)) then
29:       if Core(T1, T2) ⊆ T then
30:         F := {};
31:         Core(T1, T2) := {F};
32:         T.add(Core(T1, T2));
33:       else
34:         F := {};
35:         Core(T1, T2)(T).add(F);
36:       F(Core(T1, T2)(T)).add(fi/µ(fi));
37:       F(T2c).remove(fj/µ(fj));
38:       equivalent := true;
39:       break;
40:   if not equivalent then
41:     if pr1(T1) ⊆ T then
42:       F := {};
43:       pr1(T1) := {F};
44:       T.add(pr1(T1));
45:     else
46:       F := {};
47:       pr1(t1)(T).add(F);
48:     F(pr1(T1)(T)).add(fi/µ(fi));
49: if length(F(T2c)) > 0 then
50:   F := F(T2c);
51:   if pr2(T2) ⊆ T then
52:     pr2(T2) := {F};
53:     T.add(pr2(T2));
54:   else
55:     pr2(T2)(T).add(F);
56: return T;
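As a simplified, purely illustrative Python sketch of the partitioning logic above (the data representation, the function names and the trivial equivalence check are our assumptions, not the authors' structures), the members of two classes can be split into a common core and two projections as follows:

def union_core_projections(members1, members2, is_equivalent):
    # Partition two member lists into Core(T1, T2), pr1(T1) and pr2(T2).
    # members1, members2 are lists of (name, membership_degree) pairs;
    # is_equivalent decides whether two members are equivalent (cf. [31]).
    remaining = list(members2)
    core, pr1 = [], []
    for m1 in members1:
        match = next((m2 for m2 in remaining if is_equivalent(m1, m2)), None)
        if match is not None:
            core.append(m1)
            remaining.remove(match)      # equivalent member is consumed from the copy of T2
        else:
            pr1.append(m1)               # member specific to T1
    pr2 = remaining                      # leftover members are specific to T2
    return core, pr1, pr2

# Example with a trivial equivalence check on member names:
core, pr1, pr2 = union_core_projections(
    [("color", 1.0), ("weight", 1.0)],
    [("color", 1.0), ("price", 1.0)],
    lambda a, b: a[0] == b[0])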
Let us estimate the time and space complexity of Algorithm 1. As we can see, the algorithm checks the equivalence of |P(T1)| × |P(T2)| = n × m properties and |F(T1)| × |F(T2)| = k × q methods during the analysis of the specifications and signatures of the classes T1/M(T1) and T2/M(T2). It also performs copying of |P(T2)| = m properties and |F(T2)| = q methods for the creation of a copy of the class T2/M(T2), which is used for the creation of the projection pr2(T2). After that, it performs copying and removing of v1 properties and v2 methods for the creation of the core Core(T1, T2), and copying of w1 properties and w2 methods for the creation of the projection pr1(T1), where v1 + w1 = n and v2 + w2 = k. Consequently, the time complexity
of Algorithm 1 is equal to O(n·m) + O(k·q) + O(m + q) + O(n + k) + O(v1 + v2) ≈ O(n² + k² + c1 + c2 + r2), where n² is the number of property equivalence checks, k² is the number of method equivalence checks, c1 and c2 are the numbers of copying operations over the properties and methods of the classes T1/M(T1) and T2/M(T2) respectively, and r2 is the number of removing operations over the properties and methods of the class T2/M(T2). In addition, the algorithm uses tm1 + tm2 and tm2 memory for storing the fuzzy types t1, t2 and the copy of the class T2/M(T2) respectively. Therefore, the space complexity of Algorithm 1 is equal to O(tm1 + 2·tm2).
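To make the flow of Algorithm 1 easier to follow, the following Python sketch mirrors its overall structure for plain dictionaries of members (name mapped to membership degree). It is a simplified illustration only: the real algorithm operates on fuzzy object-oriented dynamic networks, and the is_equivalent predicate here is supplied by the caller rather than implemented via [31].

```python
# Simplified sketch of the universal union exploiter: given two "classes"
# represented as dicts of member name -> membership degree and an equivalence
# predicate, build the core of shared members and two projections.
def union_classes(members1, members2, is_equivalent):
    core, pr1 = {}, {}
    pr2 = dict(members2)                 # working copy of the second class (cf. T2c)
    for name1, mu1 in members1.items():
        matched = None
        for name2, mu2 in pr2.items():   # nested scan, as in lines 25-39
            if is_equivalent((name1, mu1), (name2, mu2)):
                matched = name2
                break
        if matched is not None:
            core[name1] = mu1            # shared member goes to the core
            del pr2[matched]             # and its counterpart leaves the copy of T2
        else:
            pr1[name1] = mu1             # member specific to the first class
    return {"core": core, "pr1": pr1, "pr2": pr2}  # pr2 keeps what is left of T2

# Toy usage with a naive equivalence test (same name and membership degree).
home = {"cee": 1.0, "color": 1.0, "noisiness": 0.75, "price": 1.0}
hotel = {"cee": 1.0, "color": 1.0, "noisiness": 0.82, "price": 1.0}
same = lambda a, b: a == b
print(union_classes(home, hotel, same))
```

The two nested loops correspond to the O(n·m) and O(k·q) terms of the time complexity estimate above.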
5 Application Example
Let us consider a few fuzzy homogeneous classes of objects, represented in terms of fuzzy object-oriented dynamic networks, which simultaneously have equivalent and nonequivalent subclasses. Let us suppose that the first fuzzy homogeneous class of objects HomeF ridge defines the fuzzy concept of a home fridge and has the following representation: HomeF ridge( p1 = (ref rigerator volume, (v ∈ T (r volume), str))/1, p2 = (f reezer volume, (v ∈ T (f volume), str))/1, p3 = (cee, (v ∈ Vcee , str))/1, p4 = (aec, (v ∈ Vaec , kW h))/1, p5 = (sizes, ((v1 ∈ Vheight , cm), (v2 ∈ Vwidth , cm), (v3 ∈ Vdepth , cm))/1, p6 = (compactness, (vf6 (HomeF ridge.sizes), v ∈ [0, 1]))/0.93, p7 = (color, (v ∈ Vcolor , str))/1, p8 = (weight, (v ∈ T (weight), str))/1, p9 = (noisiness, (v ∈ T (noisiness), str))/0.75, p10 = (price, (v ∈ Vprice , N+ ))/1, f1 = get crisp weight()/0.93, f2 = get f uzzy price(a, b, k)/0.87 )/0.96, refrigerator volume is a fuzzy quantitative property defined as a linguistic variable, which has the following term-set T (r volume) = {very small, small, medium, big, very big}, where fuzzy variables very small, small, medium, big, and very big defined over the interval of integer numbers U = [40, 425], which means the volume of the
refrigerator in cm3 , and have the following interpretation: M (very small) = {40/1 + 50/0.95 + 60/0.85 + 70/0.7 + 80/0.65} cm3 , M (small) = {95/0.92 + 110/0.78 + 125/0.63 + 140/0.55 + 150/0.45} cm3 , M (medium) = {170/0.78 + 190/0.92 + 210/1 + 230/0.92 + 250/0.78} cm3 , M (big) = {270/0.82 + 290/0.94 + 310/1 + 330/0.94 + 350/0.82} cm3 , M (very big) = {365/0.65 + 380/0.72 + 395/0.86 + 410/0.93 + 425/1} cm3 ; freezer volume is a fuzzy quantitative property defined as a linguistic variable, which has the following term-set T (f volume) = {very small, small, medium, big, very big}, where fuzzy variables very small, small, medium, big, and very big defined over the interval of integer numbers U = [10, 275], which means the volume of the freezer in cm3 , and have the following interpretation: M (very small) = {10/1 + 17/0.94 + 24/0.85 + 31/0.78 + 38/0.65} cm3 , M (small) = {50/1 + 57/0.92 + 64/0.86 + 71/0.73 + 78/0.61} cm3 , M (medium) = {90/0.85 + 100/0.93 + 110/1 + 120/0.93 + 130/0.85} cm3 , M (big) = {140/0.82 + 155/0.93 + 170/1 + 185/0.93 + 200/0.82} cm3 , M (very big) = {215/0.67 + 230/0.79 + 245/0.88 + 260/0.95 + 275/1} cm3 ; cee is a crisp quantitative property, which means the class of energy efficiency to which the fridge belongs, and is defined over the following set of string values Vcee = {A+++ , A++ , A+ , A, B, C, D, F }; aec is a crisp quantitative property, which means the annual energy consumption by the fridge in kW h, and is defined over the following interval of integer numbers Vaec = [100, 360]; sizes is a crisp multiple-valued quantitative property, which means dimensions of the fridge in cm, and is defined over the intervals of integer numbers Vheight = [45, 205], Vwidth = [35, 95], Vdepth = [55, 85]; compactness is a fuzzy qualitative property defined by verification function vf6 (HomeF ridge) : HomeF ridge.sizes → [0, 1], where vf6 (HomeF ridge) is defined as follows vf6 (HomeF ridge) =
(V − Vmin) / (Vmax − Vmin),
where V = sizes.v1 · sizes.v2 · sizes.v3 , Vmin = min(Vheight ) · min(Vwidth ) · min(Vdepth ), Vmax = max(Vheight ) · max(Vwidth ) · max(Vdepth );
color is a crisp quantitative property, which means the color of the fridge, and is defined over the following set of string values Vcolor = {beige, white, graphite, golden, brown, red, stainless steel, silver, grey, titanium, black, bronze, blue, green, orange, pink, ivory, purple}; weight is a fuzzy quantitative property defined as a linguistic variable, which has the following term-set T(weight) = {lightweight, medium, heavy, very heavy}, where the fuzzy variables lightweight, medium, heavy, and very heavy are defined over the interval of integer numbers U = [10, 135], which means the weight of the fridge in kg, and have the following interpretation:

M(lightweight) = {10/1 + 20/0.95 + 30/0.88 + 40/0.79 + 50/0.68} kg,
M(medium) = {55/0.83 + 60/0.94 + 65/1 + 70/0.94 + 75/0.83} kg,
M(heavy) = {80/0.86 + 85/0.95 + 90/1 + 95/0.95 + 100/0.86} kg,
M(very heavy) = {115/0.71 + 120/0.79 + 125/0.88 + 130/0.96 + 135/1} kg;

noisiness is a fuzzy quantitative property defined as a linguistic variable, which has the following term-set T(noisiness) = {low, medium, high}, where the fuzzy variables low, medium, and high are defined over the interval of real numbers U = [30, 45], which means the noisiness of the fridge in dB, and have the following meaning:

M(low) = {30/1 + 31/0.97 + 32/0.93 + 33/0.89 + 34/0.82} dB,
M(medium) = {35/0.92 + 36/0.98 + 37/1 + 38/0.98 + 39/0.92} dB,
M(high) = {40/0.83 + 41/0.88 + 42/0.92 + 43/0.97 + 44/1} dB;

price is a crisp quantitative property, which means the price of the fridge in UAH, and is defined over the interval of integer numbers Vprice = [2200, 255000]; get crisp weight is a fuzzy method that computes a defuzzified representation of the fuzzy quantitative property weight and is defined in the following way:

get crisp weight() = ( Σ_{i=1}^{|weight.v|} μ(weight.v_i) · weight.v_i ) / ( Σ_{i=1}^{|weight.v|} μ(weight.v_i) );

get fuzzy price is a fuzzy method that computes a fuzzified representation of the crisp quantitative property price and is defined in the following way:

get fuzzy price(a, b, k) = {x_i^-/μ(x_i^-), price.v/1, x_i^+/μ(x_i^+)},

where a < price.v < b and k is the increment for the generation of x_i^- and x_i^+, i = 1, . . .,

x_i^- = price.v − k·i,  a < price.v − k·i < price.v,
x_i^+ = price.v + k·i,  price.v < price.v + k·i < b,

and where

μ(x_i^-) = (x_i^- − a)/(price.v − a) − δ_i^-,  δ_i^- = 1 − μ(x_i^-) − ν(x_i^-),  ν(x_i^-) = 1 − μ(x_i^-),
μ(x_i^+) = (b − x_i^+)/(b − price.v) − δ_i^+,  δ_i^+ = 1 − μ(x_i^+) − ν(x_i^+),  ν(x_i^+) = 1 − μ(x_i^+).
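As a concrete illustration of the defuzzification step, the short sketch below applies the weighted-average formula of get crisp weight to the discrete fuzzy set M(medium) of the weight property quoted above; it is a toy calculation, not part of the authors' implementation.

```python
# Weighted-average (centroid-style) defuzzification of a discrete fuzzy set,
# following the get_crisp_weight() formula given above.
def get_crisp_weight(fuzzy_value):
    numerator = sum(mu * v for v, mu in fuzzy_value.items())
    denominator = sum(fuzzy_value.values())
    return numerator / denominator

# M(medium) of the property "weight" of HomeFridge, in kg (values from the text).
medium_weight = {55: 0.83, 60: 0.94, 65: 1.0, 70: 0.94, 75: 0.83}
print(round(get_crisp_weight(medium_weight), 2))  # -> 65.0 kg (set is symmetric around 65)
```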
As we can see, the fuzzy class of objects HomeF ridge has a measure of its fuzziness, which is equal to 0.96 according to Definition 1. Now let us suppose that the second fuzzy homogeneous class of objects HotelF ridge defines the fuzzy concept of a hotel fridge and has the following representation: HotelF ridge( p1 = (ref rigerator volume, (v ∈ T (r volume), str))/1, p2 = (f reezer volume, (v ∈ T (f volume), str))/0.78, p3 = (cee, (v ∈ Vcee , str))/1, p4 = (aec, (v ∈ Vaec , kW h))/1, p5 = (sizes, ((v1 ∈ Vheight , cm), (v2 ∈ Vwidth , cm), (v3 ∈ Vdepth , cm))/1, p6 = (compactness, (vf6 (HotelF ridge.sizes), v ∈ [0, 1]))/0.93, p7 = (color, (v ∈ Vcolor , str))/1, p8 = (weight, (v ∈ T (weight), str))/1, p9 = (noisiness, (v ∈ T (noisiness), str))/0.82, p10 = (price, (v ∈ Vprice , N+ ))/1, f1 = get crisp weight()/0.93, f2 = get f uzzy price(a, b, k)/0.87 )/0.94, refrigerator volume is a fuzzy quantitative property defined as a linguistic variable, which has the following term-set T (r volume) = {small, medium, big}, where fuzzy variables small, medium, and big defined over the interval of integer numbers U = [30, 45], which means the volume of the refrigerator in cm3 , and have the following interpretation: M (small) = {30/1 + 31/0.95 + 32/0.91 + 33/0.87 + 34/0.82} cm3 , M (medium) = {35/0.9 + 36/0.96 + 37/1 + 38/0.96 + 39/0.9} cm3 , M (big) = {40/0.78 + 41/0.83 + 42/0.89 + 43/0.95 + 44/1} cm3 ; freezer volume is a fuzzy quantitative property defined as a linguistic variable, which has the following term-set T (f volume) = {small, medium, big},
where fuzzy variables small, medium, and big defined over the interval of real numbers U = [5, 6.5], which means the volume of the freezer in cm3 , and have the following interpretation: M (small) = {5.0/1 + 5.1/0.92 + 5.2/0.87 + 5.3/0.82 + 5.4/0.78} cm3 , M (medium) = {5.5/0.8 + 5.6/0.93 + 5.7/1 + 5.8/0.93 + 5.9/0.8} cm3 , M (big) = {6.0/0.82 + 6.1/0.88 + 6.2/0.93 + 6.3/0.97 + 6.4/1} cm3 ; cee is a crisp quantitative property, which means the class of energy efficiency to which the fridge belongs, and is defined over the following set of string values Vcee = {A+++ , A++ , A+ , A, B, C, D, F }; aec is a crisp quantitative property, which means the annual energy consumption by the fridge in kW h, and is defined over the following interval of integer numbers Vaec = [95, 110]; sizes is a crisp multiple-valued quantitative property, which means dimensions of the fridge in cm, and is defined over the intervals of integer numbers Vheight = [45, 55], Vwidth = [40, 60], Vdepth = [40, 50]; compactness is a fuzzy qualitative property defined by verification function vf6 (HotelF ridge) : HotelF ridge.sizes → [0, 1], where vf6 (HotelF ridge) is defined as follows vf6 (HotelF ridge) =
(V − Vmin) / (Vmax − Vmin),
where V = sizes.v1 · sizes.v2 · sizes.v3 , Vmin = min(Vheight ) · min(Vwidth ) · min(Vdepth ), Vmax = max(Vheight ) · max(Vwidth ) · max(Vdepth ); color is a crisp quantitative property, which means the color of the fridge, and is defined over the following set of string values Vcolor = {beige, white, graphite, golden, brown, red, stainless steel, silver, grey, titanium, black, bronze, blue, green, orange, pink, ivory, purple}; weight is a fuzzy quantitative property defined as a linguistic variable, which has the following term-set T (weight) = {lightweight, medium}, where fuzzy variables lightweight, and medium, defined over the interval of integer numbers U = [14, 23], which means the weight of the fridge in kg, and have the following interpretation: M (lightweight) = {14/1 + 15/0.95 + 16/0.91 + 17/0.87 + 18/0.81} kg, M (medium) = {19/0.96 + 20/1 + 21/0.95 + 22/0.91 + 23/0.87} kg;
noisiness is a fuzzy quantitative property defined as a linguistic variable, which has the following term-set T(noisiness) = {low, medium}, where the fuzzy variables low and medium are defined over the interval of real numbers U = [35, 42], which means the noisiness of the fridge in dB, and have the following meaning:

M(low) = {35/1 + 36/0.94 + 37/0.89 + 38/0.83} dB,
M(medium) = {39/0.95 + 40/1 + 41/0.95 + 42/0.93} dB;

price is a crisp quantitative property, which means the price of the fridge in UAH, and is defined over the interval of integer numbers Vprice = [2800, 4700]; get crisp weight is a fuzzy method that computes a defuzzified representation of the fuzzy quantitative property weight and is defined in the following way:

get crisp weight() = ( Σ_{i=1}^{|weight.v|} μ(weight.v_i) · weight.v_i ) / ( Σ_{i=1}^{|weight.v|} μ(weight.v_i) );

get fuzzy price is a fuzzy method that computes a fuzzified representation of the crisp quantitative property price and is defined in the following way:

get fuzzy price(a, b, k) = {x_i^-/μ(x_i^-), price.v/1, x_i^+/μ(x_i^+)},

where a < price.v < b and k is the increment for the generation of x_i^- and x_i^+, i = 1, . . .,

x_i^- = price.v − k·i,  a < price.v − k·i < price.v,
x_i^+ = price.v + k·i,  price.v < price.v + k·i < b,

and where

μ(x_i^-) = (x_i^- − a)/(price.v − a) − δ_i^-,  δ_i^- = 1 − μ(x_i^-) − ν(x_i^-),  ν(x_i^-) = 1 − μ(x_i^-),
μ(x_i^+) = (b − x_i^+)/(b − price.v) − δ_i^+,  δ_i^+ = 1 − μ(x_i^+) − ν(x_i^+),  ν(x_i^+) = 1 − μ(x_i^+).
As we can see, the fuzzy class of objects HotelFridge has a measure of its fuzziness, which is equal to 0.94 according to Definition 1. Now let us compute the union of the fuzzy homogeneous classes of objects HomeFridge/0.96 ∪ HotelFridge/0.94, using Algorithm 1, which implements the concept of the universal union exploiter (see Definition 4). Analyzing the specifications and signatures of the fuzzy classes, we can conclude that

HomeFridge.cee ≡ HotelFridge.cee,
HomeFridge.compactness ≡ HotelFridge.compactness,
HomeFridge.color ≡ HotelFridge.color,
HomeFridge.get crisp weight() ≡ HotelFridge.get crisp weight(),
HomeFridge.get fuzzy price(a, b, k) ≡ HotelFridge.get fuzzy price(a, b, k).
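It is worth stressing that equivalence here is decided by the property definitions and their domains, not by member names alone: both classes declare, for example, refrigerator volume, but over different universes ([40, 425] versus [30, 45]), so that member does not enter the core. The tiny sketch below merely records this outcome for illustration (the underscored names are a code-side convention); it is not the equivalence check of [31].

```python
# Members judged equivalent by the analysis above form the core; every other
# member stays in the projection of its own class, even when the two classes
# use the same member name over different universes (e.g. refrigerator volume).
core_members = {"cee", "compactness", "color", "get_crisp_weight", "get_fuzzy_price"}
all_members = {"refrigerator_volume", "freezer_volume", "cee", "aec", "sizes",
               "compactness", "color", "weight", "noisiness", "price",
               "get_crisp_weight", "get_fuzzy_price"}
pr1_home = all_members - core_members    # projection of HomeFridge
pr2_hotel = all_members - core_members   # projection of HotelFridge
print(sorted(pr1_home))
```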
Therefore, using Algorithm 1 we have constructed a new fuzzy single-core inhomogeneous class of objects, which has the following representation:

HomeHotelFridge/0.95(
  Core(HomeFridge, HotelFridge)(
    p1 = (cee, (v ∈ Vcee, str))/1,
    p2 = (compactness, (vf2(HomeFridge.sizes), v ∈ [0, 1]))/0.93,
    p3 = (color, (v ∈ Vcolor, str))/1,
    f1 = get crisp weight()/0.93,
    f2 = get fuzzy price(a, b, k)/0.87
  ),
  pr1(HomeFridge)(
    p4 = (refrigerator volume, (v ∈ T(r volume), str))/1,
    p5 = (freezer volume, (v ∈ T(f volume), str))/1,
    p6 = (aec, (v ∈ Vaec, kWh))/1,
    p7 = (sizes, ((v1 ∈ Vheight, cm), (v2 ∈ Vwidth, cm), (v3 ∈ Vdepth, cm)))/1,
    p8 = (weight, (v ∈ T(weight), str))/1,
    p9 = (noisiness, (v ∈ T(noisiness), str))/0.75,
    p10 = (price, (v ∈ Vprice, N+))/1
  ),
  pr2(HotelFridge)(
    p4 = (refrigerator volume, (v ∈ T(r volume), str))/1,
    p5 = (freezer volume, (v ∈ T(f volume), str))/0.78,
    p6 = (aec, (v ∈ Vaec, kWh))/1,
    p7 = (sizes, ((v1 ∈ Vheight, cm), (v2 ∈ Vwidth, cm), (v3 ∈ Vdepth, cm)))/1,
    p8 = (weight, (v ∈ T(weight), str))/1,
    p9 = (noisiness, (v ∈ T(noisiness), str))/0.82,
    p10 = (price, (v ∈ Vprice, N+))/1
  )
)/0.95.

As we can see, the newly created fuzzy single-core inhomogeneous class of objects HomeHotelFridge has a measure of its fuzziness, which is equal to 0.95 according to Definition 2. It simultaneously defines both fuzzy types defined by the fuzzy homogeneous classes of objects HomeFridge/0.96 and HotelFridge/0.94. The core Core(HomeFridge, HotelFridge) contains all properties and methods which are common for both fuzzy types. The creation of the core allows avoiding representation redundancy of the corresponding fuzzy types of objects because it stores all properties and methods which are common for both fuzzy types. Projections pr1(HomeFridge) and pr2(HotelFridge) consist of properties and
methods which are typical only for the corresponding fuzzy types. Using the fuzzy single-core inhomogeneous class of objects HomeHotelFridge, we can define instances of each of its fuzzy types, i.e. HomeFridge or HotelFridge. As a result, Algorithm 1 provides an opportunity to verify the similarity between the two fuzzy homogeneous classes of objects HomeFridge/0.96 and HotelFridge/0.94 by the construction of the core; in addition, it allows verification of the difference between these classes by the construction of the corresponding projections. Moreover, the newly constructed fuzzy single-core inhomogeneous class of objects HomeHotelFridge/0.95 can define a heterogeneous collection of objects which belong to the fuzzy type HomeFridge/0.96 or HotelFridge/0.94. Algorithm 1 can thus be used as a tool for the integration of new knowledge, in the form of fuzzy homogeneous classes of objects, into a knowledge base while avoiding knowledge representation redundancies.
6 Conclusions
Efficient integration of newly extracted fuzzy knowledge into the knowledge base is an important task for modern knowledge-based systems. However, to perform such a task, a system should be able to analyze different relationships between newly extracted and previously obtained knowledge. Therefore, we introduced the concept of a fuzzy single-core inhomogeneous class of objects and a fuzzy type of objects. In addition, we defined the concept of a universal homogeneous union exploiter for fuzzy classes of objects and developed a corresponding algorithm for its implementation. The developed algorithm allows verification of the equivalence, inclusion, similarity, and difference between the newly extracted and previously obtained knowledge in terms of fuzzy homogeneous classes of objects within such a knowledge representation model as fuzzy object-oriented dynamic networks. Fuzzy single-core inhomogeneous classes, which can be constructed using Algorithm 1, allow avoiding representation redundancy during knowledge integration and can define heterogeneous collections of objects which belong to the different fuzzy types defined by the classes. The proposed approach can be extended by adapting the universal union exploiter to fuzzy inhomogeneous classes of objects, as well as to the union of fuzzy homogeneous and inhomogeneous classes of objects, and by developing the corresponding algorithms for their implementation.
References 1. Bai, L., Cao, X., Jia, W.: Uncertain spatiotemporal data modeling and algebraic operations based on XML. Earth Sci. Inf. 11(1), 109–127 (2017). https://doi.org/ 10.1007/s12145-017-0322-6 2. Bai, L., Zhu, L.: An algebra for fuzzy spatiotemporal data in XML. IEEE Access 7, 22914–22926 (2019). https://doi.org/10.1109/ACCESS.2019.2898228 3. Bareiss, R., Porter, B.W., Murray, K.S.: Supporting start-to-finish development of knowledge bases. Mach. Learn. 4(3–4), 259–283 (1989). https://doi.org/10.1007/ BF00130714
4. Berzal, F., Mar´ın, N., Pons, O., Vila, M.A.: Using classical object-oriented features to build a fuzzy O-O database system. In: Lee, J. (ed.) Software Engineering with Computational Intelligence, Studies in Fuzziness and Soft Computing, vol. 121, pp. 131–155. Springer (2003). https://doi.org/10.1007/978-3-540-36423-8 6 5. Berzal, F., Mar´ın, N., Pons, O., Vila, M.A.: Managing fuzziness on conventional object-oriented platforms. Int. J. Intell. Syst. 22(7), 781–803 (2007). https://doi. org/10.1002/int.20228 6. Booch, G., et al.: Object-Oriented Analysis and Design with Applications, 3rd edn. Object Technology Series. Addison-Wesley Professional, Boston (2007) 7. Bordogna, G., Pasi, G., Lucarella, D.: A fuzzy object-oriented data model for managing vague and uncertain information. Int. J. Intell. Syst. 14(7), 623–651 (1999) 8. Craig, I.D.: Object-Oriented Programming Languages: Interpretation. UTCS, Springer, London (2007). https://doi.org/10.1007/978-1-84628-774-9 9. Ma, Z., Zhang, F., Yan, L., Cheng, J.: Fuzzy Knowledge Management for the Semantic Web, Studies in Fuzziness and Soft Computing, vol. 306. Springer, Berlin (2014). https://doi.org/10.1007/978-3-642-39283-2 10. Ma, Z.M., Liu, J., Yan, L.: Fuzzy data modeling and algebraic operations in XML. Int. J. Intell. Syst. 25(9), 925–947 (2010). https://doi.org/10.1002/int.20424 11. Ma, Z.M., Mili, F.: Handling fuzzy information in extended possibility-based fuzzy relational databases. Int. J. Intell. Syst. 17(10), 925–942 (2002). https://doi.org/ 10.1002/int.10057 12. Ma, Z.M., Yan, L., Zhang, F.: Modeling fuzzy information in UML class diagrams and object-oriented database models. Fuzzy Sets Syst. 186(1), 26–46 (2012). https://doi.org/10.1016/j.fss.2011.06.015 13. Ma, Z.M., Zhang, W.J., Ma, W.Y.: Assessment of data redundancy in fuzzy relational databases based on semantic inclusion degree. Inform. Process. Lett. 72(1– 2), 25–29 (1999). https://doi.org/10.1016/S0020-0190(99)00124-6 14. Ma, Z.M., Zhang, W.J., Ma, W.Y.: Extending object-oriented databases for fuzzy information modeling. Inf. Syst. 29(5), 421–435 (2004). https://doi.org/10.1016/ S0306-4379(03)00038-3 15. Mar´ın, N., Pons, O., Vila, M.A.: Fuzzy types: a new concept of type for managing vague structures. Int. J. Intell. Syst. 15(11), 1061–1085 (2000) 16. Mar´ın, N., Pons, O., Vila, M.A.: A strategy for adding fuzzy types to an objectoriented database system. Int. J. Intell. Syst. 16(7), 863–880 (2001). https://doi. org/10.1002/int.1039 17. Murray, K.S.: Learning as knowledge integration. Ph.D. thesis, Faculty of the Graduate School, University of Texas at Austin, Austin, Texas, USA (May 1995) 18. Murray, K.S.: KI: a tool for knowledge integration. In: Proceedings of the 13th National Conference on Artificial Intelligence, AAAI 1996, Portland, Oregon, USA, pp. 835–842 (1996) 19. Murray, K.S., Porter, B.W.: Controlling search for the consequences of new information during knowledge integration. In: Proceedings of the 6th International Workshop on Machine Learning, New York, USA, pp. 290–295 (1989) 20. Murray, K.S., Porter, B.W.: Developing a tool for knowledge integration: initial results. Int. J. Man-Mach. Stud. 33(4), 373–383 (1990) 21. Ndousse, T.D.: Intelligent systems modeling with reusable fuzzy objects. Int. J. Intell. Syst. 12(2), 137–152 (1997) 22. Panigrahi, P.K., Goswami, A.: Algebra for fuzzy object oriented database language. Int. J. Comput. Appl. 26(1), 1–9 (2004). https://doi.org/10.1080/1206212X.2004. 11441721
23. Pivert, O., Prade, H.: A certainty-based model for uncertain databases. IEEE Trans. Fuzzy Syst. 23(4), 1181–1196 (2015). https://doi.org/10.1109/TFUZZ. 2014.2347994 24. Terletskyi, D.: Object-oriented knowledge representation and data storage using inhomogeneous classes. In: Damaˇseviˇcius, R., Mikaˇsyt˙e, V. (eds.) ICIST 2017, CCIS, vol. 756, pp. 48–61. Springer, Cham (2017). https://doi.org/10.1007/978-3319-67642-5 5 25. Terletskyi, D.A., Provotar, A.I.: Fuzzy object-oriented dynamic networks. I. Cybern. Syst. Anal. 51(1), 34–40 (2015). https://doi.org/10.1007/s10559-0159694-0 26. Terletskyi, D.A., Provotar, A.I.: Fuzzy object-oriented dynamic networks. II. Cybern. Syst. Anal. 52(1), 38–45 (2016). https://doi.org/10.1007/s10559-0169797-2 27. Terletskyi, D.O.: Exploiters-based knowledge extraction in object-oriented knowledge representation. In: Suraj, Z., Czaja, L. (eds.) Proceedings 24th International Workshop, Concurrency, Specification and Programming, CS&P 2015, vol. 2, pp. 211–221. Rzeszow, Poland (2015) 28. Terletskyi, D.O.: Run-time class generation: algorithms for union of homogeneous and inhomogeneous classes. In: Damaˇseviˇcius, R., Vasiljevien˙e, G. (eds.) Information and Software Technologies. ICIST 2019, CCIS, vol. 1078, pp. 148–160. Springer (2019). https://doi.org/10.1007/978-3-030-30275-7 12 29. Terletskyi, D.O.: Run-time class generation: algorithms for decomposition of homogeneous classes. In: Lopata, A., Butkien˙e, R., Gudonien˙e, D., Sukack˙e, V. (eds.) Information and Software Technologies. ICIST 2020, CCIS, vol. 1283, pp. 243–254. Springer (2020). https://doi.org/10.1007/978-3-030-59506-7 20 30. Terletskyi, D.O., Provotar, O.I.: Algorithm for intersection of fuzzy homogeneous classes of objects. In: Proc. IEEE 2020 15th International Conference on Computer Sciences and Information Technology (CSIT), vol. 2, pp. 314–317. Zbarazh, Ukraine (2020). https://doi.org/10.1109/CSIT49958.2020.9321914 31. Terletskyi, D.O., Provotar, O.I.: Intersection of fuzzy homogeneous classes of objects. In: Shakhovska, N., Medykovskyy, M.O. (eds.) Advances in Intelligent Systems and Computing V, AISC, vol. 1293, pp. 306–323. Springer (2020). https:// doi.org/10.1007/978-3-030-63270-0 21 32. Thang, D.V.: Algebraic operations in fuzzy object-oriented databases based on hedge algebras. In: Cong Vinh, P., Ha Huy Cuong, N., Vassev, E. (ed.) ContextAware Systems and Applications, and Nature of Computation and Communication. ICTCC 2017, ICCASA 2017, LNICST, vol. 217, pp. 124–134. Springer (2018). https://doi.org/10.1007/978-3-319-77818-1 12 33. Yan, L., Ma, Z., Zhang, F.: Fuzzy XML data management, studies in fuzziness and soft computing, vol. 311. Springer, Berlin (2014). https://doi.org/10.1007/978-3642-44899-7 34. Yan, L., Ma, Z.M., Zhang, F.: Algebraic operations in fuzzy object-oriented databases. Inf. Syst. Front. 16, 543–556 (2014). https://doi.org/10.1007/s10796012-9359-8 35. Zadeh, L.A.: Fuzzy sets. Inf. Control 8(3), 338–353 (1965). https://doi.org/10. 1016/S0019-9958(65)90241-X 36. Zadeh, L.A.: The concept of a linguistic variable and its application to approximate reasoning - I. Inform. Sci. 8(3), 199–249 (1975). https://doi.org/10.1016/00200255(75)90036-5
37. Zadeh, L.A.: The concept of a linguistic variable and its application to approximate reasoning - II. Inform. Sci. 8(4), 301–357 (1975). https://doi.org/10.1016/00200255(75)90046-8 38. Zadeh, L.A.: The concept of a linguistic variable and its application to approximate reasoning - III. Inform. Sci. 9(1), 43–80 (1975). https://doi.org/10.1016/ 0020-0255(75)90017-1 39. Zhang, L., Sun, J., Su, S., Liu, Q., Liu, J.: Uncertainty modeling of object-oriented biomedical information in HBase. IEEE Access 8, 51219–51229 (2020). https:// doi.org/10.1109/ACCESS.2020.2980553 40. Zvieli, A., Chen, P.P.: Entity - Relationship modeling and fuzzy databases. In: Proceedings of IEEE 2nd International Conference on Data Engineering, pp. 320– 327. Los Angeles, CA, USA (1986). https://doi.org/10.1109/ICDE.1986.7266236
Neuro-Fuzzy Diagnostics Systems Based on SGTM Neural-Like Structure and T-Controller
Roman Tkachenko1, Ivan Izonin1(B), and Pavlo Tkachenko2
1 Lviv Polytechnic National University, Lviv, Ukraine
2 IT STEP University, Lviv, Ukraine
Abstract. Neuro-fuzzy control models are nowadays becoming more widespread in various industries. Many papers deal with the synthesis of neuro-fuzzy models of diagnostics in economics, medicine, and industrial tasks. Most of them are based on iterative topologies of artificial neural networks and traditional fuzzy inference systems. The latter ones do not always ensure high accuracy, which affects the entire system that is being developed. This paper presents a new neuro-fuzzy diagnostic system based on a non-iterative ANN and a new fuzzy model, a T-controller. The flowchart of the system proposed is given. All the operation stages of the system are described in detail, from the collection and preliminary processing of data to two different stages of diagnostics. The last stage can be performed manually or using a T-controller. Simulation of the system is conducted by means of a real set of data. The task was to predict the generator power based on a set of 13 independent variables. To improve the accuracy of the system performance, the Padé polynomial has been used, the coefficients of which are synthesized on the basis of a pre-trained SGTM neural-like structure using a simulated annealing optimization procedure. High accuracy of the system performance is shown using various performance indicators. The coefficients of the required polynomial are synthesized, which allows the solution to be presented in a more understandable form. Keywords: SGTM neural-like structure · Non-iterative training · Diagnostics · T-controller · Neuro-fuzzy system
1 Introduction
The development of the information society involves producing a large amount of diverse information. Effective data processing requires the use of various tools. This is due to many forms of incompleteness and uncertainty, which is typical of many application tasks. In terms of the missing data issue, there are many effective methods of regression modeling based on machine learning for data imputation [22]. As far as large amounts of data are concerned, the development of a modern neural network toolkit for data flow processing makes it possible to fulfill
these tasks with high accuracy [5,6]. In the case of ambiguity or inaccuracy of information, the tools of fuzzy logic should be used [4,10]. In particular, the use of linguistic variables makes it possible to take into account all the information available about the object of research when solving various application tasks. Nowadays, the tendency to combine the use of neural networks and fuzzy logic controllers, in particular in solving diagnostic problems, has been developing very rapidly [19]. This approach takes into account structured and poorly structured knowledge about the object of research during the analysis, which consequently increases the accuracy of solving applied problems through employing both artificial intelligence and the experience of an expert in the particular problem domain. Existing fuzzy systems are based on the use of iterative neural networks and traditional fuzzy inference systems [18], which imposes a number of restrictions on their application, in particular in terms of accuracy and speed of operation [14,15]. In [12], the principles of construction of fuzzy systems are considered, as well as the tools for their implementation, with the latest developments in this area outlined. The authors have identified a number of parallel trends for development in this area, i.e. neuro-genetic systems and genetic fuzzy systems. Numerous works of the outstanding Ukrainian scientist in this field, Prof. Y. Bodyanskiy [7], focus on the principles of construction and application of neo-fuzzy neurons and neo-neuro-fuzzy systems, including in the medical field. These systems show high operation efficiency in the online mode of information processing. In [11], a hybrid fuzzy system has been developed. The authors have employed feature reduction to reduce the learning time of the model, with the accuracy of its operation declining. In most other similar papers, the need to find a compromise between accuracy and speed of operation arises, which significantly affects the effectiveness of these systems. Hence, the objective of this chapter is to design a neuro-fuzzy diagnostic system that will provide high accuracy in solving applied diagnostic problems under the minimum time resources required to prepare it for use. The main contributions of this paper are the following:
– a new neuro-fuzzy diagnostics system is designed, based on a non-iterative artificial neural network and the authors' fuzzy inference system, a T-controller;
– the Padé polynomial has been used to increase the forecasting accuracy; the synthesis of its coefficients involves using both the SGTM neural-like structure and the simulated annealing method. This approach implements the construction of such polynomials from tabular data without limiting the number of variables;
– one of the modes of operation of the system developed was tested on a real diagnostics problem, with high accuracy being established on the basis of various performance indicators.
The remainder of this paper is organized as follows: the second section contains a block diagram of the neuro-fuzzy diagnostic system developed, as well as a detailed description of all its modules. The fourth section presents the simulation outcomes both on the basis of performance indicators and a scatter plot,
with the results obtained plotted against the reference values of the forecasting parameter. The Padé polynomial terms are synthesized, which allows the solution to be presented in an accessible form.
2 Proposed Neuro-Fuzzy Diagnostics System
This paper outlines the neuro-fuzzy diagnostic system devised by the authors. The flowchart of the system is presented in Fig. 1. Let us consider each block of the system (Fig. 1) in more detail. 1. Database. It contains log files of 5–10 min. 2. Data filter. It removes data vectors that correspond to non-operating windmill modes, wind speed values falling outside the limits. The filter parameters are set according to the characteristics of a particular windmill. 3. Numbering of data vectors. All data vectors passed through the filter are numbered consecutively in ascending order; these numbers are stored both by vectors at the output of the filter and by vectors to identify the moment of deviation, which have passed the extraction procedure of individual components according to the par. 4. 4. Vector formation. It is used to identify the moment of deviation. Vectors are constructed on basis of the ones obtained at the output of the filter; the structure of the separate i − th vector for identification is as follows xi,1 , ..., xi,m , zi,1 , ..., zi,k → 1, in which xi,1 , ..., xi,m are controlled parameters (controlled by this type of diagnosis); zi,1 , ..., zi,k are parameters that affect the controlled ones; In selecting, it should be considered that the fact that the controlled parameters depend on the influencing ones, with the latter, in general, do not depend on the former; some errors while selecting are possible, yet not critical for recognition. 5. Train data sample drive for ANN training. It is proposed to use a sample of about 20,000 data vectors, which spans a period of about 2 months of observations and provides the learning speed of the ANN developed in the second range; the sample is filled up according to the FIFO principle. 6. ANN training. The original ANN - SGTM neural-like structure is used, constructed on the principles of non-iterative high-speed learning, which make it possible to be periodically updated to take into account the possible seasonal climate change. The topologies of this artificial intelligence instrument are shown in Fig. 2. It contains only one hidden layer, with lateral connections between the neurons of this layer. Its peculiarity lies in changing easily from the neural network application form of the model to the more obvious and simple polynomial one. The procedure for this transition is given in [13]. Details of the high-speed learning algorithm and linear SGTM neural-like structure application are presented in [13,20].
Fig. 1. Flowchart of the neuro-fuzzy diagnostics system designed
Fig. 2. SGTM neural-like structure topologies: a) autoassociative mode; b) supervised mode
7. Application of a trained ANN. The ANN input receives a flux from the vector creator output to identify the deviation moment from the mode; for vectors corresponding to the normal modes, the ANN output assumes values close to 1, while for abnormal situations, the predicted values fall outside the range; the numbers of the first and last vectors falling outside the range of values are registered. In particular, an example of such an abnormal situation is shown in Fig. 3.
Fig. 3. Example of abnormal situation
8. Vector comparison block. It compares the first vector from the flux whose number is identified as a deviation from the normal mode according to par. 7 with the previous vector that is still within the mode; the comparison is carried out over a set of components similar to those used to identify the moment of deviation from the mode. An example of such a set of components for solving the problem is shown in Fig. 4 (a compact sketch of this detection-and-comparison step is given after this list).
9. Taking a decision on diagnosis. It can be performed either analytically in manual mode, or based on the use of the T-controller fuzzy logic [21]. It should be noted that in Fig. 1 an asterisk indicates the second diagnostic mode. The use of the T-controller in comparison with existing algorithms (Mamdani, Takagi-Sugeno, etc.) is attributed to a number of advantages [21], i.e. high accuracy, ease of use and configuration, and the capability to tackle problems of increased dimensionality.
10. Base of production rules. It is the basis for the use of the fuzzy logic T-controller, and it sets the principles for drawing conclusions as to the diagnosis based on the results of comparing vectors according to par. 8. It can be created on the basis of cooperation between users and developers of the diagnostic system. This paper demonstrates the possibility of using the system without a T-controller. However, it is production rules that afford an opportunity to improve the performance accuracy of various diagnostic tasks.
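The following Python sketch illustrates blocks 7 and 8 in a schematic way: the trained network is expected to output values close to 1 for vectors of the normal mode, so a run of predictions outside a tolerance band marks a deviation window, whose first out-of-band vector is compared, component by component, with the last in-band one. The tolerance value, the component names and the comparison rule are illustrative assumptions rather than parameters taken from the paper.

```python
# Schematic sketch of deviation detection (block 7) and vector comparison (block 8).
def find_deviation_window(predictions, tol=0.05):
    """Indices of the first and last predictions outside the 1 +/- tol band."""
    abnormal = [i for i, p in enumerate(predictions) if abs(p - 1.0) > tol]
    return (abnormal[0], abnormal[-1]) if abnormal else None

def compare_vectors(first_abnormal, previous_normal, monitored):
    """Per-component differences passed on to the diagnostic decision (block 9)."""
    return {key: first_abnormal[key] - previous_normal[key] for key in monitored}

# Toy usage: predictions for consecutively numbered data vectors.
preds = [1.01, 0.99, 1.00, 0.62, 0.55, 0.98]
print(find_deviation_window(preds))             # -> (3, 4)
normal = {"gen_speed": 1200.0, "oil_temp": 55.0}      # hypothetical components
abnormal = {"gen_speed": 900.0, "oil_temp": 71.0}
print(compare_vectors(abnormal, normal, ["gen_speed", "oil_temp"]))
```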
Fig. 4. An example of a set of components to identify the moment of deviation from the regular mode
3 Modeling and Results
The simulation of the method is based on the data of a real task. It was necessary to devise a mathematical model of the dependence of the output parameter (the generator power) on a set of 13 input parameters in the form of a compact formula. Obviously, the system proposed allows more parameters to be included, but this case revealed that these very 13 parameters had a profound effect on the generator power. In [20], the authors considered the possibility of increasing the accuracy of an SGTM neural-like structure by using the Itô decomposition. In that case, the initial inputs of the task were replaced by the terms of the Itô expansion, the coefficients of which were determined using the SGTM neural-like structure. In this paper, the dependence was constructed on the basis of a linear Padé polynomial:

y = (a0 + a1x1 + a2x2 + a3x3 + a4x4 + a5x5 + a6x6 + a7x7 + a8x8 + a9x9 + a10x10 + a11x11 + a12x12 + a13x13) / (1 + b1x1 + b2x2 + b3x3 + b4x4 + b5x5 + b6x6 + b7x7 + b8x8 + b9x9 + b10x10 + b11x11 + b12x12 + b13x13)
This polynomial provides high forecast accuracy in the application mode. The literature on this subject indicates that Padé polynomials can be constructed for 2 variables provided analytical dependencies are known [16]. The system developed implements the construction of such polynomials from tabular data without restrictions on the number of variables. Some outliers, in this case, can be attributed to sensor failures. The accuracy of the developed neuro-fuzzy diagnostic system is assessed on the basis of various performance indicators:
MAPE = 3.6633e−001
MAD = 4.6555e+000
MSE = 7.0546e+001
R = 9.9989e−001
SEA = 1.5768e+004

It should be noted that the developed system provided a very high accuracy of operation, given that the value of the predicted parameter is in the range from 0 to 250. In addition, the developed system made it possible to obtain the coefficients of this polynomial, which provides a polynomial interpretation of the neural network implementation of the proposed system and supports its practical application when explainable AI is required. In order to obtain the coefficients, an SGTM neural-like structure was used together with a simulated annealing optimization procedure, as in [13]. The optimization refined the obtained coefficients, some of which are given in (3):

a0 = 32.017549635562347;
a1 = 0.0033773725991261567;
a2 = 0.46486250675291979;
a3 = −0.09680454617396915;
a4 = −0.00030595260806878333;
a5 = −0.147893334152995326;
a6 = 0.82800400824345455;
a7 = −1.3711332421479037;
a8 = 0.92001424665226816;
a9 = −0.00019569852439931069;
...
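Once the coefficients are available, applying the model is a single evaluation of the rational form above. The sketch below shows such an evaluation; the coefficient and input vectors are placeholders, since only a0–a9 are quoted in the text and the b coefficients are not listed.

```python
import numpy as np

# Evaluate the linear Padé form y = (a0 + sum a_i x_i) / (1 + sum b_i x_i).
def pade_predict(x, a0, a, b):
    x = np.asarray(x, dtype=float)
    return (a0 + a @ x) / (1.0 + b @ x)

rng = np.random.default_rng(0)
a0 = 32.017549635562347           # first coefficient quoted in (3)
a = rng.normal(size=13)           # placeholder values for a1..a13
b = rng.normal(size=13) * 0.01    # placeholder values for b1..b13 (kept small)
x = rng.uniform(size=13)          # one vector of the 13 input parameters
print(pade_predict(x, a0, a, b))
```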
Visualization of the predicted results in comparison with the reference values is shown in Fig. 5.
Fig. 5. Visualization of the obtained results using Scatter Plot
Visual outcomes of comparison between the obtained and reference values also show high accuracy of the outcome. With input data and a specific task in
mind, the devised neuro-fuzzy diagnostic systems based on the SGTM neural-like structure and T-controller can, with minor modifications, be used in various fields [1–3,8,9,17].
4 Conclusions
Diagnostic tasks in various industries, economics, or medicine are of vital importance, given the opportunity to save material or human resources, as well as the ability to give an accurate diagnosis or avoid doing harm to human health. A neural network approach or the application of machine learning algorithms reveals high outcome accuracy. However, they do not take into consideration the knowledge of experts, which is very difficult or even impossible to express in numerical form. The tools of fuzzy logic make it possible to consider this factor. The combination of artificial neural networks and fuzzy logic brings a number of advantages over using these tools alone. Modern neuro-fuzzy systems are devised using iterative neural network topologies and traditional fuzzy inference systems, which makes them quite time-consuming and not always accurate. This paper proposes the use of non-iterative artificial neural networks and the authors' fuzzy system, by means of which the authors have developed the diagnostics information technology. This approach improves the accuracy of these systems while minimizing the time resources for their training. To increase the accuracy of the system performance, the Padé polynomial is used, the coefficients of which were determined using a non-iterative neural network and a global optimization algorithm. The outcomes of the system showed a high accuracy of its operation while dealing with the real task, granted that the value range of the decision variable was very broad. In addition, a polynomial form of the solution is presented by synthesizing coefficients to simplify the understanding of how the system is formed, in particular during its application. Further research will be conducted to pursue the following aims:
– approbation of the system operation in the diagnostic mode using the T-controller to allow for poorly structured data about the object of research in order to improve the accuracy of the developed system in general;
– modification of the developed system involving experts in a specific application area to fulfill various tasks in medicine.
References 1. Auzinger, W., Obelovska, K., Stolyarchuk, R.: A modified gomory-hu algorithm with DWDM-oriented technology. In: Large-Scale Scientific Computing, pp. 547– 554. Springer, Cham (2019) ˇ 2. Babichev, S., Skvor, J.: Technique of gene expression profiles extraction based on the complex use of clustering and classification methods. Diagnostics 10, 584 (2020). https://doi.org/10.3390/diagnostics10080584
3. Babichev, S., Durnyak, B., Zhydetskyy, V., Pikh, I., Senkivskyy, V.: Application of optics density-based clustering algorithm using inductive methods of complex system analysis. In: IEEE 2019 14th International Scientific and Technical Conference on Computer Sciences and Information Technologies, CSIT 2019 - Proceedings, pp. 169–172 (2019). https://doi.org/10.1109/STC-CSIT.2019.8929869 4. Berezsky, O., et al.: Fuzzy system for breast disease diagnosing based on image analysis. CEUR-WS.org. 2488, 69–83 (2019) 5. Bodyanskiy, Y., Pirus, A., Deineko, A.: Multilayer radial-basis function network and its learning. In: 2020 IEEE 15th International Conference on Computer Sciences and Information Technologies (CSIT), pp. 92–95 (2020) 6. Bodyanskiy, Y., Deineko, A.O., Kutsenko, Y.: On-line kernel clustering based on the general regression neural network and t Kohonen’s self-organizing map. Autom. Control Comput. Sci. 51, 55–62 (2017). https://doi.org/10.3103/ S0146411617010023 7. Bodyanskiy, Y., Antonenko, T.: Deep neo-fuzzy neural network and its accelerated learning. In: Proceedings of the 2020 IEEE Third International Conference on Data Stream Mining and Processing (DSMP), pp. 67–71. IEEE (2020) 8. Chukhrai, N., Koval, Z.: Essence and classification of assessment methods for marketing strategies’ efficiency of cost-oriented enterprises. Actual Probl. Econ. 145, 118–127 (2013) 9. Chumachenko, D., Chumachenko, T., Meniailov, I., Pyrohov, P., Kuzin, I., Rodyna, R.O.L.D.P.: Simulation and forecasting of the coronavirus disease (covid-19) propagation in Ukraine based on machine learning approach. In: Data Stream Mining and Processing, pp. 372–382. Springer, Cham (2020) 10. Chumachenko, D., Sokolov, O., Yakovlev, S.: Fuzzy recurrent mappings in multiagent simulation of population dynamics systems. Int. J. Comput. 19(2), 290–297 (2020). https://doi.org/10.47839/ijc.19.2.1773 11. Das, H., Naik, B., Behera, H.S.: A hybrid neuro-fuzzy and feature reduction model for classification. In: Advances in Fuzzy Systems, pp. 1–15. Hindawi (2020). https://doi.org/10.1155/2020/4152049 12. Getaneh, G., Tiruneha, A., Robinson, F., Vuppuluri, S.: Neuro-fuzzy systems in construction engineering and management research. Autom. Constr. 119 (2020). https://doi.org/10.1016/j.autcon.2020.103348 13. Izonin, I., Tkachenko, R., Kryvinska, N., Tkachenko, P., Greguˇs ml., M.: Multiple linear regression based on coefficients identification using non-iterative SGTM neural-like structure. In: Rojas, I., Joya, G., Catala, A. (eds.) Advances in Computational Intelligence, pp. 467–479. Springer International Publishing, Cham (2019) 14. Kotsovsky, V., Batyuk, A., Yurchenko, M.: New approaches in the learning of complex-valued neural networks. In: 2020 IEEE Third International Conference on Data Stream Mining and Processing (DSMP), pp. 50–54 (2020) 15. Kotsovsky, V., Geche, F., Batyuk, A.: On the computational complexity of learning bithreshold neural units and networks. In: Lecture Notes in Computational Intelligence and Decision Making, pp. 189–202. Springer, Cham (2019) 16. Liancun, Z., Xinxin, Z.: Modeling and Analysis of Modern Fluid Problems. Elsevier, Goong Chen edn. (2017). https://doi.org/10.1016/C2016-0-01480-8 17. Mochurad, L., Yatskiv, M.: Simulation of a human operator’s response to stressors under production conditions. CEUR-WS 2753, 156 (2020) 18. Subbotin, S.: The neuro-fuzzy network synthesis and simplification on precedents in problems of diagnosis and pattern recognition. Opt. Mem. Neural Netw. 22, 97 (2013). 
https://doi.org/10.3103/S1060992X13020082
19. Teslyuk, V., Kazarian, A., Kryvinska, N., Tsmots, I.: Optimal artificial neural network type selection method for usage in smart house systems. Sensors 21, 47 (2021). https://doi.org/10.3390/s21010047 20. Tkachenko, R., Izonin, I., Vitynskyi, P., Lotoshynska, N., Pavlyuk, O.: Development of the non-iterative supervised learning predictor based on the ito decomposition and SGTM neural-like structure for managing medical insurance costs. Data 3, 46 (2018). https://doi.org/10.3390/data3040046 21. Verbenko, I., Tkachenko, R.: Gantry and bridge cranes neuro-fuzzy control by using neural-like structures of geometric transformations. Czasopismo Techniczne 2013, 53 (2014). https://doi.org/10.4467/2353737XCT.14.057.3965 22. Wang, C., Shakhovska, N., Sachenko, A., Komar, M.A.: New approach for missing data imputation in big data interface. Inf. Technol. Control 49, 541–555 (2020). https://doi.org/10.5755/j01.itc.49.4.27386
An Integral Software Solution of the SGTM Neural-Like Structures Implementation for Solving Different Data Mining Tasks
Roman Tkachenko(B)
Lviv Polytechnic National University, Lviv, Ukraine
Abstract. The paper presents a developed software solution that implements a new learning model and application of artificial neural networks, i.e. the Successive Geometric Transformations Model, to solve various applied data mining tasks. This model is applied to construct Feed Forward Neural Networks, which anticipates the rejection of training goal interpretation as a multi-extremal optimization procedure. The modes of use of the system developed (prediction, time series, and cascade modes) are described in detail and illustrated, as well as the structure and functions of separate constituents of the graphical user interface; data selection and download procedures; neuro-like structure parameterization capabilities for each different mode of operation of the program. Simulation of the developed integral software solution was performed while fulfilling prediction, classification, and time series forecasting. Besides, the feasibility of performing nonlinear PCA analysis at high speed is shown. When performing all the above tasks, the program outcomes are presented in both numerical form and using the appropriate one for a particular mode of visualization in 2D or 3D formats. Keywords: SGTM neural-like structure · Non-iterative training Prediction · Classification · Time series · Nonlinear PCA
1 Introduction
Performing numerous applied data mining tasks in various industries requires quick and accurate solutions [6]. The application of a neural network toolkit makes it possible to process both large and small data sets with significant nonlinearities inside the set efficiently [17,20]. Despite the high accuracy that can be achieved by different topologies of artificial neural networks [2,5,16], these solutions are quite inefficient. This is due to the iterative nature of their training procedures [13]. In addition, the random initial initialization of weighting coefficients [21] or the choice of the number of clusters [4] does not ensure the repeatability of the result. To eliminate these shortcomings, [22] presents a new model for constructing artificial neural networks – the Successive Geometric
Transformations Model (SGTM). Although similar non-iterative analogues exist [27], the model developed has a number of advantages, described in detail in [22]. The paradigm of the neural networks (Fig. 1, Fig. 2) based on the SGTM is represented as a functional of the table argument functions of neuron elements and synapses, the approximation of which is ensured during network training. The procedure can be reduced to an approximation of a table of two-variable function values, which consists of components of an object observation vectors, using a combination of one-variable functions. The approximation algorithm, which is described by a neuron network graph in the operational mode (Fig. 1), is comprised of sequential steps of orthogonal transformations.
Fig. 1. SGTM neural-like structure topology for supervised mode
Fig. 2. SGTM neural-like structure topology for autoassociative mode
The authors in [22] present two main topologies of this tool (Fig. 1 and Fig. 2) for solving various problems. A mathematical description of SGTM training procedures and a neural network parameterization procedure is given in [22]. The solution to the regression task has been analyzed, the high speed of operation of
neural networks of this type when increasing their accuracy in comparison with the existing toolkit has been demonstrated. This direction was further developed in other numerous works. In particular, [11] deals with a method of increasing the resolution of images. Experimental studies have confirmed the effectiveness of SGTM in comparison with the DCNN. In [12], a method of image recognition and classification based on the SGTM has been proposed. Paper [18] presents methods of multidimensional data visualization based on the autoassociative SGTM. The same SGTM neural-like structure was used in solving the digital image watermarking task [25]. The SGTM neural-like structure is also used to build encryption systems and data transmission of increased cryptocurrency [24]. To increase the speed of operation in the application mode and illustrate this procedure, [10] proposes a transition diagram from the neural network form to the polynomial one of SGTM neural-like structure application. The authors have developed a technique for synthesizing linear polynomial coefficients provided both linear [10] and nonlinear [23] signals are sent to the inputs of this artificial intelligence means. The hardware implementation of this approach is presented in [9]. In [7,8] the solution to classification problems on the basis of the integral SGTMbased algorithm is described. The results obtained reveal a significant increase in the accuracy of the method proposed in comparison with the existing ones under minimum time resources required for the implementation of the training procedure. There are many other practical applications of the developed toolkit, but they do not offer an integral software solution that would allow of solving various data mining tasks using a single, simple, and user-friendly graphical interface. The main purpose of the chapter is to describe the designed software tool of the common SGTM neural-like structure as a universal and simple base for solving different data mining tasks. The main contributions of this paper are: – the integrated software tool is designed that is based on an author’s approach for the construction of Feed Forward Neural Networks which anticipates rejection of training goal interpretation as a multi-extremal optimization procedure; – an easy-to-handle and intuitive user interface has been devised that makes it possible to configure hyperparameters for efficient operation of the SGTM neural-like structure for different tasks; – the outcomes of fulfilling various data mining tasks are shown both in numerical and visual forms based on the developed software employment.
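The SGTM training procedure itself is specified in [22] and is not reproduced here. Purely as a generic illustration of what non-iterative training means in practice, the sketch below fits a linear readout in a single pass using a QR (orthogonal) factorization: there are no epochs and no random initialization, so the result is exactly repeatable for the same data. It is not the SGTM algorithm, only a minimal stand-in for the idea.

```python
import numpy as np

# Generic one-pass (non-iterative) least-squares fit of a linear readout.
# NOT the SGTM procedure from [22]; it only illustrates training by a fixed
# sequence of orthogonal transformations with a repeatable result.
def fit_linear_readout(X, y):
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias column
    Q, R = np.linalg.qr(Xb)                        # orthogonal transformation
    return np.linalg.solve(R, Q.T @ y)             # weights obtained in one shot

def predict(X, w):
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return Xb @ w

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.normal(size=200)
w = fit_linear_readout(X, y)
print(np.round(w, 2))   # recovered weights plus a near-zero bias term
```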
2 Materials and Methods
The designed software tool (ST) implements a new approach to the construction of Feed Forward Neural Networks, which anticipates rejection of the interpretation of the training goal as a multi-extremal optimization procedure. Having presented the objects of modeling, prediction or classification in the form of bodies in a multidimensional
realization space, we ensure their decomposition (training mode of the neural network) or composition from elements (operation mode of the neural network). Because of this, the main advantages of the SGTM neural-like structures are achieved, i.e. non-iterative training, reliability, and transparency for the user. The developed solution is a graphical user interface to easily and quickly get insight into data, and to model the underlying process described by the data. It can handle numerical and literal (nominal) data and solve prediction, classification, and time series tasks. Using this tool, you can:
– Create, change, save and load ST projects. Such projects consist of data for training and testing, a network topology, and a number of project properties;
– Load and save formatted data (tab or comma separated values format), as essential parts of a project. Data files can either be referenced, or copied into a project. Data also can easily be exported, for use outside of control of a software tool's project;
– Create network topologies and save them as part of a project. Network topologies can also be exported and loaded, to be used in other projects. This also allows delivering already trained network topologies to third parties;
– Train and use network topologies on data belonging to a project. A trained network topology is also called a model;
– Visualize original, predicted, and principal component data with 2-dimensional charts, explore principal components with a 3-dimensional scatter plot;
– Estimate the fitness of a chosen model and data with a large number of built-in error calculations.
The graphical interface distribution package contains a Java Runtime Environment, an executable program, a manual and a set of demo projects, which are installed by means of the installation program.
2.1 Interface of the Developed Software Tool
The graphical user interface of the designed tool consists of a number of areas, as shown in Fig. 3:
a) Component for operation mode selection, and training and usage of the network;
b) Spreadsheet component, to manage processed data;
c) Controller component, for network parameters;
d) Chart area, to visualize training/test output and principal components of a trained network. Output charts also display error measures, for estimating the fitness of a chosen model and the data. The principal component chart also displays the dispersion of each principal component, giving a measure for the overall contribution to the result;
e) Status area, displaying information about the success or failure of the last performed user action, the project name, the training state of the current model, and the current working directory.
Fig. 3. Parts of the designed graphical interface
In the proposed tool, you are always working in the context of an ST project. A project consists of training and test data, a (trained or untrained) network topology, and the project properties. The name of the current project is displayed in the status line, as shown in Fig. 3e.
2.2 Contents of Project Directory
When saving a project into a directory, a number of files are saved in this directory. In the following, each of these files is explained in detail.
– .fnp: This is the project file, a text file containing the project properties. The file starts with comment lines, beginning with a hash symbol, #, and containing the time and date the project was saved. In the following lines, there is one property per line. Each property has a name and a value. The name can be structured by dot notation, and the value follows the equality sign. There are the following properties:
• dataset.0.importsource=: The location (directory and file name) where the data for training was loaded from; dataset.0 indicates properties belonging to training data;
• dataset.0.filename=: The name of the file where the training data is saved. This is either the name used by the proposed software for files loaded as a copy (i.e. <projectname>.train.orig.csv), or the name of a referenced file;
• dataset.0.input=: A range list is a list of integer ranges, e.g. 1–100, 150, 201–200. Range lists are used whenever a range of rows or columns has to be specified. A range list can also be just a single number. In this property, the range for the input column(s) for training data is stored;
• dataset.0.output=: The range list for the output column(s);
• dataset.0.trainingset=: The range list for the definition of training data rows;
• dataset.1.importsource=: Same as dataset.0.importsource, but for test data; dataset.1 indicates properties belonging to test data;
• dataset.1.filename=: Same as dataset.0.filename, but for test data;
• dataset.1.input=: Same as dataset.0.input, but for test data;
• dataset.1.output=: Same as dataset.0.output, but for test data;
• dataset.1.testset=: Same as dataset.0.trainingset, but defines the test data rows;
• parameterset.network=: The name of the topology file that is managed by the project. Topology files are named *.ftf, for "proposed solution topology file". See further below in this section what the topology file contains;
• parameterset.importsource=: The location (directory and file name) where a topology file was loaded from, if it was explicitly loaded. Normally, the topology file is completely managed by the program's project;
– .ftf: This is the file containing a network topology, called the project topology file. It is a binary file, thus it must not be changed manually. The topology file contains a network structure, including information about the number of input and output columns, the number of neurons, the non-linearity settings of neurons and synapses, and the number of layers. Furthermore, it contains information about the mode, either prediction or time series. In time series mode, the topology file also contains the sizes of the input and output windows. The topology is automatically created and held in memory when you are working with the program. The topology can be either trained or untrained. This state is displayed in the upper right field inside the status area of the graphical user interface, and it is stored inside the topology file.
– .train.orig.csv: This file contains the data used for training. It contains all data that was contained in the file used when loading training data. It is initially saved in the project directory when the project is saved and the data file was loaded as a copy. After the initial save, it is only saved again when the original training data was changed, by shuffling it. The file format is tab-separated values.
– .train.pred.csv: Most importantly, this file contains the predicted values from training. It contains as many lines as the file .train.orig.csv. However, only those lines are filled that were selected as training data rows in the proposed graphical user interface. Furthermore, only those columns are contained that were selected either as input or output columns. For all other columns, a tab is inserted, but no value. The input columns contain the original input values, and the output columns contain the predicted data, for each training row. Each predicted value has a high number of decimal places.
– .train.pc.csv: This file contains the principal component data that is produced for the training data set. The file contains as many lines as the file .train.orig.csv. Only those lines are filled that were selected as training data rows; the other lines are empty (they contain empty tabs). There are as many columns as there are principal components, i.e. as many columns as there are neurons in the last layer of the last cascade of the used topology. The principal component values have a high number of decimal places.
– .test.orig.csv: This file contains the data used for testing. It is initially saved in the project directory when the project is saved and the data file was loaded as a copy. After the initial save, it is only saved again when the original test data was changed, by shuffling it.
– .test.pred.csv: This file contains the predicted values from testing. The internal structure is the same as for .train.pred.csv, regarding filled and empty rows and columns.
– .test.pc.csv: This file contains the principal component data that is produced for the test data rows. The internal structure is the same as for <projectname>.train.pc.csv, regarding filled and empty rows and columns.
For each data set under control of a project, you can specify whether it should be saved on disk or not. These settings are made in the window Project Preferences, which is opened with Project->Project Preferences (Ctrl-R), as displayed in Fig. 4.
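Since the .fnp project file is a plain name=value text file and the exported data files are tab-separated, they can also be read outside the tool. The sketch below is only an illustrative reader that follows the format described above; the file name in the usage comment is a placeholder, and the parser is not the tool's own loader.

```python
def load_fnp(path):
    """Read a .fnp project file: '#' comment lines, then one 'name=value' per line."""
    props = {}
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            name, _, value = line.partition("=")
            props[name.strip()] = value.strip()
    return props

def parse_range_list(text):
    """Expand a range list such as '1-100, 150' into a sorted list of integers."""
    items = set()
    for part in text.replace(";", ",").split(","):
        part = part.strip().replace("\u2013", "-")   # tolerate an en-dash from the manual
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-", 1)
            items.update(range(int(lo), int(hi) + 1))
        else:
            items.add(int(part))
    return sorted(items)

# hypothetical usage with a placeholder project name
# props = load_fnp("demo.fnp")
# train_inputs = parse_range_list(props["dataset.0.input"])
```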
Fig. 4. Window edit project preferences.
There are six files that can be saved: three training data files and three test data files. For both training and test, there is one file containing the original data, one file containing predicted data, and one file containing principal component data. These files correspond to the contents of the data tables in the designed graphical user interface.
2.3 Data Selection
The data spreadsheet (Fig. 3b) consists of 2 tabs which contain training and test data. As the name suggests, on the training data the train operation will
be performed, and on the test data, the test operation. The spreadsheet cells are colorized: cells with a blue background mark network inputs, and cells with a yellow background mark network outputs. The font styles of the cells mark data sets for use with the network: cells in bold style mark training set rows (Fig. 5), and cells in bold italic style mark test set rows (Fig. 6).
Fig. 5. Train data spreadsheet annotation
Depending on the operation mode of the proposed graphical interface (Prediction or Time Series), input and output selection have different meanings:
1. In prediction mode, input columns mean network inputs, and output columns mean network outputs.
2. In time series mode, input columns mean "independent" columns, i.e. columns whose content is known a priori, and output columns mean "dependent" columns, which contain the time series data we try to predict and which depend on the "independent" columns.
The other elements of the data spreadsheet are control elements, as depicted in Fig. 7:
1. Viewing mode selection. Toggles the display of either original values, predicted values, or principal component values.
2. Edit selection. Presents a dialog for directly entering the data selection values, without using the mouse context menu on the spreadsheet.
3. Selection transfer. Tries to copy the column and row selection from the foreground tab to the background tab.
Fig. 6. Test data spreadsheet annotation
Fig. 7. Data spreadsheet controls
4. Shuffle rows. Performs a random shuffle of rows in the foreground and background tab.
5. Display principal components. Shows a window with principal component projections of your data. This operation is possible only on currently trained or used data.
The selection of inputs and outputs, or the selection of training and test data sets, can be performed either by selections in the Edit Selection dialog or by direct mouse selections in the data spreadsheet, using the context menu (Fig. 8).
Fig. 8. “Edit Selection” dialog
While the use of the Edit Selection dialog is obvious, the use of the context menu needs more explanation. The data spreadsheet provides two selection modes: row selection and column selection.
2.4 Network Parametrization
Parameterization fields for network configuration are grouped by their functional purpose (Fig. 9):
1. Parameters of the sample space (network inputs and outputs, data pre- and post-processing).
2. Parameters for the time series mode (dimensions of time windows).
3. Cascade parameters (setting up individual cascades of the network).
The sample space is the space of the network input and output data. To characterize it, the following parameters should be specified, as shown in Fig. 9a:
1. Input and output filters of the network limit signal levels at the inputs and outputs of the network and are not compulsory. A value of 0 means there is no filtering. It is often helpful to increase the input filter coefficient when working with signal feeders where surges may appear. Input filtering can improve extrapolative properties, while - depending on the actual data - decreasing interpolative properties (reproduction). An increase of the output filter coefficient decreases the relative errors of prediction of large values at the outputs, increasing them for small values. Filtering is applied to the network as a whole, thus it is not set for each cascade individually.
2. Linear reduction of the elements of the realization matrix by columns to the range [-1; +1], also called normalization, is used to ensure equal influence of individual inputs and outputs on the results when there is no data pre-processing by the user.
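A minimal sketch of the column-wise normalization described in item 2 is given below, together with its inverse, which would be needed to post-process predicted outputs. It assumes plain min-max statistics and is not taken from the tool's source code.

```python
import numpy as np

def scale_columns(X):
    """Linearly map each column of X to [-1, +1] (the normalization described above)."""
    lo = X.min(axis=0)
    hi = X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)     # guard against constant columns
    return 2.0 * (X - lo) / span - 1.0, lo, hi

def unscale_columns(Xs, lo, hi):
    """Invert the mapping, e.g. for post-processing predicted outputs."""
    return (Xs + 1.0) / 2.0 * (hi - lo) + lo

X = np.array([[1.0, 10.0], [2.0, 30.0], [3.0, 20.0]])
Xs, lo, hi = scale_columns(X)
print(Xs.min(axis=0), Xs.max(axis=0))   # -> [-1. -1.] [1. 1.]
```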
Fig. 9. Setup of network parameters: a) for sample space; b) for time series mode; c) for cascade mode
Prediction of time series is accomplished by using the network in recurrent mode, specifying the corresponding sizes of the input and output time windows while passing the first values of each attribute to the inputs at each of the time steps (a sliding-window sketch is given after the cascade parameter list below). For operation in the time series window mode, the following parameters are specified, as shown in Fig. 9b:
1. "Input window" specifies the size of the input time series window;
2. "Output window" specifies the size of the output time series window.
Usually, model building starts with one cascade. Inclusion of more than one cascade of networks assists in the serial linearization of the problem, as well as in the improvement of both reproductive and predictive properties. At the same time, an increase in the number of cascades introduces additional delays in the training and network usage processes, and increases memory usage. All cascades of the network are grouped in a corresponding register of the network setup dialog, as shown in Fig. 9c. Each network cascade is set up independently of the other cascades. Only one network cascade is set up at a time; this cascade will be referred to as "active". The fields of the setup dialog have the following functions:
1. Shifting the active cascade forward by one position in the series of cascades. The network must contain at least two cascades to perform this operation.
2. Shifting the active cascade back by one position in the series of cascades. The network must contain at least two cascades to perform this operation.
3. Addition of a new network cascade with the initial values of the active cascade. The new cascade is inserted in the list of the cascades after the active cascade.
4. Removal of the active cascade from the list of network cascades. The network must contain at least two cascades to perform this operation.
5. Selecting the active cascade.
6. Setting the non-linearity degree of neuron elements. Increasing the degree of non-linearity of neuron elements assists in increasing network accuracy in the reproduction mode; however, starting at certain limits defined by data peculiarities, it causes generalizing properties to worsen.
7. Setting the degree of non-linearity of synapses. Its influence and behavior correspond to the degree of non-linearity of neuron elements.
8. The number of network hidden layers sets the number of approximation sections of the reproduced surface, and also the level of data pre-processing on the previous layers with respect to the last one. Excessive hidden layers can result in reduced accuracy.
9. The number of neurons in each of the hidden layers of the current cascade is one of the most important parameters of the network complexity. The recommended number of hidden layer neurons can be found in [19]. For the case when the degree of non-linearity of neurons is higher than the same degree for synapses, it is allowed to use a higher number of neurons in the hidden layers than the number of inputs.
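As referenced in the time-series mode description above, training pairs can be formed by sliding input and output windows over the series. The sketch below is a generic illustration of that windowing; the window sizes are arbitrary examples, and it does not reproduce the tool's internal recurrent scheme.

```python
import numpy as np

def make_windows(series, input_window, output_window):
    """Build (input, output) training pairs from a 1-D series.

    Each sample uses `input_window` past values as network inputs and the
    following `output_window` values as targets, matching the window sizes
    set in the time-series mode dialog.
    """
    X, Y = [], []
    last = len(series) - input_window - output_window + 1
    for start in range(last):
        X.append(series[start:start + input_window])
        Y.append(series[start + input_window:start + input_window + output_window])
    return np.array(X), np.array(Y)

series = np.sin(np.linspace(0.0, 12.0, 200))
X, Y = make_windows(series, input_window=8, output_window=2)
print(X.shape, Y.shape)   # (191, 8) (191, 2)
```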
3 Simulation, Results and Discussion
Typically, when you want to process data with the proposed software solution, you perform the following steps:
1. Load the tab-separated data which you want to process.
2. Select training and test sets, input and output columns.
3. Choose the operation mode and select network parameters.
4. Train and optionally test the network.
5. Evaluate your results.
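The evaluation step is supported by the built-in error measures shown in the chart area; if the predicted files are exported, comparable measures can also be recomputed outside the tool. The following sketch is only illustrative: the file names and the output column index are placeholders, and the error definitions are generic rather than the tool's exact formulas.

```python
import csv
import math

def read_tsv_column(path, column):
    """Read one numeric column from a tab-separated export, skipping empty cells."""
    values = {}
    with open(path, newline="", encoding="utf-8") as fh:
        for row_idx, row in enumerate(csv.reader(fh, delimiter="\t")):
            if column < len(row) and row[column].strip():
                values[row_idx] = float(row[column])
    return values

def fit_errors(orig_path, pred_path, column):
    """Compare an output column of the *.orig.csv export with the *.pred.csv export."""
    orig = read_tsv_column(orig_path, column)
    pred = read_tsv_column(pred_path, column)
    rows = sorted(set(orig) & set(pred))      # only rows filled in the prediction file
    diffs = [orig[r] - pred[r] for r in rows]
    mae = sum(abs(d) for d in diffs) / len(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return {"MAE": mae, "RMSE": rmse, "rows": len(rows)}

# hypothetical usage with placeholder file names and output column index
# print(fit_errors("demo.train.orig.csv", "demo.train.pred.csv", column=4))
```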
Outcome evaluation of the developed software is performed by means of both numerical performance indicators and visualization. Let us consider an example of the results of solving various data mining tasks using the designed graphical interface.
3.1 Prediction Task
Figure 10 shows the numerical prediction outcomes. The data plot can be displayed in three detail levels: Text only, Text+Graphics and Text+Graphics+Legend. The graphics detail level can be selected in menu Options->Graphics Details.
Fig. 10. Scatter plot (prediction mode)
3.2 Classification Task
Figure 11 shows the classification outcomes using the well-known Iris dataset. In classification mode, output values are displayed on a scale with the categories available for this output.
3.3 Time Series Analysis Task
Figure 12 shows the time series forecasting outcomes. When working in time series mode, a line plot is displayed instead of a scatter plot, as depicted in Fig. 12.
3.4 Principal Component Analysis
One of the main features of the designed program is the capability for (nonlinear) Principal Component Analysis. During training and usage of a network, you can also examine projections of data points on the principal components of the generated hyper-body (the model), either in the data table or graphically, as two- and three-dimensional plots (Fig. 13 and Fig. 14, respectively). To avoid clutter while displaying large data sets, all plots provide zoom capabilities. The zoom level is defined by the number of items which can be displayed in the view without scrolling. The software developed can be used for various data mining tasks [1,3,14,15,19].
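The components shown in Fig. 13 and Fig. 14 are derived from the trained nonlinear SGTM model itself. As a rough external analogue for quick exploration, a linear PCA projection with per-component dispersion shares can be computed as sketched below; this is generic linear PCA, not the tool's nonlinear procedure.

```python
import numpy as np

def pca_project(X, n_components=3):
    """Project rows of X onto their first principal components (linear PCA only)."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_components].T          # coordinates in PC space (2D/3D plots)
    explained = (s ** 2) / (s ** 2).sum()      # dispersion share per component
    return scores, explained[:n_components]

rng = np.random.default_rng(1)
X = rng.normal(size=(150, 4)) @ rng.normal(size=(4, 4))
scores, share = pca_project(X, n_components=2)
print(scores.shape, share)
```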
Fig. 11. Scatter plot (classification mode)
Fig. 12. Line plot (time series mode)
Fig. 13. Principal components (2D-View)
Fig. 14. Principal components (3D-View)
4 Conclusions
The paper presents an integral software solution that employs a high-speed SGTM neural-like structure to fulfill various data mining tasks in prediction and time series modes. The principles of operating an easy-to-handle and intuitive
graphical user interface have been developed and described. All the parameterization functions of the neuro-like structure are presented in detail in three different operating modes, i.e. prediction, time series, and cascade modes, with the main differences being described. The simulation of the developed software for solving various data mining tasks has been carried out. Numerical and graphical outcomes of prediction, classification, and time series forecasting are given. In addition, the visualization of non-linear PCA outcomes in 2D and 3D forms is presented. Further research and application development to expand the functionality of the developed software will be conducted towards implementing fuzzy logic to build high-speed neuro-fuzzy systems for various purposes, which will be fulfilled by implementing the author’s fuzzy inference systems, i.e. the T-controller [26].
References 1. Auzinger, W., Obelovska, K., Stolyarchuk, R.: A modified gomory-hu algorithm with DWDM-oriented technology. In: Lirkov, I., Margenov, S. (eds.) Large-Scale Scientific Computing. LSSC 2019. Lecture Notes in Computer Science, vol. 11958, pp. 547–554. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-410322 63 2. Babichev, S., Durnyak, B., Zhydetskyy, V., Pikh, I., Senkivskyy, V.: Application of optics density-based clustering algorithm using inductive methods of complex system analysis. In: IEEE 2019 14th International Scientific and Technical Conference on Computer Sciences and Information Technologies, CSIT 2019 - Proceedings, pp. 169–172 (2019). https://doi.org/10.1109/STC-CSIT.2019.8929869 ˇ 3. Babichev, S., Skvor, J.: Technique of gene expression profiles extraction based on the complex use of clustering and classification methods. Diagnostics 10, 584 (2020). https://doi.org/10.3390/diagnostics10080584 4. Bodyanskiy, Y., Pirus, A., Deineko, A.: Multilayer radial-basis function network and its learning. In: 2020 IEEE 15th International Conference on Computer Sciences and Information Technologies (CSIT), pp. 92–95 (2020) 5. Bodyanskiy, Y., Vynokurova, O., Szymanski, Z., Kobylin, I., Kobylin, O.: Adaptive robust models for identification of nonstationary systems in data stream mining tasks. In: Presented at the Proceedings of the 2016 IEEE 1st International Conference on Data Stream Mining and Processing. DSMP 2016 (2016) 6. Bodyanskiy, Y.V., Tyshchenko, O.K., Kopaliani, D.S.: An evolving connectionist system for data stream fuzzy clustering and its online learning. Neurocomputing 262, 41–56 (2017). https://doi.org/10.1016/j.neucom.2017.03.081 7. Doroshenko, A.: Piecewise-linear approach to classification based on geometrical transformation model for imbalanced dataset. In: 2018 IEEE Second International Conference on Data Stream Mining Processing (DSMP), pp. 231–235 (2018) 8. Doroshenko, A.: Application of global optimization methods to increase the accuracy of classification in the data mining tasks. CEUR-WS.org. 2353, 98–109 (2019) 9. Ivan, T., Vasyl, T., Taras, T., Yurii, L.: The method and simulation model of element base selection for protection system synthesis and data transmission. Int. J. Sens. Wireless Commun. Control 10, 1–13 (2021)
10. Izonin, I., Tkachenko, R., Kryvinska, N., Tkachenko, P., Greguˇs ml, M.: Multiple linear regression based on coefficients identification using non-iterative SGTM neural-like structure. In: Rojas, I., Joya, G., Catala, A. (eds.) Advances in Computational Intelligence. IWANN 2019. Lecture Notes in Computer Science, vol. 11506, pp. 467–479. Springer, Cham (2019). https://doi.org/10.1007/978-3-03020521-8 39 11. Izonin, I., Tkachenko, R., Peleshko, D., Rak, T., Batyuk, D.: Learning-based image super-resolution using weight coefficients of synaptic connections. In: 2015 Xth International Scientific and Technical Conference “Computer Sciences and Information Technologies”, vol. 2015, pp. 25–29 (2015). https://doi.org/10.1109/STCCSIT.2015.7325423 12. Khavalko, V., Tsmots, I.: Image classification and recognition on the base of autoassociative neural network usage. In: 2019 IEEE 2nd Ukraine Conference on Electrical and Computer Engineering (UKRCON), pp. 1118–1121 (2019) 13. Khoshgoftaar, T.M., Gao, K., Napolitano, A., Wald, R.A.: Comparative study of iterative and non-iterative feature selection techniques for software defect prediction. Inf. Syst. Front. 16, 801–822 (2014). https://doi.org/10.1007/s10796-0139430-0 14. Kotsovsky, V., Batyuk, A., Yurchenko, M.: New approaches in the learning of complex-valued neural networks. In: 2020 IEEE Third International Conference on Data Stream Mining Processing (DSMP), pp. 50–54 (2020) 15. Kotsovsky, V., Geche, F., Batyuk, A.: Finite generalization of the offline spectral learning. In: 2018 IEEE Second International Conference on Data Stream Mining Processing (DSMP), pp. 356–360 (2018) 16. Leoshchenko, S., Oliinyk, A., Subbotin, S., Shylo, S., Shkarupylo, V.: Method of artificial neural network synthesis for using in integrated cad. In: 2019 IEEE 15th International Conference on the Experience of Designing and Application of CAD Systems (CADSM), pp. 1–6 (2019) 17. Lytvynenko, V., et al.: Hybrid methods of gmdh-neural networks synthesis and training for solving problems of time series forecasting. In: Lytvynenko, V., Babichev, S., W´ ojcik, W., Vynokurova, O., Vyshemyrskaya, S., Radetskaya, S. (eds.) Lecture Notes in Computational Intelligence and Decision Making. ISDMCI 2019. Advances in Intelligent Systems and Computing, vol. 1020, pp. 513–531. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-26474-1 36 18. Medykovskyy, M., Tsmots, I., Tsymbal, Y., Doroshenko, A.: Development of a regional energy efficiency control system on the basis of intelligent components. In: 2016 XIth International Scientific and Technical Conference Computer Sciences and Information Technologies (CSIT), vol. 2016, pp. 18–20 (2016). https://doi. org/10.1109/STC-CSIT.2016.7589858 19. Murzenko, O., et al.: Application of a combined approach for predicting a peptideprotein binding affinity using regulatory regression methods with advance reduction of features. In: 2019 10th IEEE International Conference on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications (IDAACS), pp. 431–435 (2019) 20. Subbotin, S.: The neuro-fuzzy network synthesis and simplification on precedents in problems of diagnosis and pattern recognition. Opt. Mem. Neural Netw. 22, 97–103 (2013). https://doi.org/10.3103/S1060992X13020082 21. Teslyuk, V., Kazarian, A., Kryvinska, N., Tsmots, I.: Optimal artificial neural network type selection method for usage in smart house systems. Sensors 21, 47 (2021). https://doi.org/10.3390/s21010047
22. Tkachenko, R., Izonin, I.: Model and and principles for the implementation of neural-like structures based on geometric data transformations. In: Hu, Z., Petoukhov, S., Dychka, I., He, M. (eds.) Advances in Computer Science for Engineering and Education. ICCSEEA 2018. Advances in Intelligent Systems and Computing, vol. 754, pp. 578–587. Springer, Cham (2019). https://doi.org/10.1007/9783-319-91008-6 58 23. Tkachenko, R., Izonin, I., Vitynskyi, P., Lotoshynska, N., Pavlyuk, O.: Development of the non-iterative supervised learning predictor based on the ito decomposition and SGTM neural-like structure for managing medical insurance costs. Data 3, 46 (2018). https://doi.org/10.3390/data3040046 24. Tsmots, I., Tsymbal, Y., Khavalko, V., Skorokhoda, O., Tesluyk, T.: Neural-like means for data streams encryption and decryption in real time. In: 2018 IEEE Second International Conference on Data Stream Mining and Processing, pp. 438– 443. IEEE, Lviv (2018) 25. Tsymbal, Y., Tkachenko, R.A.: Digital watermarking scheme based on autoassociative neural networks of the geometric transformations model. In: 2016 IEEE First International Conference on Data Stream Mining Processing (DSMP), pp. 231–234 (2016) 26. Verbenko, I., Tkachenko, R.: Gantry and and bridge cranes neuro-fuzzy control by using neural-like structures of geometric transformations. Czasopismo Techniczne 2013, 53–68 (2014) 27. Wang, X., Cao, W.: Non-iterative approaches in training feed-forward neural networks and their applications. Soft Comput. 22, 3473–3476 (2018). https://doi.org/ 10.1007/s00500-018-3203-0
An Expert System Prototype for the Early Diagnosis of Pneumonia

Mariia Voronenko1(B), Olena Kovalchuk2, Luidmyla Lytvynenko1, Svitlana Vyshemyrska1, and Iurii Krak3

1 Kherson National Technical University, Kherson, Ukraine
mary [email protected], [email protected], [email protected]
2 National Pirogov Memorial Medical University, Vinnytsya, Ukraine
[email protected]
3 Taras Shevchenko National University of Kyiv, Kyiv, Ukraine
[email protected]
Abstract. In this paper, an expert system model prototype based on Bayesian networks is proposed, which makes it possible to provide assistance to a doctor in the early diagnosis of a disease such as "pneumonia". We designed a static Bayesian network with five key variables to obtain the probabilistic inference of the resulting node that determines the presence or absence of disease in a patient. We consulted with medical experts when selecting and quantifying input and output variables. When constructing a Bayesian model and conducting scenario analysis for a better prognosis of the diagnosis, we consulted with the attending physicians.

Keywords: Pneumonia · Expert system · Diagnostic methods · Antibiotic therapy · Bayesian networks · Parametric learning · Validation

1 Introduction
In medicine, the term diagnosis means the process of recognizing a disease, leading to a final diagnosis. It is necessary to distinguish between the methods and the methodology of the diagnostic process. Diagnostic methods are understood as any technical technique with the help of which a sign of a pathological process or disease is established (detected). The methodology determines the order in which these methods are applied and the ways of analyzing the signs obtained with their help, and includes a number of provisions and rules imposed on the nature and direction of the doctor's thinking. Each of the research methods reveals only part of the information about a particular characteristic of the pathological process. Only by bringing together and summarizing the data of all research methods can the doctor diagnose the disease itself. Modern diagnostics can only be comprehensive [3].
From the point of view of diagnostic methodology, it is fundamentally important to note that the various methods of patient examination are not prescribed by the doctor "to confirm the diagnosis", but are carried out consciously in order to detect some important aspect of the pathological process. Pneumonia is an inflammation of the lung tissue, usually of infectious origin, with predominant involvement of the alveoli. The term "pneumonia" unites a large group of diseases, each of which has its own etiology, pathogenesis, clinical picture, X-ray signs, characteristic laboratory findings, and features of therapeutic treatment. In general, pneumonia can have microbial (bacteria, viruses, protozoa), toxic, allergic, autoimmune, burn, or radiation etiology [18]. The main diagnostic methods are X-ray examination of the lungs and sputum examination; the main method of treatment is antibiotic therapy. Late diagnosis and delay in starting antibiotic therapy worsen the prognosis of the disease. In some cases, death is possible. More than 17 million people are diagnosed with pneumonia annually, and men are 30% more likely to get sick than women. Special risk groups include children under 5 years old and elderly people over the age of 65 years. Every 64th person who contracts this dangerous disease dies from pneumonia. At the same time, it is worth noting a rather high mortality rate relative to other diseases: 8.04% in men and 9.07% in women [21]. Pneumonia is the cause of death in 15% of children under 5 years of age worldwide [28]. In 2017, 808,694 children under 5 years old died of pneumonia [29]. Diagnostic methods of research are divided into basic (chest X-ray, microscopic examination of sputum with Gram staining, sputum culture on culture media, general and biochemical blood tests, blood gas analysis) and additional (computed tomography of the chest, paracentesis of the pleural cavity and pleural biopsy, bronchoscopy with biopsy, blood culture on culture media, detection of specific antibodies, lung biopsy and urinalysis). One of our tasks in this research is to develop a methodology for constructing a Bayesian network for the differential diagnosis of a disease such as pneumonia. The paper consists of the following sections. Section 2 defines the tasks that we will solve in our study. Section 3 provides an overview of the literature on existing methods for differential diagnosis. In Sect. 4 we discuss the general formulation of the solution to our task. In Sect. 5, we show the input data and the methods for obtaining the structural model indicators. After that, in Sect. 6 we describe the sequence of building and validating a Bayesian network and present the research process and its results. Section 7 presents the analysis of the study results. Section 8 summarizes and concludes the paper.
2 Research Objective
Based on the study of the development of pneumonia in patients, the following predictors of pneumonia were identified:
– presence of allergies,
– cardiac ischemia,
– sex,
– smoking,
– bronchitis,
– harmful working conditions,
– age.
The possibility of developing pneumonia was indicated by the presence of some obvious symptoms in the patient:
– hard breathing,
– rapid pulse,
– abnormalities in the radiograph,
– ECG deviation,
– swelling of the limbs,
– dry wheezing.
An important role in the final diagnosis of pneumonia is played by the results of clinical tests for:
– erythrocytes,
– hemoglobin,
– eosinophils,
– monocytes,
– erythrocyte sedimentation rate (ESR).
We will:
1. develop the static Bayesian model structure for the pneumonia diagnosis;
2. conduct parameter learning and validation of the Bayesian model to check the interaction of its functional blocks;
3. conduct a scenario analysis of clinical cases in order to predict the condition of each patient separately;
4. analyze the scope of application of the diagnostic system.
The solution to the problem of making a diagnosis is possible on the basis of expert assessments in the presence of statistical data. Our task is to assist in data processing, form objective alternatives, and choose the best alternative using static Bayesian network models. The resulting Bayesian system has a flexible structure, which allows new nodes to be introduced to generate probabilistic inference.
3 Related Works
Bayesian networks are a promising probabilistic tool for modeling complex hierarchical processes (static and dynamic) with arbitrary uncertainties. Bayes' theorem is widely used in decision support systems, biometrics, evaluation and communication theory, and many other fields of science and technology; it has provided scientists with a solid foundation for representing uncertain knowledge [25].
Since Bayesian networks allow identifying the sources of variability depending on various functions, they can be applied in two ways [10]. The first way is to improve understanding of how the systems under study function; the second is to use Bayesian networks when evaluating the variables represented by the nodes. In the first case, the focus is on the links of Bayesian belief networks: the functional relationships in the "rules" used to construct conditional probabilities for a node, and the mechanisms describing the interaction of factors in determining the values of variables. In the second case, the focus is on evaluating and validating the model and on providing empirical information that is quantitative, useful, and relevant to key variables. Bayesian networks are widely used in environmental science and are an effective tool for structuring environmental studies [22,23]. Static BNs have found application in medicine [9], genetics [20], and epidemiology [16]. The scope of application of dynamic BNs is also very wide [8,15,17,24]. The Bayesian approach works even with incomplete and inaccurate input data, with noise and missing values; in such situations, the resulting value corresponds to the most likely outcome.
4 Problem Statement
Using the Bayesian approach, the developed model will facilitate a faster diagnosis and thus help to pre-select the correct treatment method. This mathematical tool lets us describe the notions which the patient uses when describing his health [29]. The conceptual model for diagnosing pneumonia is shown in Fig. 1.

For a set of related events $X^i$, $i = 1, \ldots, N$, a set of learning data $D = (d_1, \ldots, d_n)$, $d_i = x_i^{(1)} x_i^{(2)} \ldots x_i^{(N)}$, is given. Here the subscript is the observation index and the superscript is the variable index; $n$ is the number of observations, each observation consists of $N$ ($N \geq 2$) variables, and each $j$-th variable ($j = 1, \ldots, N$) has $A^{(j)} = \{0, 1, \ldots, \alpha^{(j)} - 1\}$, $\alpha^{(j)} \geq 2$, states. Based on a given training sample, one needs to build an acyclic graph connecting the event sets $X^i$, $i = 1, \ldots, N$. In addition, each BN structure $g \in G$ is represented by a set of $N$ predecessors $P^{(1)}, \ldots, P^{(N)}$, that is, for each vertex $j = 1, \ldots, N$, $P^{(j)}$ is the set of its parent vertices, such that $P^{(j)} \subseteq \{X^{(1)}, \ldots, X^{(N)}\} \setminus \{X^{(j)}\}$.

The aim of this study is to develop a Bayesian-based system for the early diagnosis of pneumonia.
5 Materials and Methods
5.1 Data
Observation of patients with a possible preliminary diagnosis of pneumonia included [13,19]:
Fig. 1. Conceptual model for diagnosing pneumonia
– taking clinical tests at the onset of the disease and after 7–10 days from the onset of the disease (that is, over time);
– monitoring changes in patients' symptoms, including the complaints with which the patient applied and the symptoms identified by the doctor;
– data collection on the patient's history;
– taking biochemical blood tests;
– data collection on living and working conditions.
The study involved 136 people of different ages, with different anamneses and varying disease severity. We selected a data sample of 16 clinical cases and analyzed each of them. All patients underwent the necessary examinations, and each of them completed a survey to establish the anamnesis. Where necessary, patients' drug treatment was adjusted in accordance with accepted standards. The purpose of modeling is to confirm, with a certain degree of probability, or refute the predicted diagnosis of "pneumonia" in patients from the study group. Table 1 contains a description of the selected indicators; the key indicators are highlighted in bold.
5.2 Bayesian Network Methods
A Bayesian network (also called a Bayes network or Bayesian belief network) is a graphical model that represents a set of variables and their probabilistic dependencies. For example, a Bayesian network can be used to estimate the likelihood that a patient is sick from the presence or absence of a number of symptoms, based on data on the relationship between symptoms and diseases.
The mathematical apparatus of Bayesian networks was developed by the American scientist Judea Pearl in 1988 [2]. Bayesian networks are used if we are dealing with the calculation of the probability of a hypothesis being true under conditions when, based on observations, only partial information about the events is known. Observed events cannot be unambiguously described as direct consequences of strictly determined causes, and therefore, in practice, a probabilistic description of phenomena is widely used. There are several reasons for this: the presence of unavoidable errors in the process of experimentation and observation; the impossibility of a complete description of the structural difficulties of the system under study; and uncertainties due to the finiteness of the observation volume. Probabilistic modeling faces certain difficulties that can be divided into two groups:
– technical (computational complexity);
– ideological (the presence of uncertainty, difficulty in setting the problem in terms of probabilities, insufficiency of statistical material).
Let $G = (V, B_i)$ be a graph in which the vertex set $V$ is a set of variables and $B_i$ is a non-reflexive binary relation on $V$ [2,5]. Each variable $v$ has a set of parent variables $c(v) \subseteq V$ and a set of all descendants $d(v) \subseteq V$. The set $s(v)$ is the set of child variables of the variable $v$, and $s(v)$ is a subset of $d(v)$. Let us also note that

$a(v) \subseteq V, \quad a(v) = V \setminus \left( d(v) \cup \{v\} \right), \qquad (1)$
that is, $a(v)$ is the set of propositional variables from the set $V$, excluding the variable $v$ and its descendants. The set of variables $B$ is the set of parameters defining the network. It is composed of the parameters $\theta_{x^i \mid pa(X^i)} = P\left(x^i \mid pa(X^i)\right)$ for each possible value $x^i$ of $X^i$ and $pa(X^i)$ of $Pa(X^i)$, where $Pa(X^i)$ denotes the set of parents of the variable $X^i$ in $G$. Each variable $X^i$ is represented as a vertex in the graph $G$. If we examine more than one graph, we use the notation $Pa_G(X^i)$ to identify the parents of $X^i$ in the graph $G$ [1,4,7]. The total probability of the BN $B$ is specified by the formula:

$P_B\left(X^1, \ldots, X^N\right) = \prod_{i=1}^{N} P_B\left(X^i \mid Pa\left(X^i\right)\right). \qquad (2)$
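As a hedged illustration of the factorization in Eq. (2), the sketch below encodes a deliberately tiny two-parent network using the pgmpy library rather than GeNIe, which the authors actually use; the conditional probability values are invented for the example and are not the parameters learned in this study.

```python
# pip install pgmpy
from pgmpy.models import BayesianNetwork
from pgmpy.factors.discrete import TabularCPD
from pgmpy.inference import VariableElimination

# Two illustrative parents of the resulting node Y; all CPT numbers are made up.
model = BayesianNetwork([("X1", "Y"), ("X3", "Y")])

cpd_x1 = TabularCPD("X1", 2, [[0.3], [0.7]])                  # P(complaints)
cpd_x3 = TabularCPD("X3", 2, [[0.4], [0.6]])                  # P(symptoms)
cpd_y = TabularCPD(
    "Y", 2,
    [[0.95, 0.70, 0.60, 0.20],                                # P(Y=0 | X1, X3)
     [0.05, 0.30, 0.40, 0.80]],                               # P(Y=1 | X1, X3)
    evidence=["X1", "X3"], evidence_card=[2, 2],
)
model.add_cpds(cpd_x1, cpd_x3, cpd_y)
assert model.check_model()

# The joint distribution factorizes exactly as in Eq. (2):
# P(X1, X3, Y) = P(X1) * P(X3) * P(Y | X1, X3)
infer = VariableElimination(model)
print(infer.query(["Y"], evidence={"X1": 1, "X3": 1}))
```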
The purpose of the parametric learning procedure is to discover the most likely parameters $\theta$ that explain the data [6,27]. Let $D = \{D_1, D_2, \ldots, D_N\}$ be the learning data, where $D_l = \{x_1[l], x_2[l], \ldots, x_n[l]\}$ represents an instance of the Bayesian network nodes. The quality of learning is quantified by a log-likelihood function, denoted as $L_D(\theta)$. Validation of the developed network was carried out according to the expectation-maximization (EM) algorithm, which was first proposed in 1977 in [5]. The EM method is an iterative procedure designed to solve optimization problems for a certain functional, using an analytical search for the extrema of the objective function. This method is divided into two
Table 1. Initial data for the study
Designation of the node
Description
X1
Complaints
State s0 - means no complaints from the patient; s1 - means the patient has complaints
X11–X15 X2
Anamnesis
s0 means the absence of information from the anamnesis confirming the patient’s tendency to develop pneumonia; s1 means the presence of information from the anamnesis confirming the tendency to develop pneumonia in the patient
X21–X26 X27
Age
s0 corresponds to the age of up to 45 years; s1 corresponds to the age from 45 years old and above
X3
Symptoms
s0 means the absence of symptoms indicating pneumonia in the patient’s clinical picture; s1 is symptoms indicating pneumonia present in the patient’s body
X31 X32
Rapid pulse
s0 corresponds to a pulse of up to 70 beats per minute; s1 corresponds to a heart rate of 71 beats per minute and above
X33–X36 X4
Clinical
s0 means no deflections in the clinical blood test; s1 means the presence of deflections in the clinical blood test
X41
Erythrocytes
s0 means that the content of red blood cells in the blood is up to 4.6; s1 means that the content of red blood cells in the blood is from 4.65 and above
X42
Hemoglobin
s0 means that the hemoglobin level does not exceed 120; s1 means that the hemoglobin level is from 121 and above
X43
Eosinophils
s0 means that the content of leukocytes in the blood is up to 9; s1 means that the content of leukocytes in the blood is from 9.1 and above
X44, X45 X46
ESR
s0 means that the ESR level is up to 10; s1 means that the ESR level is from 11 and above
X5
Biochemical s0 means no deflections in the Biochemical blood test;
X51
Bilirubin
s1 means the presence of deflections in the Biochemical blood test s0 means that the level of bilirubin in the blood is up to 15; s1 means that the level of bilirubin in the blood is from 16 and above X52, X53 X54
Glucose
s0 means that the blood glucose level is up to 4.6; s1 means a blood glucose level of 4.7 or higher
X55 Y
Pneumonia yes/no?
s0 means the likelihood of a patient not being diagnosed with pneumonia; s1 means the likelihood of confirming the diagnosis of pneumonia in a patient
stages. At the first, "expectation" (E) stage, the expected values for each incomplete observation are calculated on the basis of the available observations (patients). After the filled data set is obtained, the basic statistical parameters are estimated. At the second, "maximization" (M) stage, the degree of agreement between the expected and the actually substituted data is maximized [11,12,14,26,30]. Sensitivity measures how much the beliefs of the target node will change if a finding is entered at another node; sensitivity analysis does not indicate the direction of this relationship.
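A deliberately tiny numpy sketch of these two EM stages is given below for a single symptom and a partially missing diagnosis label; the records and starting values are invented, and this is not the GeNIe implementation used by the authors.

```python
import numpy as np

# Records: (diagnosis, symptom); the diagnosis may be missing (None), as in real histories.
data = [(1, 1), (1, 1), (0, 0), (None, 1), (0, 1), (None, 0), (1, 0), (None, 1)]

p, q1, q0 = 0.5, 0.6, 0.4          # initial guesses: P(Y=1), P(S=1|Y=1), P(S=1|Y=0)
for _ in range(50):
    # E-step: expected value of the missing diagnosis given the observed symptom
    resp, sym = [], []
    for y, s in data:
        if y is None:
            like1 = p * (q1 if s else 1 - q1)
            like0 = (1 - p) * (q0 if s else 1 - q0)
            resp.append(like1 / (like1 + like0))
        else:
            resp.append(float(y))
        sym.append(float(s))
    resp, sym = np.array(resp), np.array(sym)
    # M-step: re-estimate the parameters from the completed data
    p = resp.mean()
    q1 = (resp * sym).sum() / resp.sum()
    q0 = ((1 - resp) * sym).sum() / (1 - resp).sum()

print(round(p, 3), round(q1, 3), round(q0, 3))
```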
6 Experiments and Results
We simulated the Bayesian network model in the software environment GeNIe 2.3, Academic version. The Bayesian model is built by hand, without using any structure learning algorithm. In this software environment, the structural model of a static network looks like the one shown in Fig. 2. The proposed Bayesian network is not limited to unidirectional action and can be used both for detecting causal signs and for predicting consequences. The analysis of the information flows (messages) that take place between network nodes is performed. The arrows indicate the flow of information between the selected blocks. All X indicators are associated with the resulting node Y; there are also relationships between the X indicators. The Bayes formula is used in Bayesian networks as an inference tool to find a solution. If the Bayesian network is used to recognize (identify) objects, then the set of factors is replaced by the factors or characteristics of a particular object. Selecting the set of instantiated variables separately has its advantages and disadvantages. The advantage of this representation is that it prevents looping when forming the output; if the output is not selected separately, there is a risk that the messages will affect each other and the network will become unstable. The disadvantage of this representation is that computing costs increase. However, the advantages are so great that the allocation of instantiated variables is completely justified. The final decision to confirm the relationship between the input data and the test results, as well as the prescription of treatment, is made by the doctor. During the sensitivity analysis we remove the nodes X21, X25, X36, X44 together with their connections, because they do not affect anything and are not sensitive. The left side of Fig. 3 shows the Bayesian network at the time of the sensitivity analysis (red circles indicate nodes that have been removed), and the right side of Fig. 3 shows the resulting Bayesian network after removing the insensitive nodes and links. We then conduct parameter learning and repeated validation. In Table 2, we present the results of modeling each specific clinical case from a sample of data for 16 patients. The table shows the numerical values of all indicators and, below them, the predicted result of confirming or refuting an early diagnosis of pneumonia.
Fig. 2. The static Bayesian network structural model presented in the software environment GeNIe 2.3 Academic version
Fig. 3. a) The developed model at the time of the sensitivity analysis b) the corrected model after removing insensitive links and nodes.
Table 2. The results of scenario analysis Indicators
Clinical case 1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
0
1
0
1
1
1
1
1
1
1
0
1
1
1
1
1
0
0
0
0
0
0
0
0
1
1
0
1
0
1
0
0
1
0
0
0
0
1
0
0
0
1
0
0
0
1
X14
1
0
0
1
1
1
1
1
1
0
1
1
1
1
0
1
X15
1
1
1
0
0
1
1
1
1
1
1
0
1
0
1
1
1 0
0 0
1 1
1 1
1 1
1 1
1 0
1 0
1 0
1 0
1 0
1 0
1 0
1 1
1 0
1 0
X23
1
0
0
0
0
0
0
0
0
1
1
0
0
0
0
0
X24
1
0
0
1
0
1
1
1
1
1
1
1
1
1
1
0
X26
0
0
1
1
0
0
1
0
1
1
0
0
0
0
0
0
X27
79
76
72
54
81
55
54
56
52
53
55
40
74
71
77
46
1 0
0 0
1 0
0 0
1 0
1 1
1 1
1 1
1 1
1 0
0 0
0 0
1 0
0 0
1 0
1 0
X1 X11 X12 X13
X2 X22
X3 X31
Complaints presented by the patient
Patient history
Physicianidentified symptoms
16
92
85
150 90
102 90
90
90
90
90
92
88
90
92
90
88
X33
1
0
0
0
1
0
0
0
0
0
0
0
0
0
1
0
X34
0
0
0
0
0
0
0
0
0
1
0
0
0
0
0
0
X35
0
0
1
0
1
0
0
0
0
0
0
0
1
0
0
1
1 4,4
1 1 4,3 4
1 4,6
1 4
0 4,5
1 1 1 3,7 4,3 4,3
0 4,7
1 4,8
1
0 0 0 4,23 4,37 4,2
X32
X4 X41
Clinical blood test
0 4,7
X42
142 134 121 141 155 158 107 130 137 171 151 5
X43
11,6 7,5 5
6,5
8,8 5,9
6
9,5 4,8
5,7
14,5 160 5,2
10
6,1
X45
4
3
3
3
6
8
4
2
3
4
10
8,9 3
2
5
3
X46
11
21
28
19
9
4
16
14
17
4
9
8
7
9
3
1 16
0 12
1 11
1 1 10,4 12
1 0 13,7 13
0 0
1 0 1 11 11,7 11,6 11,6 0
0 0 12,4 0
1 0 17,6 0
X52
28
33
61
44
39
39
29
0
44
0
0
0
0
0
0
X53
26
29
39
27
19
38
29
0
27
0
0
0
0
0
0
0
4,4
0
5,4 4,8
5,7
0
4,2
X5 X51
Blood chemistry
4,3 5,5 4,3
X54
5,8 7
9,3 4,2 3,7
127 136 126 148
8
7,3
0
X55
1
0
0
0
0
0
0
0
0
0
1
0
0
0
0
0
Y, No: Probability Yes: of diagnosis
64 36
50 50
64 36
25 75
64 36
31 69
17 83
17 83
64 36
83 17
25 75
25 75
83 17
87 13
31 69
17 83
7 Discussion
The static Bayesian model of early detection of pneumonia in an individual patient is shown in Figs. 4, 5, 6 and 7. Let us now analyze each specific clinical case; the analysis results are shown in Table 2. Consider clinical cases 7, 8, and 16. If we look at Table 2, the probability of the patient being diagnosed with pneumonia is 83%, that is, the standard drug treatment regimen can be applied for recovery and subsequent rehabilitation (Fig. 4).
Fig. 4. Clinical cases 7, 8, and 16.
If we look at Table 2, in clinical cases 4, 11 and 12 the patients have pneumonia with a probability of 75%, which means that the standard scheme of drug treatment of this disease can be applied to them (Fig. 5).
Fig. 5. Clinical cases 4, 11, and 12.
Let us consider clinical case 2. If we look at Table 2, the patient has an ambiguous result; this result requires additional examinations (Fig. 6).
Fig. 6. Clinical case 2 simulation results
If we look at the table, in 3 cases out of 16 the diagnosis of pneumonia was confirmed with a probability of 83% (19% of the total number of cases in the considered sample), in 3 cases out of 16 it was confirmed with a probability of 75% (19% of the total number of cases), and in 2 cases out of 16 it was confirmed, but with a probability of 69% (13% of the total number of cases). In the only case with the clinical picture of patient No. 2, the probability of pneumonia is 50% to 50% (6% of the total number of cases). This result is ambiguous; most likely the patient needs additional examination (Fig. 7).
Fig. 7. Results of the analysis.
In all other cases, we are talking rather about the absence of the disease: in 3 cases out of 16 the absence of pneumonia in the patient was confirmed with a probability of 64%, in 3 more cases out of 16 with a probability of 83%, and in 1 case out of 16 the absence of pneumonia is not in doubt, with a probability of 88% (see Table 2).
8 Conclusions
In the diagnosis of pneumonia and other infectious diseases, two approaches have long been formed and now coexist. The first characterizes the research path, which consists in the active identification and assessment of clinical symptoms, based on the patterns of formation and development of the pathological process. The second is to establish an analogy, when, on the basis of the identified symptoms, a search is made for an image, an analogy, a copy of the previously described, usually classical, i.e. typical clinical picture of the disease (but not of the pathological process). Pneumonia is a special form of disease characterized by high mortality and medical costs. Taking into account the high frequency of diagnostic errors in the detection of this disease and the widespread practice of inappropriate use of drugs, we have modeled a system that can predict, with a certain degree of probability, the presence or absence of prerequisites for the diagnosis. This will help in obtaining recommendations for practitioners, following which will improve the outcomes of pneumonia treatment in persons aged 18 years and older. This study can form the basis for the development of clinical guidelines/protocols for the diagnosis and provision of appropriate medical care for adult patients with pneumonia. In our future work, we plan to expand the field of application of this Bayesian model.
References 1. Bidyuk, P.I., Terentev, O.M.: Zastosuvannya bayesivskogo entrance to medical diagnostics. In: Materials of the 11th International Conference on Automatic Control, vol. 3, p. 32 (2004) 2. Bidyuk, P.I., Terentyev, A.N., Hasanov, A.S.: Construction and teaching methods of Bayesian networks. Cybern. Syst. Anal. 4, 133–147 (2005) 3. Burnum, J.F.: Medical diagnosis through semiotics: giving meaning to the sign. Ann. Intern. Med 119(9), 939–943 (1993) 4. Castillo, E.F., Gutierrez, J.M., Hadi, A.S.: Sensitivity analysis in discrete Bayesian networks. IEEE Trans. Syst. Man Cybern. Part A Syst. Hum. 27(4), 412–423 (1997) 5. Cheeseman, P., Freeman, D., Kelly, M., Taylor, W., Stutz, J.: Bayesian classification. In: Proceedings of AAAI, St. Paul, pp. 607–611 (1988) 6. Cofino, A.S., Cano, R., Sordo, C., Gutierrez, J.M.: Bayesian networks for probabilistic weather prediction. In: Proceedings of The 15th European Conference On Artificial Intelligence. IOS Press, pp. 695–700 (2002)
7. Cooper, G.F.: Current research directions in the development of expert systems based on belief networks. Appl. Stochast. Models Data Anal. 5, 39–52 (1989) 8. Dagum, P., Luby, M.: Approximating probabilistic inference in Bayesian belief networks is NP-hard. Artif. Intell. 45, 141–153 (1993) 9. Grunwald, P.: A tutorial introduction to the minimum description length principle. In: Advances in Minimum Description Length. Theory and Applications. MIT Press, Cambridge (2005) 10. Hautaniemi, S.K.: Target identification with Bayesian networks. Master of science thesis (2000). www.cs.tut.fi/∼samba/Publications 11. Krak, I., Barmak, O., Radiuk, P.: Information technology for early diagnosis of pneumonia on individual radiographs. In: Proceedings of the 3rd International Conference on Informatics and Data-Driven Medicine (IDDM-2020), vol. 2753, pp. 11–21 (2020) 12. Krak, I., Barmak, O., Radiuk, P.: Detection of early pneumonia on individual CT scans with dilated convolutions. In: Proceedings of the 2nd International Workshop on Intelligent Information Technologies and Systems of Information Security with CEUR-WS, vol. 2853, pp. 214–227 (2021) 13. Leach, R.M.: Acute and Critical Care Medicine at a Glance. Wiley-Blackwell, New York (2009) 14. Lucas, P.: Bayesian networks in medicine: a model-based approach to medical decision making (2001). 10.1.1.22.4103 15. Lytvynenko, V., et al.: Dynamic Bayesian networks application for evaluating the investment projects effectiveness. In: Babichev, S., Lytvynenko, V., W´ ojcik, W., Vyshemyrskaya, S. (eds.) ISDMCI 2020. AISC, vol. 1246, pp. 315–330. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-54215-3 20 16. Lytvynenko, V., et al.: Dynamic Bayesian networks in the problem of localizing the narcotic substances distribution. In: Shakhovska, N., Medykovskyy, M.O. (eds.) CSIT 2019. AISC, vol. 1080, pp. 421–438. Springer, Cham (2020). https://doi.org/ 10.1007/978-3-030-33695-0 29 17. Lytvynenko, V., Voronenko, M., Nikytenko, D., Savina, N., Naumov, O.: Assessing the possibility of a country’s economic growth using dynamic Bayesian network models. In: IEEE-2019 14th International Scientific and Technical Conference on Computer Sciences and Information Technologies (CSIT), vol. CFP19D36-PRT, pp. 60–63 (2020) 18. Mackenzie, G.: The definition and classification of pneumonia. Pneumonia 8, 14 (2016). https://doi.org/10.1186/s41479-016-0012-z 19. McLuckie, A.: Respiratory Disease and Its Management. Springer, New York (2009). https://doi.org/10.1007/978-1-84882-095-1 20. Singh, M., Provan, G.: A comparison of induction algorithms for selective and nonselective Bayesian classifiers. In: International Conference on Machine Learning, pp. 497–505 (1995) 21. Stringer, J.R., Beard, C.B., Miller, R.F., Wakefield, A.E.: A new name (Pneumocystis jiroveci) for Pneumocystis from humans. Emerg. Infect. Dis. 7(9), 891–896 (2002) 22. Suzuki, J.: Learning Bayesian belief networks based on the mdl principle: an efficient algorithm using the branch and bound technique. IEICE Trans. Inf. Syst. E-82-D, 356–367 (1999) 23. Suzuki, J.: Learning Bayesian belief networks based on the minimum description length principle: basic properties. In: IEICE Trans. Fundam. E82-A, 9 (1999)
24. Troldborg, M., Aalders, I., Towers, W., Hallett, P.D., et al.: Application of Bayesian belief networks to quantify and map areas at risk to soil threats: using soil compaction as an example. Soil Tillage Res. 132, 56–68 (2013) 25. Turuta, O., Perova, I., Deineko, A.: Evolving flexible neuro-fuzzy system for medical diagnostic tasks. Int. J. Comput. Sci. Mobile Comput. IJCSMC 4, 475–480 (2015) 26. Van der Gaag, L.C., Coupe, V.M.: Sensitivity analysis for threshold decision making with Bayesian belief net-works. In: AI*IA 99: Advances in Artificial Intelligence, vol. 1792, pp. 37–48 (2000) 27. Voronenko, M., et al.: Dynamic Bayesian networks application for economy competitiveness situational modelling. In: Advances in Intelligent Systems and Computing V. CSIT 2020. Advances in Intelligent Systems and Computing, vol. 1293, pp. 210–224 (2020) 28. World Health Organization: Pneumococcal vaccines. Wkly Epidemiol. Rec. 78(14), 110–119 (2003) 29. Zaichenko, O.Y., Zaichenko, Y.P.: Doslidzhennya Operations/Operations research. Word, Collection of tasks. Kiev (2007) 30. Zhang, Z., Kwok, J., Yeung, D.: Surrogate maximization (minimization) algorithms for adaboost and the logistic regression model. In: Proceedings of the Twenty-First International Conference on Machine Learning (ICML), p. 117 (2004)
Using Bayesian Networks to Estimate the Effectiveness of Innovative Projects

Oleksandr Naumov1, Mariia Voronenko2(B), Olga Naumova2, Nataliia Savina3, Svitlana Vyshemyrska2, Vitaliy Korniychuk2, and Volodymyr Lytvynenko2

1 University of State Fiscal Service of Ukraine, Irpin, Ukraine
[email protected]
2 Kherson National Technical University, Kherson, Ukraine
mary [email protected], [email protected], [email protected]
3 The National University of Water and Environmental Engineering, Rivne, Ukraine
[email protected]
Abstract. The paper proposes an application of the Bayesian methodology to analyze the effectiveness of investments in the national economy. The methods for creating the BN structure, its parametric learning, validation, and scenario analysis are examined. The research results show that at the highest level of capital investments, the result of financial activity will be 48% more active, while the net profit indicator will rise by 8%. To make the profitability equal to 100%, it is desirable to decrease the payback term by 15% and increase the level of net profit by 9%. This study can be useful for investors, managers, economists, financiers and other investment professionals.

Keywords: Innovative projects · Investment effectiveness · Bayesian networks · Validation · Scenario analysis "What-if"

1 Introduction
At present, there is an excess of supply of financial resources over demand on a global scale. Investors are actively searching for places for effective investment, considering, in this regard, the possibility of investing in the economies of countries with a transformational economy. Ukraine has certain advantages that determine its investment attractiveness [1,2]:
– an advantageous geographical position at the intersection of important trade routes;
– a developed transport infrastructure;
– natural resource potential for developing various sectors of the economy and types of industries. The main advantage is the potential of land resources, which allows agricultural production to be developed;
– the presence of a developed processing sector, in particular food industry enterprises;
– the possibility of creating production facilities with a complete technological cycle "from raw materials to finished products";
– personnel potential and the availability of highly skilled workers;
– a significant domestic consumer market;
– scientific and technical potential and the presence of scientific organizations and researchers capable of generating new knowledge and a scientific product for the innovative development of production;
– a developed sector of financial and credit institutions;
– the potential for the development of information technology.
However, there are a number of significant drawbacks that are factors constraining investment activity in Ukraine:
– a high level of systemic risk, which is determined by the instability of the political system, frequent changes in legislation, the influence of foreign policy factors, etc.;
– the uncertainty of strategic directions and the lack of targeted state support and investor protection, which is especially important for innovative investments;
– the underdevelopment of market institutions, in particular the lack of a securities market, which significantly reduces the investment opportunities for raising capital by domestic enterprises;
– the gap in technological and organizational ties between enterprises of the raw materials and processing sectors, especially in the agri-food sector. As a result, numerous intermediary links have arisen that reduce production efficiency;
– an artificial disparity in prices for raw materials and industrial products and services, which has led to inefficiency of the raw material base and the overflow of capital into the processing and trade sectors.
These factors affect individual sectors of the economy and regions of the country, as well as the opportunities and risks of investing in particular Ukrainian enterprises. At the same time, as practice shows, domestic and foreign investors want to obtain the high return on invested capital that investments in innovative projects in Ukraine promise [14]. Investing, or the investment process, is the production and accumulation of means of production and finance to ensure the movement and reproduction of capital. Table 1 shows the participants of the investment project. When choosing the scope of capital application and making investment decisions, a potential investor is faced with the task of analyzing and adequately evaluating the options. The basic conceptual approaches to making an investment decision are shown in Fig. 1. The process of developing, justifying and making investment decisions includes the following stages:
Table 1. Participants of the investment project

Participant    | Role                                                                          | Interests
Investor       | Provides investments and expects the expected return (or other effects) from them | Maximization of income (or other effects) on investments at a given level of risk
Contractor     | Performs the creation of the asset                                            | Fulfillment of work in accordance with the specified requirements and within the specified periods, maximizing its remuneration while minimizing its costs
Customer       | Organizes the use of the provided investments to create an asset, with the involvement of Contractors, and its transfer to the Balance Holder | Creation of an asset that meets the specified requirements within the specified time, maximizing its remuneration while minimizing its costs
Balance Holder | Exploits the created assets and provides income (or other effects) from their operation | Maximization of operating income (or other effects) while minimizing operating costs
Fig. 1. Scientific and methodological approaches to making investment decisions
1. Assessment of the preconditions for making a decision on capital investment:
– analysis of the investment climate (prevailing conditions of investment activity) of the country/region/industry;
– assessment of the state of the sphere/sector of the economy and its development prospects;
– forecast of market conditions for the products: demand, supply, price, competition.
2. Determining the investment attractiveness of the enterprise initiating (developing) the project:
– assessment of supply and marketing activities and distribution channels;
– analysis of the positive and negative factors that determine the outcome of the project;
– assessment of the enterprise's provision with resources (material, labor, financial, energy, etc.).
3. Analysis and evaluation of the investment project and decision-making on investment:
– study of the project documentation – a feasibility study, an investment application, a business plan – and determining the degree of its elaboration and readiness for implementation;
– assessment of the real need for investment;
– analysis and justification of the choice of sources of financing for the project;
– analysis of project cash flows (cash flow);
– assessment of profit and loss for the project;
– calculation and analysis of project performance indicators (NPV, PI, PBP, IRR), see the sketch following this list;
– analysis of project risks;
– availability of guarantees for the return of investor funds.
4. Organization and management of the project:
– building a management structure, defining roles and responsibilities;
– attraction of co-executors of the project;
– conclusion of agreements on the supply of resources and the sale of finished products;
– staffing;
– formation of a system for controlling the project progress.
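The project performance indicators named in stage 3 can be computed directly from a project cash-flow series. Below is a minimal Python sketch; the cash-flow figures, the quarterly discount rate and the bisection bounds are purely illustrative assumptions and are not taken from the paper.

```python
def npv(rate, cash_flows):
    """Net present value; cash_flows[0] is the initial outlay (negative)."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))

def profitability_index(rate, cash_flows):
    """PI = discounted inflows divided by the initial investment."""
    invested = -cash_flows[0]
    inflows = sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows[1:], start=1))
    return inflows / invested

def payback_period(cash_flows):
    """PBP = first period at which the cumulative cash flow becomes non-negative."""
    cumulative = 0.0
    for t, cf in enumerate(cash_flows):
        cumulative += cf
        if cumulative >= 0:
            return t
    return None  # the project never pays back

def irr(cash_flows, lo=-0.99, hi=10.0, tol=1e-6):
    """IRR found by bisection: the rate at which NPV equals zero."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if npv(mid, cash_flows) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

if __name__ == "__main__":
    flows = [-1_000_000, 150_000, 250_000, 300_000, 350_000, 400_000]  # UAH per quarter (assumed)
    rate = 0.15 / 4  # assumed 15% annual discount rate, applied quarterly
    print(f"NPV = {npv(rate, flows):.0f} UAH")
    print(f"PI  = {profitability_index(rate, flows):.3f}")
    print(f"PBP = {payback_period(flows)} quarters")
    print(f"IRR = {irr(flows):.2%} per quarter")
```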
2 Problem Statement
According to generally accepted practice in the world, the key stage in making a decision on investing in a project is the stage of calculating and analyzing project performance indicators, which are the main indicators that investors and experts are guided by when evaluating alternative investment options [7,22].
Fig. 2. Influence of input data on decision criteria
We have chosen indicators of investment effectiveness: the project period, the payback period, the internal profitability, etc. (Fig. 2). Let X^(i), i = 1, ..., N, be a set of related events, and let a set of learning data D = (d_1, ..., d_n), d_i = (x_i^(1), x_i^(2), ..., x_i^(N)), be given. Here the subscript is the observation number and the superscript is the variable number, n is the number of observations, each observation consists of N (N ≥ 2) variables, and each j-th variable, j = 1, ..., N, has α^(j) ≥ 2 states A^(j) = {0, 1, ..., α^(j) − 1}. Based on the given training sample, it is necessary to build an acyclic graph connecting the event sets X^(i), i = 1, ..., N. In addition, each BN structure g ∈ G is represented by a set of N predecessors (P^(1), ..., P^(N)), that is, for each vertex j = 1, ..., N, P^(j) is the set of its parent vertices, such that P^(j) ⊆ {X^(1), ..., X^(N)} \ {X^(j)}. The aim of the work is to create a Bayesian-network-based system for estimating investment effectiveness.
3 Literature Review
Many scientists and economists have been involved in the theory of investment analysis and financing of investment projects [5,10,18,21,23,27]. The authors [4,11] investigate the fundamentals of the investment process and the factors that can influence the decision that the investor ultimately makes.
They analyzed the risks, the behavior and the mechanism by which a manager acts when making decisions about financial investments. The works [6,13,17,30] are devoted to the problems of financial planning in the era of asset allocation. Their authors consider an applied approach to entrepreneurial financial management. These are among the few works that examine the issues under consideration with the aim of applying the proposed advice in practice in real economic conditions. According to the authors, using the proposed methods for determining the required amount of funds, one can succeed in investment projects. The paper [26] discusses the advantages and disadvantages of two capital budgeting approaches and argues that the purpose of the investment determines which approach the investor will ultimately take. The authors [9,20] consider methods for optimizing investments, as well as ways to optimize the total cost of attracting capital investments, and argue that capital investments in fixed assets have an advantage over other types of investments. A guide to action, as well as a comparative study, are presented in [12]. A theoretical analysis of budgeting by type of activity is contained in [19].
4 Materials and Methods

4.1 Data
The sample of input data (Fig. 2) consists of 14 indicators that reflect the financial activities of 11 enterprises for the period 2019–2020. We took the data table [8], in which all indicators are combined into two groups: input parameters and the criteria according to which the participants in the investment process make a decision to invest (Table 2). The resulting key node Y reflects the likelihood of success or failure of the investment project. The interaction of the input data is shown in Fig. 3. The decision on whether to invest money should be made taking into account the interests of all investment process participants, as well as with a detailed consideration of all of the above criteria values. An important role in this decision should also be played by the structure and time distribution of the capital attracted for the implementation of the project, as well as other factors, some of which lend themselves only to qualitative accounting. The life cycle of an investment project consists of the following main stages:
– the pre-investment stage (the "zero" stage, since the project does not yet exist) is the research, development of the project and preparation of documentation for the investor. At this stage the project initiator bears costs (IC0) and there is no income;
– the investment stage. At this stage, funds from third-party investors are attracted (shown at the top of the graph as income). The received investor funds, as well as own and borrowed funds, are invested in the project. The amount of investment is denoted by IC1. These investments are made in a relatively
Table 2. Business inputs

No | Data                | Specification | Description
1  | Known input data    | χ1   | The enterprise investments volume, UAH
2  |                     | χ11  | The enterprise own funds, UAH
3  |                     | χ12  | Investor cash, UAH
4  |                     | χ2   | The enterprise financial activities result, UAH
5  |                     | χ21  | The enterprise net profit, UAH
6  |                     | χ22  | The enterprise depreciation costs, UAH
7  |                     | χ23  | The enterprise costs for paying a loan, UAH
8  |                     | χ24  | Cash accumulations held by the enterprise in the form of a balance at the period's end, UAH
9  |                     | χ3   | The term within which the project is implemented, quarters
10 |                     | χ4   | Reduction (discount) rate, %
11 | Investment decision | χ31  | Net present value
12 | criteria            | χ32  | Profitability index
13 |                     | χ33  | Payback constraint, quarters
14 |                     | χ41  | Internal rate of return on money invested, %
Fig. 3. Life cycle and performance indicators of the investment project
short period of time, as one-time outlays: the purchase of land, construction, equipment, installation and commissioning, start-up, a trial batch, etc. Investments in working capital begin immediately: raw materials and supplies, fuel, energy, labor, etc. (IC2);
– operation of the project. This is the stage of production and sale of the project's products. There is a gradual ramp-up to the planned capacity, and further investment of working capital (IC2) is necessary. At the same time, there is an increase in sales revenue. Sales revenue minus costs (other than depreciation) equals cash flow (CF). There is an increase in production and an increase in income. The moment when the total accumulated income exceeds the costs is the payback point of the project. The time period from the start of the project to the payback point is the payback period (PBP);
– completion of the project. This is the stage of the return of investor funds. NPV is the difference between the total income and expenses of the project.

4.2 Bayesian Networks
A Bayesian belief network (Bayesian network, belief network) is a graphical model that represents a set of variables and the probabilistic relationships between them. For example, a Bayesian network can be used to calculate the likelihood that a patient is sick, given the presence or absence of a number of symptoms, based on data on the relationship between symptoms and diseases. The mathematical apparatus of Bayesian networks was developed by the American scientist Judea Pearl in 1988 [15]. Bayesian networks are used when we are dealing with the calculation of the probability that a hypothesis is true under conditions in which only partial information about the events is known from observations. In other words, according to the Bayes formula, we recalculate the probability, taking into account both the new observed values and the previously known information. Observed events cannot be unambiguously described as direct consequences of strictly determined causes, and therefore a probabilistic description of phenomena is widely used in practice. There are several reasons for this: the presence of unavoidable errors in the process of experimentation and observation; the impossibility of a complete description of the structural complexity of the system under study; and uncertainties due to the finiteness of the observation volume. On the way to probabilistic modeling there are certain difficulties that can be divided into two groups:
– technical (computational complexity);
– conceptual (the presence of uncertainty, difficulty in formulating the problem in terms of probabilities, insufficiency of statistical material).
Let G = (V, B_i) be a graph in which the vertex set V is a set of variables and B_i is a non-reflexive binary relation on V [9,20]. Each variable v has a set of parent variables c(v) ⊆ V and a set of descendants d(v) ⊆ V. The set s(v) is the set of child variables of the variable v, and s(v) is a subset of d(v).
Let us also note that
a(v) ⊆ V, a(v) = V − (d(v) ∪ {v}),     (1)
that is, a(v) is the set of propositional variables from the set V excluding the variable v and its descendants. The set of variables B is the set of conditional parameters: it consists of the values Q_{x_i | pa(X^i)} = P(x_i | pa(X^i)) for each value x_i of X^i and each pa(X^i) from Pa(X^i), where Pa(X^i) denotes the set of parents of the variable X^i in G. Each X^i is represented as a vertex in the graph G [8,12,15,16,19]. The joint probability distribution defined by the BN B is specified by the equality
P_B(X^1, ..., X^N) = ∏_{i=1}^{N} P_B(X^i | Pa(X^i)).     (2)
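To make the use of Eq. (2) concrete, the sketch below encodes a toy two-node network (high capital investments → project success), computes the joint distribution as the product of the conditional probabilities, and answers a "what-if" query by enumeration. The structure and all probability values are invented for demonstration; they are not the CPTs of the network built in Sect. 5.

```python
from itertools import product

# Toy BN over binary variables: X1 = "high capital investments", Y = "project success".
# Eq. (2): P(X1, Y) = P(X1) * P(Y | X1); all numbers are illustrative assumptions.
P_X1 = {1: 0.4, 0: 0.6}
P_Y_given_X1 = {(1, 1): 0.70, (1, 0): 0.30,   # P(Y=y | X1=1)
                (0, 1): 0.35, (0, 0): 0.65}   # P(Y=y | X1=0)

def joint(x1, y):
    """Joint probability factorized over the parent sets, as in Eq. (2)."""
    return P_X1[x1] * P_Y_given_X1[(x1, y)]

def query(y_value, evidence=None):
    """P(Y = y_value | evidence), computed by brute-force enumeration."""
    evidence = evidence or {}
    num = den = 0.0
    for x1, y in product((0, 1), repeat=2):
        if "X1" in evidence and x1 != evidence["X1"]:
            continue  # skip configurations inconsistent with the evidence
        p = joint(x1, y)
        den += p
        if y == y_value:
            num += p
    return num / den

print("P(success) =", round(query(1), 3))                                # prior belief
print("P(success | high investments) =", round(query(1, {"X1": 1}), 3))  # what-if scenario
```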
The validation process is performed using the expectation-maximization (EM) algorithm, proposed in 1977 [3,25]. The EM method is an iterative procedure created to solve optimization problems for certain functionals using an analytical search for the extrema of the objective function. The method consists of two stages. First, in the "expectation" (E) step, the expected values for each incomplete observation are calculated on the basis of the available observations. After the filled-in data set is obtained, the basic statistical parameters are estimated. In the second, "maximization" (M) step, the degree of agreement between the expected and the actually substituted data is maximized [28,29]. Sensitivity analysis measures how much the beliefs at the target node will change if evidence is entered at another node; it does not indicate the direction of this relationship [8,15,24].
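To make the E/M alternation concrete, here is a deliberately small sketch of the same idea on a toy problem: estimating the probability of a binary indicator when some observations are missing. The E-step fills each missing value with its expected value under the current estimate, and the M-step re-estimates the parameter from the completed data. The data values are invented for illustration; GeNIe performs the analogous computation for the CPTs of the network.

```python
# Toy EM: estimate p = P(X = 1) from observations with missing entries (None).
data = [1, 0, 1, None, 1, None, 0, 1, None, 0]  # illustrative sample

p = 0.5  # initial guess
for step in range(100):
    # E-step: replace every missing observation by its expected value E[X] = p
    completed = [x if x is not None else p for x in data]
    # M-step: maximize the expected log-likelihood -> sample mean of the completed data
    p_new = sum(completed) / len(completed)
    if abs(p_new - p) < 1e-9:  # convergence check
        break
    p = p_new

print(f"EM estimate of P(X = 1): {p:.4f}")  # converges to the mean of the observed values
```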
5 Experiments, Results and Discussion
We solved the problem of engineering the Bayesian model with the GeNIe application, version 2.3. Based on the available indicators, it is necessary to determine the degree of investment effectiveness. The structure of the network is shown in Fig. 4. The network contains 4 key nodes and 10 general nodes (Table 3). During the sensitivity analysis procedure, it was confirmed that the X22 node is not sensitive and should be removed from the model. The arcs between nodes X31 and X21 and between X32 and X21 also do not affect the result. The updated network has the form shown in Fig. 5. For potential investors, the project that they are going to finance will be considered attractive if its profitability is higher than with any other method of investment. In the course of the experiments, we identified certain conditions that can help increase the profitability of an investment project.
Fig. 4. Structural BN designed to predict the investment project success degree

Table 3. Matrix of input data: for each of the surveyed enterprises, the values of the input indicators χ1, χ11, χ12, χ2, χ21, χ22, χ23, χ24, χ3, χ4, the decision criteria χ31, χ32, χ33, χ41, and the resulting node Y
Fig. 5. Updated static Bayesian model
1. To increase the internal profitability indicator to the highest level (that is, from 12% to 100%), we have to decrease the value of the payback constraint from 24% to 9% and slightly increase the profit from 15% to 24%, as shown in Fig. 6.
2. If the percentage of the discount rate increases to the maximum (from 2% to 100%), the net present value decreases by 34% (from 54% to 20%) and the profitability index decreases by 21% (from 51% to 20%). In our case, this leads to a decrease in project success and to uncertainty, since the simulation result will then be 50% success and 50% failure, as shown in Fig. 7. An increase in the discount rate also indicates the presence of other changes: the percentage of risks increases, the level of inflation rises, and other external and internal factors appear that we do not control and that are always accompanied by the effect of surprise.
3. If the value of the net present value rises from 9% to 37%, the profitability index rises by 2% and the net profit rises by 4%, as shown in Fig. 8, then the profitability index value rises to the highest level (from 10% to 100%).
4. At the maximum level of net profit (growth from 15% to 100%), the profitability value will rise by 8%, the cash balance will increase by 5% (from 12% to 17%) and the financial activity indicator will improve by 11% (from 13% to 24%), as shown in Fig. 9.
Fig. 6. The internal profitability indicator rises to the highest level
Fig. 7. Conditions for increasing the percentage of the discount rate to the maximum
Fig. 8. The view of the Bayesian network when the profitability index rises to the highest level
Fig. 9. Conditions under which financial performance will improve significantly
6 Conclusions
Any project involving investment is a combination of economic, financial, managerial and commercial actions that are interrelated and that together determine the success or failure of the proposed project. The research results show that at the highest level of capital investments, the financial activity result will be 48% more active, while the net profit indicator will rise by 8%. To raise the profitability to 100%, it is desirable to reduce the value of the payback constraint by 15% and to increase the level of net profit by 9%. These recommendations will provide all participants of the investment project with a return on their investments and protect investors from collapse. To lead a project to success, it is necessary to study and evaluate the trends in the complex interactions of the various indicators and to project their development several steps ahead.
References 1. Ukraine 2019–2020: broad opportunities, contradictory results (2020). https:// razumkov.org.ua/uploads/other/2020-PIDSUMKI-ENG.pdf 2. Ukraine: Investment guide. dlf attorneys-at-law (2021). https://dlf.ua/en/ukraineinvestment-guide/ 3. de Andrade, B.B., Souza, G.S.: The EM algorithm for standard stochastic frontier models. Pesquisa Operacional 39(3) (2019). https://doi.org/10.1590/0101-7438. 2019.039.03.0361 4. Bonello, A., Grima, S., Spiteri, J.: Understanding the investor: A maltese study of risk and behavior in financial investment decisions (vol. first edition). bingley, uk: Emerald publishing limited (2019). http://search.ebscohost.com/login.aspx? direct=true&site=eds-live&db=edsebk&AN=1993147 5. Cakici, N., Zaremba, F.: Size, value, profitability, and investment effects in international stock returns: are they really there? Jo. Investing Apr (1) (2021). https:// doi.org/10.3905/joi.2021.1.176
6. Cornwall, J.R., Vang, D.O., Hartman, J.M.: Entrepreneurial financial management: an applied approach (2019). http://search.ebscohost.com/login.aspx?direct=true& site=eds-live&db=edsebk&AN=2237944 7. Dimitras, A.I., Papadakis, S., Garefalakis, A.: Evaluation of empirical attributes for credit risk forecasting from numerical data. Invest. Manage. Financ. Innov. 14(1), 9–18 (2017). https://doi.org/10.21511/imfi.14(1).2017.01 8. Dogru, T., Upneja, A.: The implications of investment-cash flow sensitivities for franchising firms: theory and evidence from the restaurant industry. Cornell Hospitality Q. 60(1), 77–91 (2019). https://doi.org/10.1177/1938965518783167 9. Dugar, A., Pozharny, J.: Equity investing in the age of intangibles. Financ. Anal. J. 77(2), 21–42 (2021). https://doi.org/10.1080/0015198X.2021.1874726 10. Garde, A., Zrilic, J.: International investment law and non-communicable diseases prevention. J. World Investment Trade 21(5), 649–673 (2020). https://doi.org/10. 1163/22119000-12340190 11. Gilbert, E., Meiklejohne, L.: A comparative analysis of risk measures: a portfolio optimisation approach. Invest. Anal. J. 48(3), 223–239 (2019). https://doi.org/10. 1080/10293523.2019.16431282 12. Harford, J., Kecskes, A., Mansi, S.: Do long-term investors improve corporate decision making? J. Corp. Finan. 50, 424–452 (2017). https://doi.org/10.1016/ j.jcorpfin.2017.09.022 13. Jayaraman, S., Shuang, Wu., J. : Should i stay or should i grow? using voluntary disclosure to elicit market feedback. Rev. Financ. Studi. 33(8), 3854–3888 (2020). https://doi.org/10.1093/rfs/hhz132 14. Kyzym, M.O., Doronina, M.: Economic Science in Ukraine: challenges, problems and ways of their solving. Problems Econ. (3), 156–163 (2019). https://doi.org/10. 32983/2222-0712-2019-3-156-163 15. Lekar, S., Shumeiko, D., Lagodiienko, V. Andi Nemchenko, V.: Construction of bayesian networks in public administration of the economy. Int. J. Civil Eng. Technol. 10(3), 2537–2542 (2019). http://www.iaeme.com/IJCIET/issues. asp?JType=IJCIET&VType=10&IType=03 16. Niloy, N., Navid, M.: Na¨ıve bayesian classifier and classification trees for the predictive accuracy of probability of default credit card clients. Am. J. Data Mining Knowl. Discov. 3(1), 1–12 (2018). https://doi.org/10.11648/j.ajdmkd.20180301.11 17. Park, S.Y., Schrand, C.M., Zhou, F.: Management Forecasts and Competition for Limited Investor Resources (2019). https://ssrn.com/abstract=3357603 18. Paskaramoorthy, A.B., Gebbie, T.J., van Zyl, T.L.: A framework for online investment decisions. Invest. Anal. J. 49(3), 215–231 (2020). https://doi.org/10.1080/ 10293523.2020.1806460 19. Poonam, M., Harpreet, A.: Analytical study of capital budgeting techniques (Only automobiles companies). Asian J. Multidimension. Res. 8(6), 150–162 (2019). https://doi.org/10.5958/2278-4853.2019.00226.X 20. Lopez de Prado, M., Vince, R., Zhu, Q.: Optimal risk budgeting under a finite investment horizon. Risks 7(3) (2019). https://doi.org/10.3390/risks7030086 21. P¨ arssinen, M., Wahlroos, M., Manner, J., Syri, S.: Waste heat from data centers: an investment analysis (2019). http://search.ebscohost.com/login.aspx?direct=true& site=eds-live&db=edsbas&AN=edsbas.73BBA477 22. R¨ osch, D.M., Subrahmanyam, A., van Dijk, M.A.: Investor short-termism and real investment. J. Financ. Markets (2021). https://doi.org/10.1016/j.finmar.2021. 100645
23. Ryu, D., Ryu, D., Yang, H.: Investor sentiment, market competition, and financial crisis: evidence from the korean stock market. Emerg. Mark. Financ. Trade 58(81), 1804–1816 (2020) 24. Shah, A.: Uncertain risk parity. J. Investment Strat. 10(3) (2021). http://doi.org/ 10.21314/JOIS.2021.009 25. Shi, Y., Liu, H.: EM-detwin: a program for resolving indexing ambiguity in serial crystallography using the expectation-maximization algorithm. Crystals 10(7) (2020). https://doi.org/10.3390/cryst10070588 26. de Souza, P., Rogerio, M., Lunkes, R., Bornia, C.: Capital budgeting: a systematic review of the literature (2020) https://doi.org/10.1590/0103-6513.20190020 27. Sven, O.S., Michniuk, A., Heupel, T.: Beyond budgeting - a fair alternative for management control? - examining the relationships between beyond budgeting and organizational justice perceptions. Stud. Bus. Econ. 2(160) (2019). https:// doi.org/10.2478/sbe-2019-0032 28. Tian, G.L., Ju, D., Chuen, Y.K., C., Z.: New expectation-maximization-type algorithms via stochastic representation for the analysis of truncated normal data with applications in biomedicine. Stat. Methods Med. Res. 27(8), 2459–2477 (2018). https://doi.org/10.1177/0962280216681598 29. Yang, Y., M´emin, E.: Estimation of physical parameters under location uncertainty using an ensemble-expectation-maximization algorithms. Q. J. R. Meteorol. Soc. 145, 418–433 (2019). https://doi.org/10.1590/0101-7438.2019.039.03.0361 30. Ye, M., Zheng, M., Zhu, W.: Price Discreteness and Investment to Price Sensitivity. Available at SSRN (2019)
Method of Transfer Deep Learning Convolutional Neural Networks for Automated Recognition Facial Expression Systems

Arsirii Olena(B), Denys Petrosiuk, Babilunha Oksana, and Nikolenko Anatolii

Odesa Polytechnic State University, Odesa, Ukraine
{petrosiuk.d.v,babilunga,nikolenko}@opu.ua
Abstract. The analysis of automated solutions for recognition of human facial expression (FER) and emotion detection (ED) is based on Deep Learning (DL) of Convolutional Neural Networks (CNN). The need to develop human FER and ED systems for various platforms, both for stationary and mobile devices, is shown, which imposes additional restrictions on the resource intensity of the DL CNN architectures used and the speed of their learning. It is proposed, in conditions of an insufficient amount of annotated data, to implement an approach to the recognition of the main motor units of facial activity (AU) based on transfer learning, which involves the use of public DL CNNs previously trained on the ImageNet set with adaptation to the problems being solved. Networks of the MobileNet family and networks of the DenseNet family were selected as the basic ones. The DL CNN model was developed to solve the FER and ED problem of a person and the training method of the proposed model was modified, which made it possible to reduce the training time and computing resources when solving the FER and ED problems of a person without losing the reliability of AU recognition. Keywords: Facial expression recognition · Convolutional neural network · Transfer learning · Emotion detection · Deep learning
1 Introduction
Changing facial expressions is a natural means of communication, conveying emotions, intentions and a person's state of mind. The automated recognition of facial expressions (Facial Expression Recognition, FER) and determination of emotions (Emotion Detection, ED) of a person is an intellectual task. The solution to this problem is relevant when creating systems of human-machine
interaction, in which the state of the operator is determined as a reflective agent; algorithms for realistic animation of a human face in computer games and computer graphics; information technologies of emotional marketing for promoting goods, taking into account their perception by a person; and behavioral and neurobiological studies of a person's condition, etc. [22]. With the growth of computing capabilities in the development of the listed information systems and technologies, automated solutions for FER and ED based on deep learning (DL) of convolutional neural networks (CNN) began to appear [15]. However, despite the successful application of CNNs for object recognition and classification in computer vision systems, solving human FER and ED problems using DL CNN remains a difficult problem. This is due to the fact that the algorithms used in DL CNN require a sufficiently large set of annotated data and high-performance graphics processors (Graphics Processing Unit – GPU) and, even if these are available, the time to train any complex network suitable for solving practical FER and ED problems will be considerable. One should also take into account the need to develop human FER and ED systems for various platforms, both for stationary and mobile devices, which imposes additional restrictions on the resource intensity of the CNN architectures used and the speed of their learning. For these reasons, researchers began to use the transfer learning approach for DL CNN, which involves the use of neural networks trained on data from a specific subject area to solve other types of problems. In the case of solving the FER and ED problems, the use of CNN transfer learning has the following advantages: there is no need to design one's own CNN architecture, it is possible to use a much smaller set of initial data for training, and the time and computational resources are significantly reduced compared to full training. Therefore, the development of a CNN transfer learning method for recognizing facial expressions and determining emotions is an urgent task.
2 Analysis of Existing Research and Publications
Most modern human FER and ED systems [15,22] are capable of recognizing, with a certain accuracy, a small set of human facial expressions consisting of the six universal emotions – "anger", "disgust", "fear", "happiness", "sadness" and "surprise" – which were introduced by P. Ekman [6]. However, real human emotions are composed of many expressions, differing in minor changes in some facial features. Therefore P. Ekman, in collaboration with W.V. Friesen, also proposed the Facial Action Coding System (FACS) [7], which describes the movement of facial muscles using different motor units (Action Units, AU). To describe all possible and visually observed changes in a person's face, the FACS system defines 12 motor units for the upper part of the face and 18 motor units for the lower part (Fig. 1a), which are associated with the contraction of a certain set of muscles. Motor units can occur separately or in combination (Fig. 1b). Two types of state coding are used to describe motor units. The first, simpler one is the coding of the presence or absence of a motor
unit on the face. In the second case, in addition to the first, the intensity or strength of the action is also indicated, with 5 levels of intensity coding possible (neutral < A < B < C < D < E), where A is the least intense action, and E is the action of the maximum force [7]. It is also worth noting that the six universal emotions introduced by P. Ekman can be described using a combination of several AUs (Fig. 1b) [27]. Thus, FACS is a powerful tool for detecting and measuring a large number of facial expressions by visually observing a small set of muscle actions (Fig. 1c).
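For reference, the combination of AUs behind each of the six universal emotions can be held in a simple lookup structure. The Python sketch below uses commonly cited EMFACS-style combinations; they are given as an assumption for illustration and may differ in detail from the combinations shown in Fig. 1b.

```python
# Commonly cited AU combinations for the six universal emotions (illustrative assumption).
EMOTION_AU = {
    "happiness": {6, 12},
    "sadness":   {1, 4, 15},
    "surprise":  {1, 2, 5, 26},
    "fear":      {1, 2, 4, 5, 7, 20, 26},
    "anger":     {4, 5, 7, 23},
    "disgust":   {9, 15, 16},
}

def match_emotions(detected_aus):
    """Return emotions whose full AU combination is present among the detected AUs."""
    return [e for e, aus in EMOTION_AU.items() if aus <= set(detected_aus)]

print(match_emotions([6, 12, 25]))  # -> ['happiness']
```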
Fig. 1. Representation of motor units [7]: a – AU for the upper and lower parts of the face; b – combinations of AU in emotions; c – an example of AU distribution with different intensities on a face image
It is known that in a classical intelligent system using a camera, the FER and ED process of a person is implemented in the form of the following main stages: obtaining an initial image using a camera, preliminary image processing (localizing the face area, eliminating head rotation, scaling, normalizing illumination), identifying features for recognition, as well as the classification of facial
expressions by them. Approaches to recognition of facial expressions based on AU differ either in the methods of identifying features in the image, or in the classification methods, or in both [15,22]. Most of the traditional methods of feature extraction are based on special algorithms, and applications developed on their basis, for isolating geometric or textural features of a face image, or on other well-known methods for isolating informative features [2–4,13,28,31]. Nevertheless, in real conditions such approaches are relatively unreliable, since they do not provide sufficient invariance to scale, rotation, shift and other possible spatial distortions of the image and are not able to take into account the wide variety of complex scenes and the dynamics of changes in facial expressions. On the other hand, in recent years CNNs have shown the best results in the field of face and object recognition in a scene, being a logical development of the ideas of the cognitron and neocognitron [15]. The undoubted advantage of using CNNs for recognition tasks is that they take into account the two-dimensional topology of the image. To ensure invariance to rescaling, rotation, shift and spatial distortion of the image, a CNN implements three architectural solutions:
– local receptive fields – provide local two-dimensional connectivity of neurons;
– shared synaptic coefficients – provide detection of certain features anywhere in the image and reduce the total number of weight coefficients;
– hierarchical organization with spatial subsampling – provides representation of image features at different spatial resolutions.
However, the classical structure of a CNN [14], trained on the small set of available data, is not able to deeply study the details of objects in the scene or the features of the image of a person's face. Therefore, current computational capabilities allow CNNs to be used in combination with deep learning (DL), in so-called deep CNN models (DL CNN). DL CNNs come in many variations; however, they typically consist of three types of layers: convolution layers, subsampling layers and fully connected layers [15]. Convolutional and subsampling layers alternate with one another to form a deep model (Fig. 2). It is known that the lower convolutional layers of a DL CNN capture low-level features of an image, such as edges, outlines and parts of the image. The middle layers provide recognition of groups of low-level features, for example a part of a face, which in turn are elementary features for subsequent layers. The last layer defines the class of the image.
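The three architectural solutions listed above map directly onto the standard building blocks of a CNN. A minimal illustrative Keras definition is given below; the filter counts and input size are arbitrary and are not those of any network used later in the paper.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Minimal CNN: local receptive fields and shared weights (Conv2D),
# spatial subsampling (MaxPooling2D), and a final classification layer.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(224, 224, 3)),
    layers.Conv2D(32, 3, activation="relu", padding="same"),  # low-level features: edges, contours
    layers.MaxPooling2D(2),                                   # reduce spatial resolution
    layers.Conv2D(64, 3, activation="relu", padding="same"),  # groups of low-level features
    layers.MaxPooling2D(2),
    layers.GlobalAveragePooling2D(),                          # 2D feature maps -> 1D vector
    layers.Dense(10, activation="softmax"),                   # class of the image
])
model.summary()
```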
Fig. 2. DL CNN architecture
The convolution layer has a set of trainable filters used to convolve the entire input image and generate various specific feature maps. The convolution operation provides three main advantages: local processing, which takes into account correlations between neighboring pixels; weight sharing within the same feature map, which significantly reduces the number of trained parameters; and invariance to the location of the object. The last stage of convolution is to apply the activation function to all values of the resulting matrix, which is designed to add nonlinearity to the network. In practice, various activation functions are used – sigmoid, hyperbolic tangent, ReLU, etc. The subsampling layer follows the convolutional layer and is used to reduce the spatial size of the feature maps and, accordingly, the computational cost of the network. Average pooling and max pooling are the two most commonly used non-linear strategies for performing subsampling to reduce the size of feature maps. A fully connected layer is usually added at the end of the network to convert 2D feature maps to 1D for further representation and classification of features. At the moment, systems based on various CNN modifications are considered the best systems in terms of accuracy and speed of object recognition in a scene. This is evidenced by CNN's first places in the annual competition for pattern recognition on the ImageNet dataset – the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) [10]. ImageNet is known to be a huge collectively collected (crowdsourced) dataset of more than 15 million high-resolution images from 22,000 categories (Fig. 3). ILSVRC uses a subset of ImageNet with roughly 1000 images in each of 1000 categories. There are approximately 1.2 million training images in total, 50,000 images for validation and 150,000 images for testing. As studies on the ImageNet dataset show, the undoubted advantage of using such networks for human face recognition is that they take into account the two-dimensional topology of the image, which is very relevant when solving the FER and ED problems of a person. In addition, recent efforts to solve general problems of object classification based on the use of DL CNN [12] have allowed the development of more sophisticated models that are able to form more reliable representations of features based on the original image without explicitly considering and modeling the local characteristics of various parts of the face and the relationships between facial markers. In recent years, DL CNNs have become widely used for AU recognition due to their powerful feature representation and end-to-end effective learning scheme, which has greatly contributed to the development of human FER and ED [16,29,32]. Deep learning techniques implement an approach in which features are extracted directly from the input data itself, trying to capture high-level abstractions through hierarchical architectures of multiple nonlinear transformations and representations. The success of this approach for human FER and ED is evidenced by the work of other authors based on DL CNN models, which currently show the best results on the DISFA set [19]. For example, in [16,30], an
Fig. 3. Examples of images from different categories of the ImageNet dataset
EAC-Net approach for AU detection was proposed, based on adding to the previously trained network two new networks trained to recognize AU by features extracted from the entire image and by pre-cut separate areas of face images representing areas interest in terms of recognition of individual types of AU. The authors use the CNN VGG-19 model in their approach [26]. The assumed area of each AU in the image has a fixed size and a fixed location, which is determined by the feature labels on the face image. Based on the structure of the E-Net [16], the adversarial learning method between AU recognition and face recognition was proposed. In [24], a JAA-Net approach with an adaptive learning module is proposed to improve the initially defined areas of each AU in the image. All of these works demonstrate the effectiveness of modeling the distribution of spatial attention for detecting AU, that is, tracking the area of location of a specific AU (by analogy with a human visual analyzer) when analyzing a scene in FER and ED problems. The AU classification results obtained in the studies considered will be used in this work as the baseline for comparing the recognition reliability by the proposed method. Summing up, it can be indicated that the efforts of the research groups are mainly aimed at improving either the methods for identifying facial features or the methods for classifying AU. These methods recognize AUs or specific combinations of AUs independently, statically or dynamically, with or without an assessment of the intensity of the presence of AUs. At the same time, the development of modern image classification models often requires significant investment in the design and configuration of the network architecture, as well
750
A. Olena et al.
as a large dataset for learning. At the same time, it is known that the process of full DL CNN training on the corresponding data set requires large computational resources and takes a lot of time. For these reasons, the transfer learning approach with adaptation to the problems being solved is relevant when solving the problem of recognizing AU on images. This approach was made possible by the general availability of a large number of modern, pre-trained DL CNNs.
3 The Purpose and Objectives of the Research
The purpose of the work is to reduce the training time and computational resources when solving human FER and ED problems by using the transfer learning method of publicly available DL CNNs with subsequent "fine tuning" of the network parameters, without losing the reliability of AU recognition. To achieve this goal, it is necessary to solve the following tasks:
– explore the features of transfer learning;
– make a reasoned choice of a pre-trained CNN and explore the possibilities of its further use in transfer learning;
– develop a DL CNN model and a method for its training for solving FER and ED problems based on transfer learning, taking into account the specifics of the selected pre-trained CNN;
– evaluate the reliability of AU recognition based on the developed model and method.
4 The Proposed Approach

4.1 Study of the Peculiarities of Transfer Learning
Transfer learning consists in transferring the feature description functions obtained by a DL CNN model with multiple layers in the process of solving the original recognition problem to the target recognition problem [1,18,21]. The transfer learning process in the context of DL CNN can be represented by the following stages (Fig. 4), rendered as a code sketch below:
1. Convolutional layers are extracted from the previously trained model (pre-train).
2. The convolutional layers are frozen to avoid destroying any information they contain during future training epochs (train).
3. Several new trainable layers are added on top of the frozen layers. They will learn how to turn the old feature maps into predictions for the new dataset.
4. The new layers are trained on the target dataset.
5. The last step is fine-tuning, which consists in unfreezing the entire model obtained above (or part of it) and retraining it on the target dataset with a very low learning rate. This can potentially lead to significant improvements by gradually adapting the pretrained networks to the new data.
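The five stages can be written down almost verbatim in Keras. The following is a generic skeleton under assumed settings (MobileNetV2 as the pre-trained model, a multi-label sigmoid head, illustrative learning rates); the specific model and training method used in this work are described in Sect. 4.3.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Stage 1: take the convolutional part of a model pre-trained on ImageNet
base = tf.keras.applications.MobileNetV2(weights="imagenet", include_top=False,
                                          input_shape=(224, 224, 3))
# Stage 2: freeze it so its feature maps are not destroyed early in training
base.trainable = False

# Stage 3: add new trainable layers on top of the frozen ones
inputs = tf.keras.Input(shape=(224, 224, 3))
x = base(inputs, training=False)
x = layers.GlobalAveragePooling2D()(x)
outputs = layers.Dense(8, activation="sigmoid")(x)  # e.g. 8 AU labels
model = tf.keras.Model(inputs, outputs)

# Stage 4: train only the new layers on the target dataset
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4), loss="binary_crossentropy")
# model.fit(train_ds, validation_data=val_ds, epochs=10)

# Stage 5: fine-tune - unfreeze the base and retrain with a very low learning rate
base.trainable = True
model.compile(optimizer=tf.keras.optimizers.Adam(1e-5), loss="binary_crossentropy")
# model.fit(train_ds, validation_data=val_ds, epochs=5)
```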
Fig. 4. Scheme of implementation of transfer learning based on DL CNN
The main advantages of using transfer learning are: there is no need to design one's own DL CNN architecture; less data is needed for training; less time is needed for training compared to full training; and DL CNN models with high classification accuracy can be obtained in less time. The paper implements a transfer learning method aimed at detecting and recognizing AU in a static image of a human face. Public DL CNNs pre-trained on the ImageNet set were used.

4.2 Selecting Public DL CNNs to Use as a Pre-trained Model
In [5], the authors published the results of a study of publicly available DL CNNs on the ImageNet dataset according to the following criteria: resource intensity, classification accuracy, and performance on the NVIDIA Titan X Pascal GPU platform. We analyzed the diagram of the dependence of the classification accuracy on the number of floating-point computations for the fairly wide list of public DL CNNs from this work. DenseNet-121, DenseNet-201 [9], MobileNet-v1 [8] and MobileNet-v2 [23] were selected for further research. The choice was made taking into account the requirements for learning speed and resource intensity. The latter requirement is very important when creating mobile applications for FER and ED. The MobileNet family is superior in compactness to DenseNet and VGG-19, but inferior in classification accuracy on the ImageNet dataset. At the same time, it is noted in [21] that the MobileNet family of networks outperforms the selected networks of the DenseNet family in
performance more than three times and exceeds the VGG-19 network by more than 1.5 times. Here is a more detailed description of the selected DL CNNs.
A. DenseNet Family of Networks. The Dense Convolutional Network model – DenseNet – is designed for stationary devices and is implemented as a sequence of "dense" blocks, each containing a set of convolutional layers, separated by transition layers that resize the feature maps. A distinctive feature of DenseNet is the presence of direct connections from each convolutional layer to every other layer. That is, whereas ordinary convolutional networks with L layers have L connections – one between each layer and its subsequent layer – DenseNet has L(L + 1)/2 direct connections [9]. For each layer, the feature maps of all previous layers are used as input, and its own feature maps are used as input for all subsequent layers. DenseNet has several advantages: it mitigates the problem of very low gradient values, improves feature propagation, implements feature map reuse, and significantly reduces the number of tunable parameters. The DenseNet architecture is shown in Fig. 5; a simplified code sketch of a dense block is given below. At the output of the network there is a classification layer.
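The dense connectivity pattern, with every layer receiving the concatenated feature maps of all preceding layers, can be expressed compactly in Keras. This is a simplified sketch: the growth rate and layer count are arbitrary, and the real DenseNet blocks also use bottleneck 1 × 1 convolutions, as listed in Table 1.

```python
import tensorflow as tf
from tensorflow.keras import layers

def dense_block(x, num_layers=4, growth_rate=32):
    """Simplified DenseNet block: each 3x3 convolution sees the concatenation
    of the block input and of all feature maps produced before it."""
    features = [x]
    for _ in range(num_layers):
        y = layers.Concatenate()(features) if len(features) > 1 else features[0]
        y = layers.BatchNormalization()(y)
        y = layers.ReLU()(y)
        y = layers.Conv2D(growth_rate, 3, padding="same")(y)
        features.append(y)
    return layers.Concatenate()(features)

inputs = tf.keras.Input(shape=(56, 56, 64))
outputs = dense_block(inputs)
tf.keras.Model(inputs, outputs).summary()
```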
Fig. 5. DenseNet topology [9]
The structure of the "dense" blocks and transition layers of the DenseNet-121 and DenseNet-201 networks for solving the image classification problem on the ImageNet set is presented in Table 1 [9].
B. MobileNet Family of Networks. The advent of the MobileNet family of networks [8,23] revolutionized computer vision on mobile platforms. MobileNets are small, low-latency, low-power models parameterized to meet the resource constraints of a variety of use cases. They can be built upon for classification, detection, embeddings and segmentation. The convolutional part of MobileNet-v1 consists of one ordinary convolutional layer with a 3 × 3 convolution at the beginning and thirteen depthwise separable convolution blocks, shown in Fig. 6a, with a gradually increasing number of filters and a decreasing spatial dimension of the feature maps. A feature of this architecture is the absence of max pooling layers. Instead, convolution with a stride equal to 2 is used to reduce the spatial dimension. In a MobileNet-based classifier, there is usually a global average pooling layer at the very end, followed by a fully connected classification layer or an equivalent 1 × 1 convolution and softmax.
Table 1. Layer structure of networks of the DenseNet family

Layers               | Output size      | DenseNet-121                              | DenseNet-201
Convolution          | 112 × 112        | 7 × 7 conv, stride 2                      | 7 × 7 conv, stride 2
Pooling              | 56 × 56          | 3 × 3 max pool, stride 2                  | 3 × 3 max pool, stride 2
Dense Block (1)      | 56 × 56          | [1 × 1 conv; 3 × 3 conv] × 6              | [1 × 1 conv; 3 × 3 conv] × 6
Transition Layer (1) | 56 × 56, 28 × 28 | 1 × 1 conv; 2 × 2 average pool, stride 2  | 1 × 1 conv; 2 × 2 average pool, stride 2
Dense Block (2)      | 28 × 28          | [1 × 1 conv; 3 × 3 conv] × 12             | [1 × 1 conv; 3 × 3 conv] × 12
Transition Layer (2) | 28 × 28, 14 × 14 | 1 × 1 conv; 2 × 2 average pool, stride 2  | 1 × 1 conv; 2 × 2 average pool, stride 2
Dense Block (3)      | 14 × 14          | [1 × 1 conv; 3 × 3 conv] × 24             | [1 × 1 conv; 3 × 3 conv] × 48
Transition Layer (3) | 14 × 14, 7 × 7   | 1 × 1 conv; 2 × 2 average pool, stride 2  | 1 × 1 conv; 2 × 2 average pool, stride 2
Dense Block (4)      | 7 × 7            | [1 × 1 conv; 3 × 3 conv] × 16             | [1 × 1 conv; 3 × 3 conv] × 32
Classification Layer | 1 × 1            | 7 × 7 global average pool; 1000D fully-connected, softmax | 7 × 7 global average pool; 1000D fully-connected, softmax
The complete architecture of MobileNet-v2 consists of 17 building blocks, whose structure is shown in Fig. 6b. The basic block of the network consists of three layers: an "expansion layer", a "depthwise convolution" with ReLU6 activation, and a "projection layer" with a 1 × 1 convolution. These are followed by an ordinary 1 × 1 convolution, a pooling layer, and a classification layer. As can be seen, the DenseNet and MobileNet architecture models have significant structural differences; at the same time, each family contains both a simpler and, accordingly, less resource-intensive network and a more complex, more resource-intensive one, making it possible to choose the most suitable network based on the obtained learning outcomes. Thus, for the development of the DL CNN model based on transfer learning, the following DL CNNs were selected as pre-trained models: DenseNet-121, DenseNet-201, MobileNet-v1 and MobileNet-v2. CNNs of the selected architectures have both high learning ability and significant speed, which meets the requirements for network training parameters when solving FER and ED recognition problems on stationary and mobile devices.
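The depthwise separable convolution of Fig. 6a factorizes a standard convolution into a per-channel 3 × 3 depthwise convolution followed by a 1 × 1 pointwise convolution. A Keras sketch of one such block follows; the filter count and stride are illustrative, and plain ReLU is used where the original networks use ReLU6 in places.

```python
import tensorflow as tf
from tensorflow.keras import layers

def depthwise_separable_block(x, pointwise_filters, stride=1):
    """MobileNet-v1-style block: depthwise 3x3 convolution + pointwise 1x1 convolution,
    each followed by batch normalization and ReLU; stride 2 replaces max pooling."""
    x = layers.DepthwiseConv2D(3, strides=stride, padding="same", use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    x = layers.Conv2D(pointwise_filters, 1, use_bias=False)(x)  # pointwise convolution
    x = layers.BatchNormalization()(x)
    return layers.ReLU()(x)

inputs = tf.keras.Input(shape=(112, 112, 32))
outputs = depthwise_separable_block(inputs, pointwise_filters=64, stride=2)
tf.keras.Model(inputs, outputs).summary()
```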
Fig. 6. Structure of MobileNets Base Blocks: a – depthwise separable convolutions block (MobileNet-v1); b – bottleneck residual block (MobileNet-v2) [23]
4.3 DL CNN Model Development and Its Training Method for Solving FER and ED Problems
The DL CNN model for solving the human FER and ED problem was built on the pre-trained public DenseNet networks with numbers 121 and 201 and on MobileNet v1 and v2. The CNN versions of the selected architectures have both a high learning rate and a significant speed of operation. For each network, the fully connected layers at the output were removed; instead, after the subsampling layer (GlobalAveragePooling), an AU prediction block (AU Predicted block) was added, consisting of: a batch normalization layer (BatchNormalization) with β and γ – two generalizing variables for each feature [11], a layer with the ReLU activation function, and a new fully connected output layer with a sigmoidal activation function as the AU classifier. The structure of the model is shown in Fig. 7. The transfer learning method of the proposed DL CNN model consists of two stages: training the classifier on the target dataset and fine-tuning the pre-trained model CNN.
Stage 1. Batch training of the classifier by the backpropagation method on the target dataset consists of the following steps:
1. The weight coefficients of the classifier are initialized with random values.
2. The target set of color images is fed to the input of the pre-trained model CNN – a tensor of dimension m × m × 3 × N, where m × m is the image size and N is the batch size.
Fig. 7. DL CNN model for solving the human FER and ED problem based on transfer learning
3. (Forward pass): as a result of passing through the pre-trained model CNN, feature maps of a certain size (for example, 7 × 7) are formed, in a number L that determines the size of the GlobalAveragePooling layer (for example, L = 1024). The GlobalAveragePooling output corresponds to the averaged value of each input feature map and has the form of an L × N matrix. BatchNormalization (BN) is performed for each row of the resulting matrix. At the output of the ReLU layer, only the positive values of the coefficients remain (negative ones are zeroed), and these are fed to the fully connected classifier layer. The target values are used to calculate the training error, which is averaged over the entire batch of size N.
4. (Backward pass): the weighting coefficients of the classifier are adjusted by the backpropagation method, taking into account the Dropout operation with a decimation factor of 0.2. For the BatchNormalization layer, the β and γ coefficients are adjusted [11].
5. When overfitting of the classifier is detected (the errors on the test and validation samples are tracked), training is stopped.
Stage 2. Fine-tuning is performed to improve the quality of the classification. All or part of the pre-trained model CNN coefficients are unfrozen and are also corrected by the backpropagation method with a low learning rate. Starting from step 2 of the main training method, all steps of the forward and backward passes are performed. The constructed DL CNN model and the method of its transfer training, sketched in code below, made it possible to retrain the last DL CNN layer using its own set of images in a reasonable time without changing the weights of the other layers, providing the necessary reliability of AU recognition.
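A hedged Keras rendering of the model in Fig. 7 and of the two training stages described above is given below. The choice of MobileNet as the base, the loss function and the commented-out training calls are placeholders; the actual experiments also use the LSEP loss introduced in Sect. 4.4.

```python
import tensorflow as tf
from tensorflow.keras import layers

NUM_AU = 8  # AU1, AU2, AU4, AU6, AU9, AU12, AU25, AU26

# Pre-trained model CNN (MobileNet here; DenseNet-121/201 or MobileNet-v2 are used analogously)
base = tf.keras.applications.MobileNet(weights="imagenet", include_top=False,
                                       input_shape=(224, 224, 3))

# AU prediction block from Fig. 7: GlobalAveragePooling -> BatchNormalization -> ReLU ->
# Dropout(0.2) -> fully connected sigmoid classifier
inputs = tf.keras.Input(shape=(224, 224, 3))
x = base(inputs, training=False)
x = layers.GlobalAveragePooling2D()(x)
x = layers.BatchNormalization()(x)
x = layers.ReLU()(x)
x = layers.Dropout(0.2)(x)
outputs = layers.Dense(NUM_AU, activation="sigmoid")(x)
model = tf.keras.Model(inputs, outputs)

# Stage 1: train only the classifier head on the target dataset
base.trainable = False
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
              loss="binary_crossentropy")  # placeholder; LSEP is used in Sect. 4.4
# model.fit(train_ds, validation_data=val_ds, epochs=...)

# Stage 2: fine-tuning - unfreeze (part of) the base and continue with a low learning rate
base.trainable = True
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
              loss="binary_crossentropy")
# model.fit(train_ds, validation_data=val_ds, epochs=...)
```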
4.4 Assessment of the AU Recognition Reliability Based on the Developed Model and Method
The Denver Intensity of Spontaneous Facial Action (DISFA) dataset was used as the initial images in a computer experiment to recognize AU on human face images. Examples of images from this set are shown in Fig. 8. It contains videos from 27 subjects – 12 women and 15 men, each of whom recorded a video of 4845 frames [19]. Each frame is annotated with AU intensity on a 6-point ordinal scale from 0 to 5, where 0 indicates no AU, while 5 corresponds to the maximum AU intensity.
Fig. 8. Sample images from DISFA dataset
Based on the analysis of previous works [25], an assumption was made about the presence of an AU in the image if its intensity is equal to 2 or more, and about its absence otherwise. The frequency of occurrence of each AU among the 130814 frames of the DISFA dataset is shown in Table 2. It should be noted that there is a serious problem of data imbalance, in which most AUs have a very low frequency of occurrence, while only a few other AUs have a higher frequency of occurrence.

Table 2. Number of different types of AU in the DISFA dataset

AU | 1    | 2    | 4     | 6     | 9    | 12    | 25    | 26
No | 6506 | 5644 | 19933 | 10327 | 5473 | 16851 | 36247 | 11533
The problem of class imbalance undoubtedly increases the difficulty of classification. When training a neural network, classes with a large number of images are naturally trained more often than classes with a small number of images. Due to unbalanced training, the classification abilities of neural networks on a test set can differ greatly from the results obtained on a training set with established class labels.
The testing was carried out in the form of a subjective-exclusive three-fold cross-check on eight AUs, which determine the following states of motor activity of the muscles of the human face: AU1 – the inner parts of the eyebrows are raised; AU2 – the outer parts of the eyebrows are raised; AU4 – eyebrows dropped; AU6 – cheeks raised; AU9 – wrinkled nose; AU12 – the corners of the lips are raised; AU25 – lips parted; AU26 – jaw dropped. Another approach is to artificially increase the size of the datasets. This approach creates copies of the original images with altered lighting, rotations, mirroring, and other transformations applied. This approach can improve the reliability of neural networks by increasing the range of states in which source features are encountered during training. To increase the variability of the training sample, we used the on-the-fly augmentation technique - modification of the input data to artificially increase the size of the data set, which increases the reliability of CNN recognition by increasing the variants of facial expressions in which the object’s features are encountered during training. Data modification is performed directly in the learning process - every epoch and, helps to reduce the likelihood of over fitting the DLCNN. Initially, an area containing a face was selected from each frame. Each color image of a 224 × 224 × 3 face underwent transformations that were randomly selected from a list of possible affine transformations, including rotation, scaling, displacement, horizontal reflection, as well as adding Gaussian noise, changing saturation and brightness. In addition, the images were normalized according to the requirements for the input images from the used CNN. Convolutional network layers were initialized with pretrained weights on the ImageNet set, while fully connected layers were initialized with random values. Adam was used as an optimization algorithm with a learning rate of all networks of 0.00001. For the loss function, the Log-Sum-ExpPairwise (LSEP) function was chosen [17], which gives better results than weighted binary cross-entropy. LogSum-ExpPairwise function: exp(fν (xi ) − fu (xi )), (1) llsep = log(1 + ν∈Yi u∈Y / i
where f(x) is a label prediction function that maps an object vector x into a K-dimensional label space representing the confidence scores of each label, and K is equal to the number of unique labels. One of the main properties of the function f(x) is that it must create a vector whose values for true labels Y are greater than for false labels:

$f_u(x) > f_v(x), \quad \forall u \in Y, \ v \notin Y$,   (2)
where $f_u(x)$ is the u-th element of the confidence vector for the i-th instance in the dataset, and $Y_i$ is the corresponding label set for the i-th instance. Due to the large imbalance of the data in the DISFA set, recognition reliability was estimated by the F1-measure (the harmonic mean of the Precision and Recall indicators), averaged over all AUs as Avg.F1:
$F1 = 2\,\frac{Precision \times Recall}{Precision + Recall}$,   (3)

$Precision = \frac{TP}{TP + FP}$,   (4)

$Recall = \frac{TP}{TP + FN}$,   (5)

where TP – true positive examples, FP – false positive examples, FN – false negative examples. The highest values of the F1-measure were shown by the DenseNet-201 and MobileNet-v1 models (Table 3, Fig. 9). At the same time, the MobileNet-v2 and MobileNet-v1 networks have the smallest number of trained parameters: 2×10^6 and 3×10^6, respectively. The DenseNet-201 (18×10^6 parameters) and DenseNet-121 (7×10^6 parameters) networks are more resource-intensive. A comparative analysis of the results of other authors' work on solving human FER and ED problems [16,20,24] and the considered public DL CNNs showed that, using the Log-Sum-Exp Pairwise loss function [17] and taking into account the number of trained parameters, the MobileNet-v1 and DenseNet-201 convolutional networks give the best results on the DISFA dataset [19].
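The LSEP loss of Eqs. (1)-(2) can be sketched in a few lines of NumPy. The snippet below is an illustrative re-implementation for a single instance with binary multi-label targets; it is not the training code used by the authors, and the variable names are assumptions.

```python
import numpy as np

def lsep_loss(scores, labels):
    """Log-Sum-Exp Pairwise loss for one instance, Eq. (1).

    scores: (K,) confidence vector f(x_i) over K labels.
    labels: (K,) binary vector, 1 for labels in Y_i, 0 otherwise.
    """
    pos = scores[labels == 1]               # f_u(x_i), u in Y_i (true labels)
    neg = scores[labels == 0]               # f_v(x_i), v not in Y_i (false labels)
    # Penalize every (true, false) pair in which the false score is not
    # sufficiently below the true score, as required by Eq. (2).
    pairwise = neg[None, :] - pos[:, None]  # f_v - f_u for all pairs
    return float(np.log1p(np.exp(pairwise).sum()))
```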
Table 3. Comparison of AU recognition reliability estimates for modern special DLCNN models and proposed solutions

Estimation quality | Special DLCNN [16, 20, 24]      | Transfer learning method of public CNN
                   | EAC-Net | LP-Net | JAA-Net      | DenseNet-121 | DenseNet-201 | MobileNet-v1 | MobileNet-v2
Avg.F1, %          | 48.5    | 56.9   | 63.5         | 61.2         | 62.8         | 62.3         | 60.4
Fig. 9. Diagram of comparative estimates of the AU recognition quality when solving the FER problem using CNN for the DISFA dataset
5 Conclusions
To summarize, the use of pretrained public DL CNNs with subsequent "fine tuning" of the network parameters made it possible to reduce the training time and computing resources needed for solving human FER and ED problems based on various motor units of facial activity, without reducing the recognition quality indicators. Thus, the proposed DL CNN model for solving the human FER and ED problem, based on transfer learning of public CNNs of the DenseNet and MobileNet families, together with the training method for this model, can be applied to practical problems that require building systems for recognizing human facial expressions and determining emotions on both stationary and mobile devices.
References 1. Almaev, T., Martinez, B., Valstar, M.: Learning to transfer: transferring latent task structures and its application to person-specific facial action unit detection. In: 2015 IEEE International Conference on Computer Vision (ICCV 2015), pp. 3774–3782 (2015). https://doi.org/10.1109/ICCV.2015.430 2. Almaev, T., Valstar, M.: Local gabor binary patterns from three orthogonal planes for automatic facial expression recognition. In: 2013 Humaine Association Conference on Affective Computing and Intelligent Interaction (2013). https://doi.org/ 10.1109/ACII.2013.65 3. Arsirii, O., Antoshchuk, S., Babilunha, O., Manikaeva, O., Nikolenko, A.: Intellectual information technology of analysis of weakly-structured multi-dimensional data of sociological research. In: Lytvynenko, V., Babichev, S., W´ ojcik, W., Vynokurova, O., Vyshemyrskaya, S., Radetskaya, S. (eds.) Lecture Notes in Computational Intelligence and Decision Making, pp. 242–258. Springer International Publishing, Cham (2020). https://doi.org/10.1007/978-3-030-26474-1 18 4. Baltrusaitis, T., Mahmoud, M., Robinson, P.: Cross-dataset learning and personspecific normalisation for automatic action unit detection. In: 2015 IEEE International Conference and Workshops on Automatic Face and Gesture Recognition, vol. 6, pp. 1–6 (2015). https://doi.org/10.1109/FG.2015.7284869 5. Bianco, S., Cadene, R., Celona, L., Napoletano, P.: Benchmark analysis of representative deep neural network architectures. IEEE Access 6, 64270–64277 (2018). https://doi.org/10.1109/ACCESS.2018.2877890 6. Ekman, P., Friesen, W.: Facial Action Coding System: A Technique for the Measurement of Facial Movement. Consulting Psychologists Press (1978) 7. Ekman, P., Friesen, W.V., Hager, J.C.: Facial action coding system (facs). A Human Face (2002). https://ci.nii.ac.jp/naid/10025007347/en/ 8. Howard, A., et al.: Mobilenets: eficient convolutional neural networks for mobile vision applications (2017). https://arxiv.org/abs/1704.04861 9. Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2261–2269 (2017). https://doi.org/10.1109/CVPR.2017. 243 10. ImageNet: Imagenet overview https://image-net.org/about
11. Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. In: Bach, F., Blei, D. (eds.) Proceedings of the 32nd International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 37, pp. 448–456. PMLR, Lille, France, 07–09 Jul 2015. http://proceedings.mlr.press/v37/ioffe15.html 12. Jaiswal, S., Valstar, M.: Deep learning the dynamic appearance and shape of facial action units. In: 2016 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 1–8 (2016). https://doi.org/10.1109/WACV.2016.7477625 13. Jiang, B., Valstar, M.F., Pantic, M.: Action unit detection using sparse appearance descriptors in space-time video volumes. In: 2011 IEEE International Conference and Workshops on Automatic Face and Gesture Recognition, pp. 314–321 (2011). https://doi.org/10.1109/FG.2011.5771416 14. LeCun, Y., Boser, B., Denker, J.S., Henderson, D., Howard, R.E., Hubbard, W., Jackel, L.D.: Backpropagation applied to handwritten zip code recognition. Neural Comput. 1, 541–551 (1989) 15. Li, S., Deng, W.: Deep facial expression recognition: asurvey. IEEE Trans. Affective Comput., 1 (2018). https://doi.org/10.1109/taffc.2020.2981446 16. Li, W., Abtahi, F., Zhu, Z., Yin, L.: Eac-net: deep nets with enhancing and cropping for facial action unit detection. IEEE Trans. Pattern Anal. Mach. Intell. 40(11), 2583–2596 (2018). https://doi.org/10.1109/tpami.2018.2791608 17. Li, Y., Song, Y., Luo, J.: Improving pairwise ranking for multi-label image classification. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1837–1845 (2017). https://doi.org/10.1109/CVPR.2017.199 18. Lim, Y., Liao, Z., Petridis, S., Pantic, M.: Transfer learning for action unit recognition. ArXiv (2018) http://arxiv.org/abs/1807.07556v1 19. Mavadati, S., Mahoor, H., Bartlett, K., Trinh, P., Cohn, J.: Disfa: a spontaneous facial action intensity database. 2013 IEEE Trans. Affective Comput. 4(2), 151–160 (2013). https://doi.org/10.1109/T-AFFC.2013.4 20. Niu, X., Han, H., Yang, S., Huang, Y., Shan, S.: Local relationship learning with person-specific shape regularization for facial action unit detection. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11909–11918 (2019). https://doi.org/10.1109/CVPR.2019.01219 21. Ntinou, I., Sanchez, E., Bulat, A., Valstar, M., Tzimiropoulos, G.: A transfer learning approach to heatmap regression for action unit intensity estimation (2020). https://arxiv.org/abs/2004.06657 22. Samadiani, N., Huang, G., Cai, B., Luo, W., Chi, C., Xiang, Y., He, J.: A review on automatic facial expression recognition systems assisted by multimodal sensor data. Sensors 19(8), 1863 (2019). https://doi.org/10.3390/s19081863 23. Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., Chen, L.C.: Mobilenetv 2: inverted residuals and linear bottlenecks. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4510–4520 (2018). https://doi.org/10. 1109/CVPR.2018.00474 24. Shao, Z., Liu, Z., Cai, J., Ma, L.: Jaa-Net: joint facial action unit detection and face alignment via adaptive attention. Int. J. Comput. Vision, 1–20 (2020). https://doi. org/10.1007/s11263-020-01378-z 25. Shao, Z., Liu, Z., Cai, J., Wu, Y., Ma, L.: Facial action unit detection using attention and relation learning. IEEE Trans. Affective Comput., 1 (2019). https://doi. org/10.1109/taffc.2019.2948635
26. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7–9, 2015, Conference Track Proceedings (2015). http://arxiv.org/abs/1409.1556 27. University of California: EMFACS-7: Emotional Facial Action Coding System. Unpublished manual (1983) 28. Valstar, M., Pantic, M.: Fully automatic facial action unit detection and temporal analysis. In: 2006 Conference on Computer Vision and Pattern Recognition Workshop (CVPRW 2006) pp. 149–149. IEEE (2006). https://doi.org/10.1109/ CVPRW.2006.85 29. Walecki, R., Rudovic, O., Pavlovic, V., Schuller, B., Pantic, M.: Deep structured learning for facial action unit intensity estimation. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5709–5718 (2017). https:// doi.org/10.1109/CVPR.2017.605 30. Zhang, Z., Zhai, S., Yin, L.: Identity-based adversarial training of deep CNNs for facial action unit recognition. In: British Machine Vision Conference 2018. p. 226. BMVA Press (2018). http://www.bmva.org/bmvc/2018/contents/papers/ 0741.pdf 31. Zhao, K., Chu, W.S., De la Torre, F., Cohn, J.F., Zhang, H.: Joint patch and multi-label learning for facial action unit detection. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2207–2216 (2015). https:// doi.org/10.1109/CVPR.2015.7298833 32. Zhou, Y., Pi, J., Shi, B.E.: Pose-independent facial action unit intensity regression based on multi-task deep transfer learning. In: 2017 12th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2017), pp. 872–877 (2017). https://doi.org/10.1109/FG.2017.112
Development of a Smart Education System for Analysis and Prediction of Students' Academic Performance

Svetlana Yaremko1, Elena Kuzmina1(B), Nataliia Savina2, Dmitriy Yaremko3, Vladyslav Kuzmin3, and Oksana Adler3

1 Vinnitsa Institute of Trade and Economics Kiev National Trade and Economic University, Vinnytsia, Ukraine
svitlana [email protected]
2 The National University of Water and Environmental Engineering, Rivne, Ukraine
[email protected]
3 Vinnytsia National Technical University, Vinnytsia, Ukraine
Abstract. The aim of this article is to research the key characteristics of a smart education system, formulate requirements for its main components, and develop a set of tasks in order to ensure a higher level of education, communication, analysis and prediction of students' academic performance. Based on the research of the smart education system's main components, a student account interface for logging into the system was built using web development tools. Moreover, to implement the smart education system, a software and hardware complex is recommended that will allow quick access to educational materials, smart simulators, an interactive personal assistant, virtual boards, and other educational tools. Overall, the development of the smart education system will provide higher-quality learning content, make the educational process more individualized, help to keep track of changes in academic performance, provide recommendations for its improvement, and improve the quality of knowledge of future specialists.

Keywords: Smart systems and technologies · Educational programs · Smart simulators · Interactive personal assistant · Virtual board · Expert system
1 Introduction
The traditional teaching methods available today do not provide a full-fledged cognitive process, primarily because it is difficult for the learning organization to establish feedback, self-control, and effective management of the learning process. At the same time, it should be noted that the learning process requires intense mental work of the person and his own active participation in this process. Learning is a very complex process, and how people learn has not been fully
understood until recently. We know for certain that learning efficiency varies significantly from individual to individual. For more than 6000 years human civilization has learned in groups. Personal computers not only support the automation of cognitive processes but also enable computer-supported personalized and group learning. Intelligent tutoring systems (ITSs) have gradually become complex software systems, and the reusability of their components plays a crucial role in their sustainability and further evolution [19,20]. Explanations and demonstrations, by themselves, will never provide genuine, lasting knowledge. Only by enhancing the cognitive activity of students on the basis of the latest information technologies can the level of learning be increased. This, in turn, necessitates the introduction of intelligent technologies that improve the quality of knowledge and skills of future professionals [21]. The current direction of research is the development of intelligent educational systems that can provide a high-quality dialogue mode, bringing this interaction close to communication with a real teacher. This leads to the customization of education for users. An intelligent next-generation learning system can also form a new model of interaction and develop the most suitable way of learning for each user, taking into account their behavioral characteristics and personality traits.
2 Review of Literature
Currently, a considerable number of systems for organizing and managing learning have been created, among them Prometheus, GetCourse, iSpring Learn, Moodle, eTutorium LMS, Teachbase, Memberlux and others [1–10,13]. Such systems mainly rely on various information technologies, but it should be emphasized that the need for knowledge technologies, rather than purely information ones, is becoming increasingly important. Despite the long history and significant achievements in this field, the challenge is to represent knowledge specifically for education, since pedagogical representation often does not fit the general recommendations and the classic statement of the knowledge representation problem in artificial intelligence [12,20]. This problem can be solved by developing an intelligent education system that would allow students to receive a quality education individually, using the latest teaching methods and tools as well as the tools of modern information technologies and systems. In the first stage of creating an intelligent education system, it is planned to formulate requirements for its characteristics and to develop a scheme of the complex of tasks that it will accomplish. First of all, it should be noted that the intelligent training system should provide a high-level dialogue with the user, which will allow it to promptly receive user requests, process them effectively, and provide recommendations online. Besides, an intelligent learning system embodies a large subject area, which contains both the objects existing in this area and the knowledge of subject-matter specialists, above all expert teachers, who accumulate teaching methods and especially important concrete cases in their practice.
Among intelligent learning systems, the most famous are the Andes physics tutor, the Prolog tutor and others [19]. Consider the basic principles on which they are implemented. The most general way to describe an ITS is to say that it is the application of AI to education. During the last several decades, the penetration of computers has essentially influenced the architectures of so-called 'intelligent tutoring' systems, and it has recently become fashionable to mark sophisticated software systems with this attribute. The definition of intelligence is context dependent, and we will not deal with the phenomenon of intelligence itself. The aim of this paragraph is to present several definitions of ITSs. 'ITSs are computer software systems that seek to mimic the methods and dialog of natural human tutors, to generate instructional interactions in real time and on demand, as required by individual students. Implementations of ITSs incorporate computational mechanisms and knowledge representations in the fields of artificial intelligence, computational linguistics, and cognitive science' [20]. 'Broadly defined, an intelligent tutoring system is educational software containing an artificial intelligence component. The software tracks students' work, tailoring feedback and hints along the way. By collecting information on a particular student's performance, the software can make inferences about strengths and weaknesses, and can suggest additional work' [11,12]. 'In particular, ITSs are computer-based learning systems which attempt to adapt to the needs of learners and are therefore the only such systems which attempt to 'care' about learners in that sense. Also, ITS research is the only part of the general IT and education field which has as its scientific goal to make computationally precise and explicit forms of educational, psychological and social knowledge which are often left implicit' [19].
It is generally accepted (as can also be seen from the above definitions) to speak of an ITS if the system is able to build a more or less sophisticated model of cognitive processes, to adapt these processes consecutively and to control a question–answer interaction [11,19]. Conventionally, an ITS provides individualized tutoring or instruction and has the following four models or software components [4,5,13,19,20]:

– knowledge of the domain (i.e. knowledge of the domain expert; refers to the topic or curriculum being taught);
– knowledge of the learner (e.g., what he/she knows, what he/she has done, how he/she learns, . . . );
– knowledge of teacher strategies (or pedagogical issues, i.e. how to teach, in what order, typical mistakes and remediation, typical questions a student might ask, hints one might offer a student who is stuck);
– user interface (i.e. the interactive environment interface).

It should also be noted that learning technology (LT) standards will have a powerful impact on the way education works in the near future. Whether learning takes place in the classroom or online, the relationship between teachers,
students and teaching materials will change significantly under the influence of developing learning technology standards. In the engineering of intelligent learning systems, the concepts of reusability and standards play an important role [4,5,11–13,19,20]. For educators, LT standards may make it easier to share course materials with colleagues, and to use materials produced by a much wider range of publishers without worrying about those materials being incompatible with their existing course management software. On the other side, the applications that are developed based on LT standards will also influence the way in which educators teach. For students, standards may provide the ability to move between institutions, anywhere in the world, with far greater ease than is currently possible, taking their academic record with them. The key issue is what this record contains and who has access to it, and this will largely depend on how the standards for learner profiles are defined [16,17,19]. For institutions, there are clear benefits from connecting up systems for academic records, course delivery, and assessment, provided that the standards are adopted by the key vendors of those systems and are framed in such a way that the activities and values of the institution are supported. If the majority of vendors adopt the standards and those standards do not suit education institutions, then the lack of alternative systems available on the market will either force institutions into changing their practices or into purchasing expensive proprietary solutions [11,15]. For vendors, the adoption of open standards widens the playing field, allowing small and medium-sized companies to create education solutions that are compatible with other compliant systems. For example, any vendor could create a Virtual Learning Environment (VLE) that can deliver the same course materials as the market-leading products. This in turn leads to greater choice for institutions, educators, and students. For publishers, standards mean reduced costs and time to market, as content does not need to be developed for multiple VLE platforms [19].
3 Initial Prerequisites and Problem Statement
The purpose of the scientific article is to develop an intelligent educational system based on modeling, development of task diagrams and block diagrams of hardware and software, as well as the interface of the student's electronic office, which will allow:

– provide individual planning for the methodology of study courses;
– assess each student's activity in chat rooms, groups, and online educational and professional courses;
– analyze the success of each student's studies in different subjects and develop individual recommendations for improving the learning of the educational material;
– optimize the work of teachers in assessing student success by introducing an electronic accounting system using modern software;
– display all the data of the analysis and forecasting of each student's success in the student's individual electronic study room;
– increase overall motivation and interest in learning through new forms of organization and the ability to individualize the learning process.

The scientific novelty of this research is to develop and adapt existing mental models and methods in the proposed intelligent educational system, which will allow the learning process to be adapted for each student based on his or her individual knowledge profile, personal characteristics and abilities: to observe the level of assimilation of knowledge, to develop individual recommendations to improve the level of success and, in general, to improve the quality of educational services.
4 Materials and Methods
The methods of analysis and synthesis were used in writing the article to formulate the requirements for the characteristics of the developed intelligent system and the complex of tasks that it will implement, as well as in the development of requirements for the hardware and software for building the intelligent system. In addition, the Student Model and the Learning Process Model were used to model the work of the intelligent learning system; they form the basis for individual planning of training courses, intelligent analysis of the solutions of educational tasks and intelligent decision-making support. When building the intelligent learning system, the Student model (M1) was used, which can be expressed as the dependence

$M_1 = F(AI, PP, CFK, CTC, LKS)$,   (1)
where AI is the accounting information about the student (surname, number of the study group, date of the control event, etc.); PP is the psychological portrait in the form of a set of personal characteristics; CFK is the current and final levels of knowledge and skills; CTC is the information on the student's current and target competencies; LKS is the information about the methods and algorithms for identifying the student's levels of knowledge and skills and the algorithms of psychological testing used, on the basis of which the psychological portrait is formed. Another basic model, the Learning Model (M2), can be represented by the expression

$M_2 = F(\{M1_1, M1_2, \ldots, M1_n\}, \{PI_1, PI_2, \ldots, PI_n\}, F(LS))$,   (2)
where {M1_1, M1_2, ..., M1_n} is the set of M1 models; {PI_1, PI_2, ..., PI_n} is the set of learning strategies, plans and learning influences; F(LS) is the function of selecting and/or generating learning strategies depending on the input M1 model; learning management is based on some plan (strategy) consisting of a certain sequence of educational influences.
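Purely as an illustration of Eqs. (1) and (2), the two models can be expressed as structured records. The field names in the sketch below follow the abbreviations used in the text; the Python dataclass layout is an assumption made for the example and is not the published implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class StudentModel:                               # M1 = F(AI, PP, CFK, CTC, LKS)
    accounting_info: Dict[str, str]               # AI: surname, study group, date of control event
    psychological_portrait: Dict[str, float]      # PP: set of personal characteristics
    current_final_knowledge: Dict[str, float]     # CFK: current and final levels of knowledge and skills
    competencies: Dict[str, str]                  # CTC: current and target competencies
    diagnostics: List[str]                        # LKS: methods for identifying knowledge levels

@dataclass
class LearningProcessModel:                       # M2 = F({M1_i}, {PI_i}, F(LS))
    students: List[StudentModel]                  # the set of M1 models
    strategies: List[str]                         # PI: learning strategies, plans, influences
    select_strategy: Callable[[StudentModel], str] = lambda m1: "default plan"  # F(LS)
```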
The combination of these models made it possible to create an effective intelligent system for analyzing and predicting student performance. Neural network models [14,15] were also used for the optimization problem of analyzing student test results: it is necessary to find

$\sigma^{*} = \min_{HLN,\,N_k,\,W,\,F_j(X,W)} \ \max_{Q} \ \sigma(Q), \quad \sigma = \|Y - Y^{*}\|$,

where Q is the set of educational examples containing the values of X and the reference outputs Y*, and Y = F(HLN, N_k, X, W) is the transfer function of the network, built from the private functions of the individual neurons F_j(X, W). These models are used to test students' knowledge on the basis of testing. In this case, the input elements for the neural network are the test questions, clustered in order of importance. By setting appropriate weights, it is possible to obtain more accurate information about the level of knowledge of each student. This creates a profile of the student's knowledge, which makes it possible to correctly identify the gaps in knowledge that need to be addressed. Therefore, the learning process will be customized for each individual student, which will improve the quality of his or her training.
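As an illustration of this weighting scheme, the following sketch computes a per-cluster knowledge profile from binary test answers. The variable names, weights and cluster labels are assumptions made for the example, not part of the described system.

```python
import numpy as np

def knowledge_profile(answers, weights, clusters):
    """Weighted per-cluster knowledge scores for one student.

    answers:  (n,) array of 0/1 correctness flags for n test questions
    weights:  (n,) array of question importance weights
    clusters: (n,) array of cluster (topic) labels for each question
    """
    profile = {}
    for c in np.unique(clusters):
        mask = clusters == c
        profile[c] = float((answers[mask] * weights[mask]).sum() / weights[mask].sum())
    return profile   # low weighted scores point to the knowledge gaps of this student
```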
5 Experiment, Results and Discussion
Given the above requirements, the range of tasks facing the intelligent learning system needs to be defined. The main tasks to be solved by the intelligent educational system are the demonstration of key topics, the development of skills for solving typical and specialized tasks, the creation of a virtual board for the simultaneous learning of students and, of course, the individualization of the educational process for students receiving distance education. Key topics will include videos or presentations that students will watch in class and will also be able to view remotely. This will allow students to visually anchor the topics of the course, which will greatly enhance the perception of theoretical information and help them to better assimilate and memorize it. It is also advisable to place electronic textbooks and manuals in the system. To provide an adequate and high-quality level of education for distance learning students, the use of web technologies and expert systems should be maximized, as this will ensure that materials and tasks are complete. Such groups of students will then receive knowledge and the ability to apply it at the same level as full-time students. To develop students' problem-solving skills and consolidate lecture materials, it is necessary to fully implement and use intelligent training simulators, as well as semantic models of academic disciplines. Thanks to this, it is possible to get feedback from students if any of the topics are not fully understood by them or if practical tasks cause them difficulty in finding solutions. Solving typical and specialized tasks will help students consolidate theoretical materials and develop theoretical skills in practice. For this purpose it is necessary to develop educational computer complexes for each discipline, which in turn will be divided into theoretical, problem-theoretical and explanatory modules. The student will be able to turn to any of the modules in the
course of solving problems, which will improve the mastering of the material and its application in practice. An urgent need in developing the intelligent learning system is to create a virtual board that uses the Internet for simultaneous learning and control. In this way, students will be able to practice knowledge and skills from previously learned materials, and teachers, in turn, will be able to quickly control the completion of certain tasks and, if necessary, draw students' attention to insufficiently mastered topics. Another important point is that, with the help of a virtual board, students from different educational institutions, regions and even countries will be able to cooperate in joint scientific conferences, which will significantly raise the level of students' preparation for their professional activities. In summary, it can be noted that the certified educational material processed and uploaded to the intelligent training system, together with the related content, will be available online for display by different methods, on different media and, last but not least, in a configuration, content and volume tailored to each student. This will make it possible to realize the whole variety of tasks of the intelligent educational system; the generalized scheme of this complex of tasks is presented in Fig. 1.
Fig. 1. Scheme of the complex of tasks of the intellectual learning system
At the second stage, the modeling and development of the intelligent learning system is carried out. Given the above, we will build the intelligent learning system based on the "Student Model" and the "Learning Process Model", which contain the components presented in Table 1.

Table 1. Components of the intellectual learning system

Student model: student information block; learning strategy; error analysis block.
Model of the learning process: forms of information submission; type of student activity quality assessment; the process of student training; final control.
The models presented in Fig. 1 are basic for the problems of intellectual learning, namely: individual planning of the methodology of study of training courses, intellectual analysis of decisions of educational tasks and intellectual support of decision-making [11,19]. To date, one of the most tested in the educational process is a fairly flexible form of the student model M1, which includes accounting information about the student (name, number of the study group, the date of the control event, etc.); psychological portrait in the form of a set of personal characteristics; current and final levels of knowledge and skills; information about the current and target competence of the student; information on methods and algorithms for identifying the levels of knowledge and skills of the student and the algorithms of psychological testing used, on the basis of which a psychological portrait is formed, the mathematical expression of which is given in (1). Knowledge discovery processes are implemented as a rule, when carrying out control activities by dynamically forming the current competency-oriented model of the student, which is based on the analysis of answers to questions from special web tests and subsequent comparison with fragments of applied ontology of the course/discipline. The generation of variants of test tasks is carried out before the beginning of web testing by applying the genetic algorithm to a specific ontology of a course/discipline or its fragment [19,20]. Experience has shown that the technologically chosen approach to web testing has proved to be quite versatile and flexible since it allows the generation of test tasks that meet any criteria since the software implementation of the algorithm is separate and in no way related to the chosen objective function. Therefore, the teacher is not limited in the choice of criteria, that is, he can use both simple statistical measures (mathematical expectation and variance of the complexity of the questions), and more complex (the closeness of the distribution of complexity to the set). The method also does not depend on the model of the ontology and retains its efficiency when introducing any new parameters (changing the
complexity of the question, accounting for the results of other students' answers to this question, etc.) [16,17]. It should also be noted that, at present, when generating variants of test tasks based on the genetic algorithm, the following methodological requirements are supported: the obtained variants of web tests should cover a given fragment of the ontology of the discipline, and none of them may contain identical questions; the number of questions in the different variants should be appropriate and the overall complexity of each variant approximately equal; so-called "bush questions" cannot appear in one variant. As for the methodology of knowledge assessment, it is based on the calculation of the resulting evaluation for the full test [11,12,14]. Another basic model, the Adaptive Learning Model (M2), includes knowledge of the planning and organization (design) of the learning process, as well as general and particular teaching methods appropriate to individual student models [11]. The mathematical representation of this model is given in (2). Currently, the most sought-after educational influences are fragments of hypertext electronic textbooks as well as sets of training tasks [11,12,19]. In addition to the knowledge model, the student model should also store information about his/her activity in the system, as this allows positive results to be achieved faster. Statistics on activity in the system (time, dates and duration of lectures, testing, etc.) can be used to assess the honesty, responsibility, independence of work and the rate of assimilation of material by students, which, in turn, can be used by the recommendation subsystem. To implement the above tasks, it is advisable to use neural network tools. Neural networks are groups of mathematical algorithms that have the ability to learn by example, later "recognizing" the features of the encountered patterns and situations. A neural network is a collection of a large number of relatively simple elements – neurons – whose topology depends on the type of network. To create a neural network for a particular problem, one must choose the method of connecting neurons to each other and the values of the parameters of the interneuronal connections. In systems based on precedents, the database contains descriptions of specific situations (precedents). The search for a solution is based on analogies and includes the following stages: obtaining information about the current problem; comparing the received information with the values of the features of precedents from the knowledge base; choosing the precedent from the knowledge base closest to the problem; adapting the chosen precedent to the current problem; checking the correctness of each decision received; and entering detailed information about the received decision into the knowledge base. Precedents are described by many features, on which fast search indexes are built. However, in precedent-based systems, in contrast to inductive systems, a fuzzy search is allowed, producing many valid alternatives, each of which is evaluated with a certain confidence coefficient. The most effective solutions are adapted to real situations with the help of special algorithms. Precedent-based systems are also used to disseminate knowledge in context-based assistance systems. In particular, to test students' knowledge through testing, the input elements for the neural net-
work will be the questions, clustered in order of importance. By setting the appropriate weights, it is possible to obtain more accurate information about the level of knowledge of each student. This creates a profile of the student's knowledge, which makes it possible to correctly identify the gaps in knowledge that need to be addressed. Therefore, the learning process will be customized for each individual student, which will improve the quality of his or her training [3,14,15].

We formulate a general model of neural network optimization for the implementation of the above areas of the educational process. For a given number of input and output neurons and a given set of training examples, it is necessary to determine [14,15]: the optimal number HLN of hidden layers of neurons; the number of neurons in each layer N_k, k = 1, ..., HLN; the values of all weights of interneuronal connections w_{ij}, where j is the index of the neuron and i is the index of the interneuronal connection (synapse); and the transfer functions F_j(X, W) of all neurons except those of the input layer [14,15]. The optimization criterion is the maximum deviation of the network output vector Y from the reference output value Y*, obtained by processing all examples; that is, it is necessary to find

$\sigma^{*} = \min_{HLN,\,N_k,\,W,\,F_j(X,W)} \ \max_{Q} \ \sigma(Q), \quad \sigma = \|Y - Y^{*}\|$,

where Q is the set of educational examples containing the values of X and Y*, and Y = F(HLN, N_k, X, W) is the transfer function of the network, built from the private functions of the individual neurons F_j(X, W). Even for simple networks, this problem is very complex, so decomposition is used to solve it, i.e. the network is optimized through the sequential solution of partial optimization problems. For example, in the first step the optimal values of HLN and N_k are selected, then the optimal type of the neurons' transfer functions is determined, and at the final stage the weights of the interneuronal connections are selected [14,15,18].

The model of management of the educational process allows us to determine at what moments it is necessary to intervene in the learning process. There is also a need to identify ways to influence the user of the intelligent learning system. The first step is to approach carefully the selection of training material (including theoretical material, tasks to solve and test tasks), taking into account the knowledge and skills of the user of the intelligent system fixed in the corresponding model. The second step is to interactively support the process of solving problems in the intelligent learning system. The developed management algorithm will be able to use such criteria as the time of appearance and the frequency of actions of the smart program. This can be implemented in the form of an interactive assistant with various prompting functions. For example, such a helper may give short text prompts for the predicted next step of a particular task, or provide links to theoretical material for the task being solved. The online assistant can advise how to handle current tasks or how to work with the system as a whole. The learning process management model also contains components for controlling students' knowledge and assessing the success of learning.
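A minimal sketch of the first decomposition step, selecting HLN and N_k by the maximum-deviation criterion σ*, is given below. It assumes scikit-learn's MLPRegressor as a stand-in network implementation and illustrative candidate values; it is not the authors' implementation.

```python
from itertools import product
import numpy as np
from sklearn.neural_network import MLPRegressor

def select_architecture(X, Y, hln_options=(1, 2, 3), nk_options=(5, 10, 20)):
    """Pick (HLN, N_k) minimizing the maximum deviation over all training examples."""
    best_cfg, best_sigma = None, np.inf
    for hln, nk in product(hln_options, nk_options):
        net = MLPRegressor(hidden_layer_sizes=(nk,) * hln,
                           activation="logistic", max_iter=2000, random_state=0)
        net.fit(X, Y)
        sigma = np.max(np.abs(net.predict(X) - Y))   # max deviation sigma over examples
        if sigma < best_sigma:
            best_cfg, best_sigma = (hln, nk), sigma
    return best_cfg, best_sigma
```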
Based on the modeling of the intelligent educational system and the description of its main components, the interface of the student's electronic study room was implemented with such web programming tools as HTML5, CSS, JavaScript, PHP, MySQL and others (Fig. 2).
Fig. 2. The interface of the electronic study room of the student of the intellectual educational system
Today, a large number of hardware and software tools are used to improve the quality of knowledge delivery in educational institutions. However, to provide effective cognitive activity for students, the tools to be used in the intelligent learning system must meet the following requirements:

– development and presentation of educational materials using audio and video files;
– the ability to display on the virtual board the image of the PC monitor at which any student is working;
– provision of collective forms of work during business games, trainings and projects on local and global networks;
– provision of feedback to students, faculty and the expert system through online polling or voting;
– the ability to work with documents, spreadsheets or images (including via conference call);
– the ability to save the results of the session with the records of the teacher and students on the virtual board and to print these results.

After forming the requirements for the functional capabilities of the intelligent training system, it is possible to develop a structural diagram of the complex of its hardware and software. Structurally, the intelligent learning system can be housed in educational classrooms and consist of components such as multimedia systems, webcams, wifi
routers and network printers, as well as the automated workplaces of teachers and students, each incorporating a PC with an installed complex of application software for entering, editing and presenting training material, as well as for automated polling and voting. The structural diagram of the complex of hardware and software of the developed intelligent learning system, based on an integrated approach to the choice of means for the input, transmission and display of educational material, as well as for adjusting and checking its perception and assimilation, is presented in Fig. 3.
Fig. 3. Structural diagram of a complex of hardware and software intellectual training system
Besides, since all computers in the classroom are connected to a local network, it is planned to store the account information not on each computer but on the domain controller. This is the so-called network logon. Under these conditions, each account can have a roaming profile, the network path to which is stored on the domain controller. As a result, users can work from any domain computer with their Desktop, My Documents and other customizable items. Thus, the use of a domain network model will allow all administrative work to be performed centrally. The developed structure will therefore have advanced equipment and better capabilities thanks to the introduction of the necessary equipment and the creation of automated training places for students, on which the necessary hardware and software will be installed for viewing, supplementing and editing the educational material and for conducting voting and automated polls. This will increase students' cognitive activity and the level of mastering the educational material, and will allow them to work more effectively on their collective projects.
Also, the development and installation of software such as expert systems and systems for online conferences and chats, as well as smart simulators and virtual boards will provide effective individual and team-based learning and, in general, improve the quality of educational services. The final stage will be the development of methodological support, instructional materials and training of teachers and employees of the department of technical means for use in the educational process of the intellectual educational system.
6 Conclusions
The economic effect will consist in reducing the cost of printing methodological training materials for students who use the intelligent electronic learning system, as well as in reducing the teachers' workload for processing learning results, since this processing can be automated. The system also makes it possible to analyze the performance of each student in a specific subject area and to provide recommendations on how to improve it, raising the quality of the educational services provided. Besides, the opportunity to use the laboratories for online training, webinars and video conferences for third parties adds additional sources of funding for universities. Once the intelligent learning system has been developed, methodological support prepared and teachers trained, its active use in the educational activity and in different directions of the university's work is planned. It is also planned to involve practitioners more widely, by inviting business executives to send in applied tasks reflecting real situations, and to create a base of expert knowledge, which requires the formation of a structure that combines diverse scientific theory with up-to-date assessments of current educational issues. From a modern perspective, this makes it possible to create an educational environment in which up-to-date theoretical knowledge and practical skills are formed, and which can serve as a basis for planning higher education. Thus, the created intelligent training system provides the basis for new forms and methods of education and, in general, for achieving the highest quality of training.
References

1. Academy Ocean. https://academyocean.com/employee-training/ru
2. eTutorium LMS. https://etutorium.ru/about-us
3. GetCourse. https://getcourse.ru/
4. Intelligent tutoring. http://www.adlnet.gov/technologies/tutoring/index.cfm
5. Intelligent Tutoring Systems (a subtopic of education). http://www.aaai.org/AITopics/html/tutor.html
6. iSpring Learn. https://www.ispring.ru/ispring-learn
7. Moodle. https://moodle.org/?lang=uk
8. Prometheus. https://prometheus.org.ua/
9. Teachbase. https://teachbase.ru/
10. The Learning Object Metadata Standard. http://lttf.ieee.org/techstds.htm
11. Bhaskaran, S., Marappan, R., Santhi, B.: Design and analysis of a cluster-based intelligent hybrid recommendation system for e-learning applications. Mathematics 9 (2021). https://doi.org/10.3390/math9020197
12. Fardinpour, A., Pedram, M., Burkle, M.: Intelligent learning management systems: definition, features and measurement of intelligence. Int. J. Dist. Educ. Technol. 12(4), 19–31 (2014). https://doi.org/10.4018/ijdet.2014100102
13. Gromov, Y.Y., Ivanova, O.G., Alekseev, V.V., et al.: Intelligent information systems and technologies. FGBOU VPO "TSTU", Tambov (2013)
14. Haykin, S.: Neural Networks: A Comprehensive Foundation. Prentice Hall, Upper Saddle River (2006)
15. Haykin, S.: Neural Networks and Learning Machines. Prentice Hall, Hamilton (2009)
16. Li, H., Li, H., Zhang, S., Zhong, Z., Cheng, J.: Intelligent learning system based on personalized recommendation technology. Neural Comput. Appl. 31(9), 4455–4462 (2018). https://doi.org/10.1007/s00521-018-3510-5
17. Klašnja-Milićević, A., Ivanović, M., Vesin, B., Budimac, Z.: Enhancing e-learning systems with personalized recommendation based on collaborative tagging techniques. Appl. Intell. 48(6), 1519–1535 (2017). https://doi.org/10.1007/s10489-017-1051-8
18. Kruglov, V.V., Borisov, V.V.: Neural Networks: Theory and Practice. Hotline – Telecom, Moscow (2002)
19. Tsvetkova, T.: Professional development of the teacher in the light of European integration processes: collective monograph. InterGING, Hameln (2019)
20. Vinod Chandra, S., Anand Hareendran, S.: Artificial Intelligence: Principles and Applications. PHI Learning Private Limited, Delhi (2020). http://books.rediff.com/publisher/phi-learning-pvt-ltd-new-delhi
21. Yaremko, S., Nikolina, I., Kuzmina, E., et al.: Model of integral assessment of innovation implementation in higher educational establishments. Int. J. Electron. Telecommun. 66(3), 417–423 (2020). https://doi.org/10.24425/ijet.2020.131894
Assessment Model for Domain Specific Programming Language Design

Oleksandr Ocheretianyi(B) and Ighor Baklan

National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute", Kyiv, Ukraine
[email protected]
Abstract. In this paper, we present the results of research on constructing criteria for the qualitative assessment of domain specific languages. The proposed compound metric consists of three parts: classic grammar size metrics, metrics for evaluating the programming language as software, and metrics for evaluating conciseness. Each part of the compound metric evaluates different characteristics needed for the assessment of modern domain specific languages. Automated software was developed to carry out the evaluation. The research also contains an explanation of the scores for each component and their influence on the quality of assessing a programming language. The analysis of the assessment results provided the information required for further improvements, which will be presented as part of a system for the computer-aided design of domain specific languages. The resulting compound metric can be used in the assessment of domain specific programming languages of any type and can eliminate human mistakes in some areas of design.

Keywords: Assessment of domain specific languages · Grammar-ware · Software metrics · Language-oriented programming
1 Introduction
The problem of software quality assessment has been researched throughout the whole evolution of computer science. The first computer programs went through various stages of coding, from program design by a programmer to the final result, which at the beginning of development took more than one day. Over the decades, the distance between the idea of a program and obtaining the results of its work has decreased many times. However, the number of errors that can occur during the software development process has only increased. The main tools of software development for programmers are programming languages. Such a tool is usually designed over more than one year and has a long path of development and formation, which entails design errors at the very origin of the programming language itself. First generation programming languages, which
expressed low-level machine instructions, simply could not have design problems, due to the small number of instructions available on a computer. Domain-specific languages are at the top of the abstraction level of programming and belong to the fifth generation. The fifth generation of programming languages focuses on the declarative description of the program, so the problem of errors that can be made by the designer in the process of creating a programming language plays a significant role in the development of such languages. Typically, software engineers consider programming languages exclusively as a tool for implementing an algorithm, but modern developments increasingly require an assessment of the quality of programming languages as software. The quality of the design of a programming language directly affects the number of errors that may occur when the programmer who uses it expresses intentions declaratively. Currently, there is a very limited number of studies that examine the quality of programming language design. However, with the development of language workbenches, the number of domain-oriented programming languages only increases. This growth leads software architects to the problem of choosing the right programming language for a particular task, but these experts have no way to assess the quality of a language as a domain-defining tool rather than as a tool for describing business logic. Furthermore, domain specific languages are created for use by domain specialists. Often these experts do not have any previous programming experience; therefore they expect the language to behave as the domain does, not as a programming tool. On the other side, programmers who create programming languages tend to over-promote modern programming solutions that may be dispensable for the domain. None of the tools created so far has functionality to sustain a balance in this confrontation. A metric for assessing domain specific languages should cover all the mentioned problems that arise during the design, implementation, testing and deployment stages. Errors at each stage will influence the usability and quality of the designed language and must be caught as early as possible with the metrics proposed in this research.
2 Problem Statement
The problem under research is related to the construction of an intelligent system for building problem-oriented programming languages based on multi-paradigm principles. As part of the development of the intelligent system, a new scheme for designing problem-oriented programming languages was introduced. This scheme involves: collecting data on the domain language in the domain description using ontologies; converting the ontology into a domain language grammar in ANTLR format; creating a translator to convert program code in the domain language into constructs of programming paradigms; and creating a translator to convert constructs of programming paradigms into source code of a target programming language. Since each of these stages may have its own errors that will lead to the malfunction of the translator of the programming language, it is necessary to
develop a comprehensive model for assessing the quality of the design of the translator and the domain programming language.
3 Literature Review
The development of computer technology for use in a wide range of domains leads to a situation where user needs are becoming more demanding and complex. The quality of user interaction with this type of technology is becoming extremely important. Thus, the development of successful software systems is becoming increasingly complex. One of the best known metrics for estimating the complexity of an algorithm is cyclomatic complexity. This software metric can be usefully applied to both source and binary code. The authors implemented an algorithm for calculating path homology in an arbitrary dimension and applied it to several classes of corresponding flow graphs, including randomly generated flow graphs representing a structured and unstructured control flow [8]. The study also provides a comparison of path homology and cyclomatic complexity on a set of parsed binaries obtained from the grep utility. Path homology empirically generalizes the cyclomatic complexity for the elementary concept of structured code and seems to determine more structurally relevant features of the control flow as a whole. Thus, path homology can significantly improve cyclomatic complexity. Static code analysis tools can draw attention to code that is difficult for developers to understand. However, most conclusions are based on unverified indicators, which can lead to confusion and code that is difficult to understand without identifying. In his study, Marvin Munoz Baron examines a metric called Cognitive Complexity, which was explicitly developed to measure code intelligibility and is already widely used through its integration into well-known static code analysis tools [5]. The researchers calculated the correlation of these measurements with the corresponding metric values and statistically summarized the correlation coefficients using meta-analysis. Cognitive complexity is positively correlated with comprehension time and subjective assessments of comprehensibility. The metric showed ambiguous results in terms of correlation with the correctness of comprehension tasks and with physiological indicators. Another important subject of discovery is testability of a system. Understanding how well software components support self-testing is important for accurately planning test activities, training programmers, and planning effective refactoring activities. Software testing evaluates this property by associating code characteristics with test effort. The verifiability studies reported in the literature investigate the relationship between class performance and test effort, measured as the size and complexity of related test sets. They found that some indicators of the class have a moderate correlation with the indicators of test effort. In his work, Valerio Terragni proposes an approach to measuring testability that normalizes it in terms of test quality, which we have quantified in terms of code coverage and mutation evaluation [9]. The results confirm that the normalization of test efforts in terms of test quality significantly improves
the relationship between class performance and test efforts. Better correlations will lead to better prediction power and, consequently, better prediction of test efforts. The relative LOC between components of a single project proves to be an extremely useful metric for automated testing. In their work, Josie Holmes uses a heuristic based on LOC calculations for proven functions to dramatically increase the efficiency of automated test generation [3]. This approach is especially valuable in languages where the collection of code coverage data for testing guidance has a very high overhead. The researchers applied the heuristic to propertybased Python testing using the TSTL (Template Scripting Testing Language) tool. Numerous experiments have shown that simple LOC heuristics can improve the coverage of branches and extracts by large margins (often more than 20%, up to 40% or more) and improve fault detection by an even larger margin (usually more than 75% and above) to 400% or more). The LOC heuristic is also easy to combine with other approaches, and it is comparable to, and perhaps more effective than, the two established approaches for guiding random testing. The use of code quality control platforms to analyze source code is increasingly attracting the attention of the developer community. These platforms are ready to analyze and verify the source code written in various general purpose programming languages. The emergence of domain languages allows professionals from different fields to develop and describe solutions to problems in their disciplines. Thus, methods and tools for analyzing the quality of source code can also be applied to software artifacts developed in a domain-dependent language. To assess the quality of the language code for a particular domain, each software component required by the quality platform to analyze and query the source code must be developed. This becomes a time-consuming and error-prone task, for which Ivan Ruiz-Rube describes a model-driven interaction strategy that prevents the gap between grammar formats of source code parsers and domainspecific text languages [8]. This approach has been tested on the most common platforms for designing text languages and analyzing source code. The software development required to build multi-agent systems (MAS) typically becomes complex and time-consuming due to the autonomy, distribution, and openness of these systems in addition to the complex nature of internal agent behavior and agent interaction. To facilitate the development of MAS, researchers propose various domain-specific modeling languages (DSML), enriching MAS metamodels with specific syntax and semantics. Although descriptions of these languages are given in relevant studies with examples of their use, unfortunately, many are not evaluated either in terms of ease of use (they are difficult to study, understand and use) or the quality of the artifacts. Thus, in his work, Omer Faruk Alaca presents an evaluation system called AgentDSM-Eval and its auxiliary tool that can be used to systematically evaluate MAS DSML according to various quantitative and qualitative aspects of agent software development [1]. The empirical evaluation provided by AgentDSM-Eval has been successfully applied to one of the well-known MAS DSMLs. The evaluation showed that both the coverage of MAS DSML domains and the acceptance of simulation elements
by agent developers can be determined using this system. Moreover, the quantitative results of the tool can be used to evaluate the effectiveness of a MAS DSML in terms of development time and bandwidth.
4 Assessment Criteria for Domain Specific Languages

4.1 Domain Oriented Comparison
The main difference between the metrics for estimating a GPL and a DSL is the need to restrict the domain in which the programming languages are compared. A comparison of Java and C# can be performed using the same metrics, as the languages share basic principles and imitate C-like syntax. We can even compare Java and Lisp, despite the complete difference in their language mechanisms, by focusing on the uniformity of the textual notation used to create the program. However, we must be careful when comparing a domain language that describes insurance with one that describes medicine: the terms describing the domain may overlap in the two languages, yet the source code we obtain will be completely different, despite the similarity of the problem descriptions.

The second important factor for comparing DSLs is the type of notation used in the language into which the program code is translated. Considering the trend towards visual notation in areas such as media, circuit design and simulation, we need to adjust how the metrics evaluate the code generated in the programming language. The code required for the correct construction of a program may well grow when graphical notation is used, because the coordinates of the elements to be combined must be specified. However, this effect can be overcome by using the textual notation that serves as an analogue for such languages, if one is available.

The third important factor to consider when comparing domain languages is the language into which the program code is translated. When developing a program in C# or C++, we usually pay no attention to the byte code that the compiler generates. However, the means of implementing a domain programming language can be completely different: different programming languages may be used that solve problems from the same domain but differ in notation and functionality. To compare such examples, it is necessary to use additional coefficients that affect the metrics related to the conciseness of the generated code, as the same problem can be solved with different amounts of program code in different programming languages. Figure 1 shows the subsets of domain-specific languages with the restrictions for a single comparison.

Fig. 1. Subsets of restrictions for domain oriented language comparison

4.2 Classic Size Metrics
For programming languages, the main identity is the grammar, which provides the basis for deriving the elements of the language. Grammar, in turn, can be seen as both a specification and a program. Conceptually, we can consider any program as consisting of a set of procedures, where each procedure is defined by some procedure body built from basic language primitives. The procedures correspond to non-terminals, and the bodies of procedures are the right-hand sides of the production rules. The conditional primitives are the operations of combination and choice, which correspond in a context-free grammar to sequence and alternation, respectively. This mapping can be extended directly to the closure and option operators used in EBNF. Consider a domain-specific grammar (N, T, S, P) and a production rule (n → a) ∈ P; we refer to a as the right-hand side (RHS) of the rule. The RHS is assembled by applying the grammatical operators to the terminals and non-terminals from T and N. We denote this application in general as f_k(x), where k ∈ {·, |, ?, ∗, +, ε}, representing the operations of concatenation, union, optionality, closure, positive closure and the empty string. The arguments, referenced by x, must correspond in number to the arity of the operator: ε has no arguments, optionality has one argument, and all the other operators have exactly two arguments.

One of the simplest size metrics that can be applied to a program is to count the number of operations that appear in it [2]. The metric of equivalent size for a context-free grammar is the number of non-terminals in that grammar:

$$\text{Number of non-terminals: } NOTERM = |N|.$$

A complementary metric is the number of terminal symbols:

$$\text{Number of terminals: } TERM = |T|.$$

Despite their simplicity, these size estimates can still provide useful information about a grammar. More non-terminals imply higher grammar maintenance costs, as changes in the definition of one non-terminal can affect a large number of connected declarations. For grammar engineers, the size of the parsing table is commonly proportional to the number of terminals and non-terminals.
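To make the two counts concrete, the sketch below computes NOTERM and TERM for a grammar stored as a plain mapping from non-terminals to the alternatives of their right-hand sides. This is an illustration only: the toy grammar and its representation are hypothetical and are not taken from the ANTLR grammars studied later in the paper.

# Illustrative sketch: NOTERM and TERM for a context-free grammar.
# A grammar is represented as {non_terminal: [list of alternatives]},
# where each alternative is a list of symbols (terminals or non-terminals).

toy_grammar = {
    "stylesheet": [["rule_list"]],
    "rule_list": [["rule", "rule_list"], []],
    "rule": [["SELECTOR", "LBRACE", "declaration", "RBRACE"]],
    "declaration": [["PROPERTY", "COLON", "VALUE", "SEMI"], []],
}

def noterm(grammar):
    """Number of non-terminals: the keys of the production table."""
    return len(grammar)

def term(grammar):
    """Number of terminals: RHS symbols that never appear as a key."""
    non_terminals = set(grammar)
    terminals = {
        symbol
        for alternatives in grammar.values()
        for alternative in alternatives
        for symbol in alternative
        if symbol not in non_terminals
    }
    return len(terminals)

print(noterm(toy_grammar), term(toy_grammar))  # 4 non-terminals, 7 terminals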
The McCabe metric measures the number of linearly independent paths through a program [7]. It is defined as a measure of the number of decisions in the flow graph, where the decisions are given by the Boolean expressions in conditional and iterative operators. It is an appropriate indicator of the complexity involved in verifying a procedure, since a good test set will tend to exercise most of the available paths through the flow graph. In a context-free grammar, the decisions are given by the alternation and optionality operators and by the closure (iteration) operators. Hence, the idea of McCabe complexity for grammars is to count the total number of options in the grammar determined by the occurrences of these operators:

$$\text{McCabe Complexity: } MCC = \#\{\text{occurrences of } op\}, \quad op \in \{|, ?, *, +\}.$$

Two grammars with a similar number of non-terminals can still differ considerably in complexity if one of them has many more options for its non-terminals than the other; the sum of the McCabe complexities over these options highlights this variation. The aim of a parsing algorithm is to provide a way of choosing between the options in the grammar during derivation, so a high McCabe complexity indicates a greater probability of conflicts and a greater potential for backtracking in a search-based parser.

For a procedure, a common size metric is the number of nodes in the associated flow graph, used as a substitute for the plain lines-of-code (LOC) metric. Since production rules correspond to procedures, the nodes of the flow graph correspond to the terminals and non-terminals on the RHS of the production rules. To obtain the average RHS size, we sum the number of RHS symbols over all rules and divide by the number of rules:

$$\text{Average RHS Size: } AVS = \frac{\sum_{n=1}^{\#rules} (\#TRHS_n + \#NRHS_n)}{\#rules},$$

where TRHS_n and NRHS_n are the terminals and non-terminals used in the right-hand side of the n-th production rule. The average RHS size estimates how many symbols we can expect, on average, in the right-hand part of a grammar rule. In some parsers, larger RHSs mean that more symbols or their associated attributes must be kept on the parse stack, which can reduce parser performance. In general, the length of an RHS can be reduced by factoring part of it into a new non-terminal, so the average RHS size metric should be interpreted in relation to the total number of non-terminals.
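In the same illustrative spirit, MCC and AVS can be computed over EBNF-style right-hand sides written as strings. The toy rules and the regular expressions used for counting are assumptions of this sketch, not part of the authors' tooling.

import re

# Illustrative sketch: MCC and AVS over EBNF-style right-hand sides.
# The rules below are hypothetical and only demonstrate the counting.

rules = {
    "stylesheet": "rule*",
    "rule": "SELECTOR LBRACE declaration* RBRACE",
    "declaration": "PROPERTY COLON value (IMPORTANT)? SEMI",
    "value": "IDENT | NUMBER | COLOR",
}

def mcc(rules):
    """McCabe complexity: total occurrences of the choice/iteration operators."""
    return sum(len(re.findall(r"[|?*+]", rhs)) for rhs in rules.values())

def avs(rules):
    """Average RHS size: mean number of terminal/non-terminal symbols per rule."""
    sizes = [len(re.findall(r"[A-Za-z_]\w*", rhs)) for rhs in rules.values()]
    return sum(sizes) / len(sizes)

print(mcc(rules))  # 2 closures + 1 option + 2 alternations = 5
print(avs(rules))  # (1 + 4 + 5 + 3) / 4 = 3.25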
4.3 Evaluating Conciseness of a Domain Specific Language
An extremely important part of the process of interpreting a problem-oriented programming language is the generation of code in the language into which the domain instructions are translated. The transition between levels of generations
of programming languages is characterized by an increase in the amount of program code at each subsequent level. However, this effect is fully offset by the growth of the code base in the higher-order language. The increase in code is usually due to the need to introduce additional structures to improve the organization of the program or to raise the level of abstraction used in programming.

$$codegen = \frac{\#T}{\#generated\ tokens}$$
This characteristic is measured by calculating the ratio of the number of tokens (#T) needed to describe the program that solves a specific problem, written in the domain-oriented language, to the number of tokens generated as a result of translation. Such a metric cannot be judged from a single program, so to obtain a language-level value it is necessary to average the estimates measured on different volumes of program code. This additional step reduces the impact of the structural boilerplate that predominates in small examples by balancing it against the inverse effect observed in large files. The second indicator, related to the interaction with the language into which the domain language code is translated, is the level of expansion of the domain language with respect to the language into which the code is translated:

$$expansion = \frac{\#DSLRules - \#SourceRules}{\#SourceRules}$$
This indicator is calculated as the ratio of the difference between the total number of rules available in the domain language and the number of rules corresponding to the language into which the code is translated to the total number of rules available in the domain language. The metric described above reflects the level of domain coverage achieved by the language design and qualitatively separates those languages that merely disguise their predecessors.
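A minimal sketch of both indicators of this subsection is given below. The whitespace-and-punctuation tokenizer is deliberately naive (it stands in for the real lexer), the expansion ratio follows the displayed formula with the target-language rule count in the denominator, and all concrete numbers in the example call are hypothetical.

import re
from statistics import mean

def count_tokens(source: str) -> int:
    """Naive token count: identifiers, numbers, and single punctuation characters."""
    return len(re.findall(r"[A-Za-z_][\w-]*|\d+(?:\.\d+)?|[^\s\w]", source))

def codegen_ratio(dsl_source: str, generated_source: str) -> float:
    """Tokens written in the DSL divided by tokens produced by the translator."""
    return count_tokens(dsl_source) / count_tokens(generated_source)

def average_codegen(sample_pairs) -> float:
    """Average the per-sample ratios so that small and large files weigh equally."""
    return mean(codegen_ratio(dsl, gen) for dsl, gen in sample_pairs)

def expansion_ratio(dsl_rule_count: int, source_rule_count: int) -> float:
    """Share of DSL grammar rules that go beyond the target-language rules."""
    return (dsl_rule_count - source_rule_count) / source_rule_count

# Hypothetical rule counts, only to show the call shape.
print(expansion_ratio(dsl_rule_count=55, source_rule_count=40))  # 0.375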
4.4 Evaluating a Programming Language as Software
The last group of metrics that we propose concerns the quality of programming language design in terms of how well the language can be supported in the future. Existing language workbenches do not provide any facilities for organizing the testing of the developed parser and lexer. However, future systems that treat language development as software development will require metrics estimating the coverage of the functional parts of the interpreter. Commonly used indicators are test coverage, the ratio of the number of test units to the number of modules available in the system, and the ratio of passed to failed tests in the current assembly of the translator. These metrics were developed as part of the evaluation of the intelligent system. The development of a problem-oriented language using this system assumes the presence of tests at all stages of the transitions between entities. The first stage of
testing should check the quality of data transfer from the ontology on whose basis the language is built into the grammar that represents its notation. Describing the domain area using ontologies makes it possible to formalize the process of collecting data about the domain and uses common notations such as OWL, RDF and XML [4]. The transition from OWL to an ANTLR grammar is fully automated, but some OWL2 properties may be processed incorrectly in the property-chain step. The second stage of building a translator according to the new method involves the transformation of the ANTLR grammar into concepts of programming-language paradigms; in this transition, human errors may appear when the transition rules are created. The third stage involves the transformation of paradigm concepts into a programming language that produces an executable file. The main mistakes at this stage are inefficiencies in transferring paradigm concepts that are absent in a particular implementation language; to fix this, developers have to reproduce the logic of components that the language does not provide. The currently available language workbenches do not contain such a complex component system, but they can use such metrics to verify the execution of code described in the domain language against the documentation that describes it. The second indicator notes the presence of tests written for functionality that has not yet been developed in the existing assembly of the domain-oriented language translator. This group of metrics should also include the number of covered domain-language requirements. This indicator can be used to assess the transition from the domain description to the grammar of the programming language and the translator, and it can show the relevance of the programming language and its software to solving domain problems.

An additional problem in creating domain programming languages is the lack of notations for describing the requirements for such a language. A language actually needs the same kinds of requirements as a software product: functional requirements, non-functional requirements, and testing and implementation requirements. This area is currently not fully formalized and needs further research. The presence of formalized requirements would greatly simplify the verification of the quality of the language design against the needs of domain experts. A possible way to address this problem is to use BPMN notation to describe the interaction of the components of the domain area with each other. Such a description simplifies the wording of rules that are obvious to domain specialists and completely unknown to grammar-ware specialists.
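The workbench-level indicators discussed here are simple ratios, so they can be tracked per translator build; the sketch below is illustrative, and its field names and numbers are assumptions rather than part of any existing language workbench.

from dataclasses import dataclass

@dataclass
class TranslatorBuild:
    """Illustrative snapshot of one assembly of a DSL translator."""
    modules: int                # lexer, parser, paradigm mapper, code generator, ...
    test_units: int             # test units written against those modules
    tests_passed: int
    tests_failed: int
    requirements_total: int     # formalized domain-language requirements
    requirements_covered: int   # requirements exercised by at least one test

    def unit_to_module_ratio(self) -> float:
        return self.test_units / self.modules

    def pass_rate(self) -> float:
        executed = self.tests_passed + self.tests_failed
        return self.tests_passed / executed if executed else 0.0

    def requirement_coverage(self) -> float:
        return self.requirements_covered / self.requirements_total

build = TranslatorBuild(modules=4, test_units=10, tests_passed=37, tests_failed=3,
                        requirements_total=20, requirements_covered=14)
print(build.unit_to_module_ratio(), build.pass_rate(), build.requirement_coverage())
# 2.5 0.925 0.7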
5 Experiment, Results and Discussion

5.1 Experimental Data Sets
The data set was collected from the GitHub repository of the ANTLR parser generator [6]. For the current research, three examples of domain-specific languages used in web development were selected. The main technology considered was the third edition of Cascading Style Sheets (CSS). For the comparison, two alternatives to this language were chosen – the framework languages LESS and SCSS. In this experiment we used only the parser grammar files, because they contain all the necessary rules. The files for these languages are quite similar and are formed as rules attached to certain selectors. The main difference between the alternatives lies in the functionality by which LESS and SCSS extend CSS: mixins, variables and other computation possibilities. Figure 2 shows examples of code with similar functionality in the chosen languages.
Fig. 2. Comparison of same code samples written in CSS, LESS and SCSS.
5.2 Simulation Results and Discussion
In the grammar-size experiment we compared the ANTLR parser grammar parts of LESS and SCSS. The results show that the selected languages are quite similar in the size of their instructions, but all other metrics show a significant difference (Table 1).

Table 1. Grammar size metrics for SCSS and LESS

Language  Rules  Noterm  Term  Mcc  Avs
SCSS      55     191     93    122  3.359
LESS      37     116     68    76   3.718
Consequently, SCSS has 18 more rules, 75 more non-terminals and 25 more terminals than LESS and exceeds its McCabe complexity by almost a factor of two. These metrics show not only that SCSS has the larger grammar but also that it provides larger functional coverage of the selected domain. In addition, SCSS is used in twice as many repositories as LESS and has held the first position in popularity for styling web pages for five consecutive years.

For the next experiment, on code generation, we collected a data set of 10 similar code samples for LESS and SCSS. These examples solve the classic problem of styling the elements of website admin panels and were selected so that the CSS code obtained by translating both examples gives an identical result. The results of the generated-code metric show that after crossing the threshold of 100 tokens, the efficiency of using the LESS and SCSS preprocessor languages begins to increase. Figure 3 shows that for files containing more than 60 tokens there is a clear tendency for fewer SCSS tokens than LESS tokens to be needed to perform the same task.
Fig. 3. Comparison of number of terminals in similar code samples written in CSS, LESS and SCSS.
These results are only an example of the use of the metrics; a much larger volume of comparison files would be required to fully compare language quality in production usage. The average code-generation results in Table 2 show that LESS needs more code than SCSS to produce similar CSS.
Table 2. Number of CSS tokens and code-generation ratios for the code samples in the data set used to compare LESS and SCSS

CSS tokens  Code-gen less  Code-gen scss
17          1.82           1.71
21          0.90           0.62
34          1.24           1.44
37          0.92           0.78
67          0.97           0.78
74          0.92           0.85
87          0.90           0.86
172         0.76           0.72
176         0.72           0.70
261         0.71           0.68
Average code-gen  0.99     0.91
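As a quick check, the averages reported in the last row of Table 2 can be recomputed from the per-sample values listed above.

# Per-sample code-generation ratios from Table 2 (LESS and SCSS columns).
less_ratios = [1.82, 0.90, 1.24, 0.92, 0.97, 0.92, 0.90, 0.76, 0.72, 0.71]
scss_ratios = [1.71, 0.62, 1.44, 0.78, 0.78, 0.85, 0.86, 0.72, 0.70, 0.68]

print(round(sum(less_ratios) / len(less_ratios), 2))  # 0.99, as reported
print(round(sum(scss_ratios) / len(scss_ratios), 2))  # 0.91, as reported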
6 Conclusions
This article offers several new indicators for assessing the design quality of domain-oriented programming languages. It presents experimental results for classical metrics and for the proposed new metrics on several grammars. These grammars cover popular domain languages in the field of web programming. The metrics are calculated from the grammar, the program code of the translator, and compliance with the stated requirements. To perform the comparison, the limits of the restrictions for grammars that can be compared were defined. The first group of indicators reflects the quality of the transfer of domain instructions into the programming language of the translator. The second group of indicators reflects the availability of testing tools for the created grammar and compliance with the requirements for the domain language. We believe that these indicators provide important results that add new dimensions to the evaluation of domain-oriented programming languages. From this point of view, the domain-based metrics are more suitable for translator experts, while the other metrics may also be applicable to programmers who will support and develop a domain-oriented language without specialized knowledge in language engineering. From the experimental results, we see that some indicators are directly related to the size or complexity of the grammar, while others remain stable even when the size or complexity of the grammar varies. Our findings show that in both cases the indicators meet the conditions for assessing the quality of a grammar. However, we believe that the quality of a grammar cannot be captured by a single metric, but rather by the set of metrics investigated in this paper. Moreover, this quality is not an absolute quantity, but is relative to other grammars. In this paper, we investigate only the metrics of the grammatical part of the analyzer, leaving the lexical part aside; however, the complexity of these two parts is closely linked.
Also, the behaviour of one of the metrics depends strongly on the collected data set of program code, so larger data sets in the future will yield more reliable results. A further direction of the authors' research is the application of the obtained results to the development of an intelligent system for building problem-oriented languages based on a multi-paradigm approach.
References

1. Alaca, O.F., Tezel, B.T., Challenger, M., Goulão, M., Amaral, V., Kardas, G.: AgentDSM-Eval: a framework for the evaluation of domain-specific modeling languages for multi-agent systems. Comput. Stand. Interf. 76, 103513 (2021). https://doi.org/10.1016/j.csi.2021.103513
2. Fernau, H., Kuppusamy, L., Oladele, R.O., Raman, I.: Improved descriptional complexity results on generalized forbidding grammars. Discrete Appl. Math. (2021). https://doi.org/10.1016/j.dam.2020.12.027
3. Holmes, J., Ahmed, I., Brindescu, C., Gopinath, R., Zhang, H., Groce, A.: Using relative lines of code to guide automated test generation for Python. ACM Trans. Softw. Eng. Methodol. (TOSEM) 29(4), 1–38 (2020). https://doi.org/10.1145/3408896
4. Munir, K., Sheraz Anjum, M.: The use of ontologies for effective knowledge modelling and information retrieval. Appl. Comput. Inf. 14(2), 116–126 (2018)
5. Muñoz Barón, M., Wyrich, M., Wagner, S.: An empirical validation of cognitive complexity as a measure of source code understandability, pp. 1–12 (2020). https://doi.org/10.1145/3382494.3410636
6. Parr, T., Fisher, K.: LL(*): the foundation of the ANTLR parser generator. In: Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pp. 425–436 (2011)
7. Peitek, N., Apel, S., Parnin, C., Brechmann, A., Siegmund, J.: Program comprehension and code complexity metrics: an fMRI study, pp. 524–536 (2021). https://doi.org/10.1109/ICSE43902.2021.00056
8. Ruiz-Rube, I., Person, T., Dodero, J.M., Mota, J.M., Sánchez-Jara, J.M.: Applying static code analysis for domain-specific languages. Softw. Syst. Model. 19(1), 95–110 (2019). https://doi.org/10.1007/s10270-019-00729-w
9. Terragni, V., Salza, P., Pezzè, M.: Measuring software testability modulo test quality, pp. 241–251 (2020). https://doi.org/10.1145/3387904.3389273
General Scheme of Modeling of Longitudinal Oscillations in Horizontal Rods

Roman Tatsij, Oksana Karabyn(B), Oksana Chmyr, Igor Malets, and Olga Smotr

Lviv State University of Life Safety, Lviv, Ukraine
Abstract. In this paper, we present the results of modeling non-stationary oscillatory processes in rods consisting of an arbitrary number of pieces. When modeling the oscillatory processes that occur in many technical objects (automotive shafts, rods), an important role is played by finding the amplitude and frequency of the oscillations. Solving oscillatory problems is associated with various difficulties, which are a consequence of applying methods of operational calculus and methods of approximate calculation. The method of modeling oscillatory processes proposed in this work does not use operational methods or approximate calculations, and it is a universal method. The work is based on the concept of quasi-derivatives. Applying the concept of quasi-derivatives helps to avoid the problem of multiplication of generalized functions. Analytical formulas describing oscillatory processes in rods consisting of an arbitrary number of pieces are obtained. The method can be applied in cases where the pieces of the rod consist of different materials, and also when masses are concentrated at the joints. The proposed method allows the use of computational software. An example of constructing eigenvalues and eigenfunctions for a rod consisting of two pieces is given.

Keywords: Quasi-differential equation · Boundary value problem · Cauchy matrix · Eigenvalue problem · Fourier method and method of eigenfunctions
1 Introduction
The problem of finding eigenvalues and eigenfunctions for equations in partial derivatives of the second order is an urgent problem. The relevance is due to the fact that such problems arise in the modeling of oscillatory processes of many technical systems. Each specific model is a separate mathematical problem, the ability to solve which depends on the input conditions. The theory of oscillatory processes is described in detail in [2,4,15,17]. In these works the mathematical and physical bases of oscillatory processes and methods of their modeling are stated. The method of solving boundary value problems according to which the solution of
problems is reduced to solving simpler problems is called the method of reduction. The solution of the general boundary value problem is sought in the form of a sum of functions, one of which solves a homogeneous problem and the other an inhomogeneous problem with zero boundary conditions. One of the most common methods for solving such problems is the method of separation of variables, or the Fourier method. According to this method, the solution of the homogeneous problem is sought as the product of two functions of one variable each. The resulting problem is called an eigenvalue and eigenfunction problem; it is also known as the Sturm–Liouville problem [1,9,11]. The Fourier method is an exact method for solving these problems. In the process of solving problems by this method, difficulties arise with justifying the convergence of series and with the multiplication of generalized functions [6,10,19]. In some cases, these difficulties can be avoided by reducing the system to matrix form by introducing the so-called quasi-derivative [4,7,13]. In this paper, we used this method to find solutions to four problems of oscillatory processes and demonstrated the possibility of finding the required number of eigenvalues and eigenfunctions. The method is new and allows one to avoid the multiplication of generalized functions. It has the advantage over other methods that it allows the use of computational mathematical packages.
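For reference, a standard separation-of-variables step (textbook material rather than part of this paper's quasi-derivative construction) shows how the eigenvalue problem arises. Substituting u(x, t) = X(x)T(t) into m(x)u_tt = (λ(x)u_x)_x gives

$$\frac{T''(t)}{T(t)} = \frac{\bigl(\lambda(x)X'(x)\bigr)'}{m(x)X(x)} = -\omega^2,$$

so that (λ(x)X'(x))' + ω² m(x) X(x) = 0 together with T'' + ω² T = 0; the admissible values of ω are the eigenvalues and the corresponding functions X(x) are the eigenfunctions.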
2 Problem Statement
Oscillation processes are modeled using hyperbolic-type equations. Quite often it is almost impossible to obtain closed-form solutions of such differential equations. The method proposed in our work belongs to the direct methods of solving boundary value problems, as a result of which the solutions are obtained in closed form. A feature of our work is the use of a quasi-derivative. This approach makes it possible to reduce second-order hyperbolic partial differential equations with general boundary conditions to matrix form. The process of constructing the solution is based on the multiplication of matrices. Consider the second-order partial differential equation

$$m(x)\,\frac{\partial^2 u}{\partial t^2} = \frac{\partial}{\partial x}\Bigl(\lambda(x)\frac{\partial u}{\partial x}\Bigr), \qquad (1)$$

where x ∈ (x0; xn), t ∈ (0; +∞) [1]. Denote the product λ(x)∂u/∂x by u^[1] and call it the quasi-derivative. Write down the general boundary conditions:

$$p_{11}u(x_0; t) + p_{12}u^{[1]}(x_0; t) + q_{11}u(x_n; t) + q_{12}u^{[1]}(x_n; t) = \psi_0(t),$$
$$p_{21}u(x_0; t) + p_{22}u^{[1]}(x_0; t) + q_{21}u(x_n; t) + q_{22}u^{[1]}(x_n; t) = \psi_n(t). \qquad (2)$$

The initial conditions are the following:

$$u(x; 0) = \Phi_0(x), \qquad \frac{\partial u}{\partial t}(x; 0) = \Phi_1(x), \qquad (3)$$

where ψ0(t), ψn(t) ∈ C²(0; +∞), and Φ0(x), Φ1(x) are piecewise continuous on [x0; xn].
3 Literature Review
As noted in the introduction, the problem of finding eigenvalues and eigenfunctions of second-order hyperbolic partial differential equations is an urgent problem in modeling the oscillatory processes of various technical systems. In the article [8] the case of rod oscillations under the action of a periodic force is considered; what is special is that the action of the force is distributed along the rod at a certain speed. Finding eigenvalues and eigenfunctions is a key task in solving equations with second-order partial derivatives. The work [7] is devoted to finding a class of self-adjoint regular eigenfunctions and eigenvalues for each natural n. The boundary value problem for the hyperbolic equation in a rectangular domain is considered in [12]. A feature of this work is the case of singular coefficients of the equation. A very important aspect is the proof of the existence and uniqueness of the solution (Cauchy problem) and the proof of the stability theorem for the solution. The solution is obtained in the form of a Fourier–Bessel series. In the article [16] a matrix approach to the solution of the eigenvalue and eigenfunction problem (Sturm–Liouville problem) for the hyperbolic equation is proposed. Emphasis is placed on the fact that the type of solution depends on the structure of the matrices. The matrix approach to the problem presentation is very convenient for the use of computational software. In our case, the method of matrix calculus and the presentation of the partial differential equation and the most general boundary conditions in matrix form are also used. In [3] a two-step method of discretization of a combined hyperbolic-parabolic problem with a nonlocal boundary condition was proposed, and examples of solving such problems by numerical methods are given. The problem of the stability of solutions of the second-order problem is considered in [18]. The method of introducing a quasi-derivative and reducing the heat conduction equation to matrix form is used in [13]. However, despite the achievements in this subject area, the problems of series convergence, multiplication of generalized functions and obtaining solutions of partial differential equations remain open. A solution to this problem can be achieved to some extent through the use of modern calculation technologies based on software tools [5]. The authors proposed to use the matrix form of the second-order differential equation and modern methods of calculation using mathematical software to model oscillatory processes in [14]. The objective of the research is to find solutions of the equation of rod oscillations under different loads and conditions using the concept of a quasi-derivative.
4 The General Scheme of Search of the Solution
The general scheme of finding a solution is to build two functions w(x, t) and v(x, t) such that

$$u(x, t) = w(x, t) + v(x, t).$$

We find the function w(x, t) by a constructive method; the function v(x, t) is then defined unambiguously. The function w(x, t) is a solution of the boundary value problem

$$(\lambda(x)w_x)_x = -f(x) \qquad (4)$$

with the boundary conditions (2). We reduce this problem to the matrix form

$$W_x = A(x)\cdot W + F, \qquad (5)$$
$$P\cdot W(x_0, t) + Q\cdot W(x_n, t) = \Gamma(t), \qquad (6)$$

where

$$A(x) = \begin{pmatrix} 0 & \frac{1}{\lambda(x)} \\ 0 & 0 \end{pmatrix}, \quad W = \begin{pmatrix} w \\ w^{[1]} \end{pmatrix}, \quad F = \begin{pmatrix} 0 \\ -f(x) \end{pmatrix}, \quad P = \begin{pmatrix} p_{11} & p_{12} \\ p_{21} & p_{22} \end{pmatrix}, \quad Q = \begin{pmatrix} q_{11} & q_{12} \\ q_{21} & q_{22} \end{pmatrix}, \quad \Gamma(t) = \begin{pmatrix} \psi_0(t) \\ \psi_n(t) \end{pmatrix}.$$
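Written out componentwise (this follows directly from the definitions of A(x), W and F above), the matrix equation (5) reads

$$w_x = \frac{1}{\lambda(x)}\,w^{[1]}, \qquad w^{[1]}_x = -f(x),$$

which recovers (λ(x)w_x)_x = −f(x), i.e. the scalar problem (4).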
Let x0 < x1 < x2 < … < xn and

$$f(x) = \sum_{i=0}^{n-1} g_i(x)\theta_i + \sum_{j=0}^{n-1} s_j\delta_j(x - x_j), \qquad g_i \in C[x_i, x_{i+1}), \quad s_j \in \mathbb{R},$$

$$P = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \qquad Q = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}.$$

By using the described method, we can avoid the multiplication of the generalized functions embedded in the function f(x). Let us denote

$$\sigma_n = \sum_{m=0}^{n-1} b_m(x_{m+1}, x_m), \qquad I_{k-1}(x_k) = -\int_{x_{k-1}}^{x_k} b_{k-1}(x_k, s)\,g_{k-1}(s)\,ds, \qquad I^{[1]}_{k-1}(x_k) = -\int_{x_{k-1}}^{x_k} g_{k-1}(s)\,ds.$$

With these designations,

$$w_i(x, t) = \psi_0(t) + \bigl(b_i(x, x_i) + \sigma_i\bigr)\frac{\psi_n(t) - \psi_0(t)}{\sigma_n} - \frac{1}{\sigma_n}\bigl(b_i(x, x_i) + \sigma_i\bigr)\sum_{k=0}^{n}\Bigl(I_{k-1}(x_k) + \bigl(I^{[1]}_{k-1}(x_k) - s_k\bigr)\sum_{m=k}^{n-1} b_m(x_{m+1}, x_m)\Bigr)$$
$$\qquad + \sum_{k=0}^{i}\Bigl(I_{k-1}(x_k) + \bigl(I^{[1]}_{k-1}(x_k) - s_k\bigr)\sum_{m=k}^{i-1} b_m(x_{m+1}, x_m)\Bigr) + b_i(x, x_i)\sum_{k=0}^{i}\bigl(I^{[1]}_{k-1}(x_k) - s_k\bigr) + I_i(x).$$

The vector C in (14) is (0, 1)^T. The characteristic equation of the function v(x, t) takes the form b12(ω) = 0, and the function v(x, t) takes the form (17).
6 A Case of Piecewise Constant Functions with Concentrated Masses
Let in (1) m(x) and f(x) be piecewise constant with concentrated masses and let λ(x) be piecewise constant:

$$m(x) = \sum_{i=0}^{n-1} m_i\theta_i(x) + \sum_{i=0}^{n-1} M_i\delta(x - x_i), \qquad f(x) = \sum_{i=0}^{n-1} g_i\theta_i(x) + \sum_{i=0}^{n-1} S_i\delta(x - x_i), \qquad \lambda(x) = \sum_{i=0}^{n-1} \lambda_i\theta_i(x).$$

In the boundary conditions (2),

$$P = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \qquad Q = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}.$$

In this case it is possible to write the function w_i(x, t) explicitly. With b_i(x, s) = (x − s)/λ_i, the designations of the previous section give

$$w_i(x, t) = \psi_0(t) + \Bigl(\frac{x - x_i}{\lambda_i} + \sum_{m=0}^{i-1}\frac{x_{m+1} - x_m}{\lambda_m}\Bigr)\frac{\psi_n(t) - \psi_0(t)}{\sigma_n} - \frac{1}{\sigma_n}\Bigl(\frac{x - x_i}{\lambda_i} + \sum_{m=0}^{i-1}\frac{x_{m+1} - x_m}{\lambda_m}\Bigr)\sum_{k=1}^{n}\Bigl(-\frac{g_{k-1}(x_k - x_{k-1})^2}{2\lambda_{k-1}} - \bigl(g_{k-1}(x_k - x_{k-1}) + s_k\bigr)\sum_{m=k}^{n-1}\frac{x_{m+1} - x_m}{\lambda_m}\Bigr)$$
$$\qquad + \sum_{k=1}^{i}\Bigl(-\frac{g_{k-1}(x_k - x_{k-1})^2}{2\lambda_{k-1}} - \bigl(g_{k-1}(x_k - x_{k-1}) + s_k\bigr)\sum_{m=k}^{i-1}\frac{x_{m+1} - x_m}{\lambda_m}\Bigr) - \frac{x - x_i}{\lambda_i}\sum_{k=1}^{i}\bigl(g_{k-1}(x_k - x_{k-1}) + s_k\bigr) - \frac{g_i(x - x_i)^2}{2\lambda_i},$$

where σ_n = Σ_{m=0}^{n-1} (x_{m+1} − x_m)/λ_m.

To build the function v(x, t) we use the method of expansion in eigenfunctions. Due to the delta functions in the left- and right-hand parts of the equation, we pass to the system

$$X' = \Bigl(\sum_{k=0}^{n-1} A_k\theta_k + \sum_{k=0}^{n-1} C_k\delta(x - x_k)\Bigr)\cdot X,$$

where

$$A_k = \begin{pmatrix} 0 & \frac{1}{\lambda_k} \\ -m_k\omega^2 & 0 \end{pmatrix}, \qquad C_k = \begin{pmatrix} 0 & 0 \\ -M_k\omega^2 & 0 \end{pmatrix},$$

with boundary conditions P X(x0) + Q X(xn) = 0. The Cauchy matrix has the following structure:

$$\tilde B(x, x_0, \omega) \stackrel{def}{=} \sum_{i=0}^{n-1} B_i(x, x_i, \omega)\cdot\tilde B(x_i, x_0, \omega)\cdot\theta_i,$$

where

$$\tilde B(x_i, x_0, \omega) \stackrel{def}{=} \prod_{j=0}^{i} \tilde C_{i-j}\,B_{i-j}(x_{i-j+1}, x_{i-j}, \omega), \qquad \tilde C_i = (E + C_i), \quad i = 1, \dots, n-1, \qquad \tilde B(x_i, x_i, \omega) \stackrel{def}{=} E,$$

$$B_i(x, s, \omega) = \begin{pmatrix} \cos\alpha_i(x - s) & \frac{\sin\alpha_i(x - s)}{\lambda_i\alpha_i} \\ -\lambda_i\alpha_i\sin\alpha_i(x - s) & \cos\alpha_i(x - s) \end{pmatrix}, \qquad \alpha_i = \omega\sqrt{\frac{m_i}{\lambda_i}}.$$

The vector C in (14) is (0, 1)^T. The characteristic equation of the function v(x, t) takes the form b12(ω) = 0. The function v(x, t) is the same as in (17).
7 A Case of Piecewise Constant Functions
Let in (1) m(x), f(x) and λ(x) be piecewise constant functions:

$$m(x) = \sum_{i=0}^{n-1} m_i\theta_i(x), \qquad f(x) = \sum_{i=0}^{n-1} f_i\theta_i(x), \qquad \lambda(x) = \sum_{i=0}^{n-1} \lambda_i\theta_i(x).$$

Let us consider the boundary conditions (6) with the matrices

$$P = \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix}, \qquad Q = \begin{pmatrix} 0 & 0 \\ 1 & 1 \end{pmatrix}.$$

Under these conditions, the Cauchy matrix components are b_i(x, s) = (x − s)/λ_i. Let us denote

$$\sigma_n = q_{12}\sum_{m=0}^{n-1} b_m(x_{m+1}, x_m) + q_{22}, \qquad I_{k-1}(x_k) = -\int_{x_{k-1}}^{x_k} b_{k-1}(x_k, s)\,f_{k-1}\,ds, \qquad I^{[1]}_{k-1}(x_k) = -\int_{x_{k-1}}^{x_k} f_{k-1}\,ds.$$

The function v(x, t) is the following:

$$v(x, t) = \frac{1}{p_{11}\sigma_n - q_{21}p_{12}}\bigl(\psi_0(t)\sigma_n - p_{12}\psi_n(t) - \psi_0(t)q_{21}\bigr)\Bigl(b_i(x, x_i) + \sum_{m=0}^{i-1} b_m(x_{m+1}, x_m)\Bigr)$$
$$\qquad + \frac{p_{12}}{p_{11}\sigma_n - q_{21}p_{12}}\Bigl(\psi_n(t) - q_{21}\sum_{k=0}^{n}\Bigl(I_{k-1}(x_k) + I^{[1]}_{k-1}(x_k)\sum_{m=k}^{n-1} b_m(x_{m+1}, x_m)\Bigr) - q_{22}\sum_{k=0}^{n} I^{[1]}_{k-1}(x_k)\Bigr)\Bigl(1 + b_i(x, x_i) + \sum_{m=0}^{i-1} b_m(x_{m+1}, x_m)\Bigr).$$

In this case the characteristic equation of the eigenvalue problem is b12(ω) + b22(ω) − b11(ω) − b21(ω) = 0. The vector C in (14) is (−1, 1)^T. The function v(x, t) is the same as in (17).

By specifying the number of partition segments, the material parameters, and the rod dimensions, we can obtain an analytical expression for the required number of eigenvalues and eigenfunctions. We perform all the calculations in the Maple package.
8 An Example of a Numerical Implementation of the Method for a Rod of Two Pieces
Modern software makes it possible to obtain the required number of eigenvalues and eigenfunctions, which ensures the appropriate accuracy of the solution. Consider the result of using the Maple package to obtain a solution of the problem. As an example, consider a steel rod 1 m long consisting of two cylindrical pieces of equal length whose cross-sectional areas are, respectively, F0 = 0.0025π m² and F1 = 0.000625π m², with x0 = 0, x1 = 0.5, x2 = 1. The Young's modulus for steel is E = 20394324259 kg/m², and the density is ρ = 7900 kg/m³. Consider the equation of longitudinal oscillations of the rod

$$\rho\,F(x)\,\frac{\partial^2 u}{\partial t^2} = \frac{\partial}{\partial x}\Bigl(E\,F(x)\frac{\partial u}{\partial x}\Bigr)$$

with boundary conditions

$$u(x_0, t) = 1, \qquad u(x_2, t) = 1$$

and initial conditions

$$u(x, 0) = \varphi_0(x), \qquad \frac{\partial u}{\partial t}(x, 0) = \varphi_1(x).$$
Calculations are performed in the Maple package (Figs. 1, 2, 3, 4 and 5).
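For readers without Maple, a minimal numerical sketch of the same kind of computation is given below in Python. It assumes the Dirichlet–Dirichlet case of Sect. 6 with no concentrated mass at the joint, for which the characteristic equation is b12(ω) = 0, and it uses only the data of this example; the roots it prints are illustrative and are not taken from the figures.

import math

# Sketch (not the authors' Maple worksheet): roots of the characteristic
# equation b12(omega) = 0, where b12 is the (1,2) entry of the product of
# the per-piece Cauchy matrices B_i(x, s, omega) defined in Sect. 6.
E, RHO = 20394324259.0, 7900.0                 # material constants of the example
F = [0.0025 * math.pi, 0.000625 * math.pi]     # cross-sections of the two pieces
X = [0.0, 0.5, 1.0]                            # piece boundaries x0, x1, x2

def cauchy(i, length, omega):
    lam = E * F[i]                             # lambda_i = E * F_i
    alpha = omega * math.sqrt(RHO * F[i] / lam)   # alpha_i = omega*sqrt(m_i/lambda_i)
    c, s = math.cos(alpha * length), math.sin(alpha * length)
    return [[c, s / (lam * alpha)], [-lam * alpha * s, c]]

def b12(omega):
    b0 = cauchy(0, X[1] - X[0], omega)
    b1 = cauchy(1, X[2] - X[1], omega)
    return b1[0][0] * b0[0][1] + b1[0][1] * b0[1][1]   # (1,2) entry of B1 * B0

def roots(f, lo, hi, steps=20000):
    """Bracket sign changes of f on [lo, hi] and refine them by bisection."""
    found, prev_x, prev_y = [], lo, f(lo)
    for k in range(1, steps + 1):
        x = lo + (hi - lo) * k / steps
        y = f(x)
        if prev_y * y < 0:
            a, b = prev_x, x
            for _ in range(60):
                m = 0.5 * (a + b)
                a, b = (m, b) if f(a) * f(m) > 0 else (a, m)
            found.append(0.5 * (a + b))
        prev_x, prev_y = x, y
    return found

print(roots(b12, 1.0, 60000.0)[:11])           # first eleven positive roots omega_k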
Fig. 1. Finding the first eleven eigenvalues
Fig. 2. Finding the first eleven eigenfunctions Xk,0
Fig. 3. Finding the first eleven eigenfunctions Xk,1
Fig. 4. Finding the first eleven eigenfunctions Xk,0
Fig. 5. Finding the first eleven eigenfunctions Xk,1
9 Conclusions
In this work, on the basis of a newly created method for solving non-stationary problems for equations of hyperbolic type, a solution of a topical scientific and technical problem is given by methods of mathematical modeling of wave processes. The method described in the work for solving second-order hyperbolic partial differential equations makes it possible to model oscillatory processes in horizontal rods consisting of an arbitrary number of pieces with different cross sections. The proposed direct method can be used in the study of oscillatory processes without approximate methods or operational calculus. Solutions of the hyperbolic equation with coefficients and right-hand sides piecewise continuous in the spatial variable, under the most general local boundary conditions, are obtained. The particular case of piecewise-constant coefficients and right-hand sides, in which the solutions of the initial problem can be obtained in closed form, is singled out. Using the reduction method, the solutions
of the general first boundary value problem for the equation of hyperbolic type with piecewise continuous coefficients and stationary inhomogeneity are obtained. The general first boundary value problem for the equation of hyperbolic type with piecewise-constant coefficients and δ-singularities is investigated. By specifying the number of partition segments, the material parameters, and the rod dimensions, we can obtain an analytical expression for the required number of eigenvalues and eigenfunctions. We perform all the calculations in the Maple package. The possibilities for applying the proposed method are much wider than this work and, in particular, can be used in further research.
References

1. Al-Khaled, K., Hazaimeh, H.: Comparison methods for solving non-linear Sturm-Liouville eigenvalues problem. Symmetry 1179(12), 1–17 (2020). https://doi.org/10.3390/sym12071179
2. Arsenin, V.Y.: Methods of Mathematical Physics and Special Functions, p. 432. Nauka, Moscow (1984)
3. Ashyralyev, A., Aggez, N.: Nonlocal boundary value hyperbolic problems involving integral conditions. Bound. Value Probl. (1), 1–10 (2014). https://doi.org/10.1186/s13661-014-0205-4
4. Atkinson, F.: Discrete and Continuous Boundary Value Problems, p. 518. Academic Press, Cambridge (1964)
5. Borwein, J.M., Skerritt, M.P.: An Introduction to Modern Mathematical Computing: With Maple, p. 233. Springer, New York (2011). https://doi.org/10.1007/978-1-4614-0122-3
6. Hornikx, M.: The extended Fourier pseudospectral time-domain method for atmospheric sound propagation. J. Acoust. Soc. Am. 1632(4), 1–20 (2010). https://doi.org/10.1121/1.3474234
7. Kong, Q., Wu, H., Zettl, A.: Sturm-Liouville problems with finite spectrum. J. Math. Anal. Appl. 263, 748–762 (2001). https://doi.org/10.1006/jmaa.2001.7661
8. Lysenko, A., Yurkov, N., Trusov, V., Zhashkova, T., Lavendels, J.: Sum-of-squares based cluster validity index and significance analysis. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 5495, pp. 313–322 (2009). https://doi.org/10.1007/978-3-642-04921-7_32
9. Martin, N., Nilsson, P.: The moving-eigenvalue method: hitting time for Ito processes and moving boundaries. J. Phys. A Math. Theor. 53(40), 1–32 (2020). https://doi.org/10.1088/1751-8121/ab9c59
10. Mennicken, R., Möller, M.: Non-Self-Adjoint Boundary Eigenvalue Problems, p. 518. North Holland (2003)
11. Mukhtarov, O., Yücel, M.: A study of the eigenfunctions of the singular Sturm-Liouville problem using the analytical method and the decomposition technique. Mathematics 415(8), 1–14 (2020). https://doi.org/10.3390/math8030415
12. Sabitov, K.B., Zaitseva, N.V.: Initial-boundary value problem for hyperbolic equation with singular coefficient and integral condition of second kind. Lobachevskii J. Math. 39(9), 1419–1427 (2018). https://doi.org/10.1134/S1995080218090299
13. Tatsii, R.M., Pazen, O.Y.: Direct (classical) method of calculation of the temperature field in a hollow multilayer cylinder. J. Eng. Phys. Thermophys. 91(6), 1373–1384 (2018). https://doi.org/10.1007/s10891-018-1871-3
14. Tatsij, R.M., Chmyr, O.Y., Karabyn, O.O.: The total first boundary value problem for equation of hyperbolic type with piecewise constant coefficients and delta-singularities. Res. Math. Mech. 24, 86–102 (2019)
15. Tichonov, A., Samarskii, A.: Equations of Mathematical Physics, Chap. 2: Equations of the Hyperbolic Type, p. 777. Pergamon Press, Oxford (1990)
16. Tisseur, F., Meerbergen, K.: The quadratic eigenvalue problem. SIAM Rev. 42(2), 235–286 (2001). https://doi.org/10.1137/S0036144500381988
17. Wyld, H.W., Powell, G.: Mathematical Methods for Physics, Chap. 1: Homogeneous Boundary Value Problems and Special Functions, p. 476. CRC Press, Boca Raton (2020)
18. Yang, F., Zhang, Y., Liu, X., Li, X.: The quasi-boundary value method for identifying the initial value of the space-time fractional diffusion equation. Acta Mathematica Scientia 40(3), 641–658 (2020). https://doi.org/10.1007/s10473-020-0304-5
19. Yarka, U., Fedushko, S., Veselý, P.: The Dirichlet problem for the perturbed elliptic equation. Mathematics 8, 1–13 (2020). https://doi.org/10.3390/math8122108
Author Index
A Adler, Oksana, 762 Anatolii, Nikolenko, 744 Andreiev, Artem, 548 Andrusiak, Iryna, 188 Azarova, Anzhelika, 534 B Babichev, Sergii, 69, 449 Bahrii, Ruslan, 491 Baklan, Ighor, 776 Baklan, Igor, 634 Balanda, Anatolii, 413 Bardachov, Yuriy, 297 Barmak, Olexander, 491 Ben, Andrii, 266 Bidyuk, Peter, 3, 107, 164 Bilonoh, Bohdan, 624 Bodyanskiy, Yevgeniy, 613, 624 Boiarskyi, Oleksii, 377 Boskin, Oleg, 39 Boyko, Nataliya, 188 Brodyak, Oksana, 423 Burennikova, Nataliia, 282 Burov, Yevhen, 423 C Chmyr, Oksana, 789 D Datsok, Oleh, 148 Demchenko, Violetta, 39 Dereko, Vladyslav, 602 Durnyak, Bohdan, 69
F Fefelov, Andrey, 314 Fefelova, Iryna, 314 G Gall, Luke, 128 Gnatyuk, Sergiy, 39 Golovko, Nataliya, 367 Gonchar, Olga, 519 Goncharenko, Tatiana, 367 Gorokhovatskyi, Oleksii, 438 Gozhyj, Aleksandr, 3, 164 H Havrysh, Kostiantyn, 282 Hryhorenko, Oleksandr, 413 Hrytsiuk, Ivanna, 589 Hrytsyk, Volodymyr, 573 I Ivashchenko, Taras, 297 Ivashina, Yuri, 367 Izonin, Ivan, 685 K Kalinina, Irina, 3, 164 Karabyn, Oksana, 789 Kavetskiy, Vyacheslav, 282 Kharchenko, Vyacheslav, 128 Kiriak, Anastasiia, 464 Kirichenko, Lyudmyla, 397 Kiryushatova, Tetiana, 230 Klymova, Iryna, 613 Kmetyk-Podubinska, Khrystyna, 188
804 Kobets, Vitaliy, 252, 449 Kolchygin, Bohdan, 624 Kolomiiets, Anna, 645 Kondratieva, Inna, 349 Korniychuk, Vitaliy, 729 Korobchynskyi, Maksym, 54 Kovalchuk, Olena, 714 Kovtoniuk, Denys, 602 Kozel, Viktor, 297 Kozlova, Anna, 548 Krak, Iurii, 491, 507, 714 Krasnikov, Kyrylo, 25 Kravets, Petro, 423 Kravtsova, Lyudmyla, 94 Krysiak, Pavlo, 54 Kudriashova, Alona, 217 Kutsenko, Mykyta, 563 Kuzmin, Vladyslav, 762 Kuzmina, Elena, 762 Kuznietsova, Maryna, 107 Kuznietsova, Nataliia, 107 Kynash, Yurii, 201 L Lalymenko, Olha, 464 Liakh, Ihor, 69 Lobachev, Ivan, 128 Lurie, Irina, 39 Lysa, Natalya, 177 Lysenko, Svitlana, 82 Lytvyn, Vasyl, 423 Lytvynenko, Luidmyla, 714 Lytvynenko, Volodymyr, 39, 314, 729 M Malets, Igor, 789 Mamenko, Pavlo, 252 Manziuk, Eduard, 491 Marasanov, Volodymyr, 230 Martsyshyn, Roman, 177, 201 Maryliv, Oleksandr, 54 Mashkov, Oleg, 297 Mashtalir, Sergii, 624 Mazurets, Olexander, 491 Mezentseva, Olga, 645 Mirnenko, Volodymyr, 413 Mishchanchuk, Maksym, 589 Mishkov, Oleksandr, 413, 602 Miyushkovych, Yuliya, 177, 201 Moldovan, Volodymyr, 82 Morgun, Igor, 602 Morozov, Viktor, 645 Mosin, Mikhailo, 519
Author Index N Nadraga, Vasiliy, 413 Nahrybelnyi, Yaroslav, 266 Naumov, Oleksandr, 729 Naumova, Olga, 729 Nazarkevych, Mariia, 573 Nosov, Pavlo, 252, 266 Novytskyy, Oleksandr, 464 O Ocheretianyi, Oleksandr, 776 Ohnieva, Oksana, 314 Oksana, Babilunha, 744 Olena, Arsirii, 744 Olszewski, Serge, 39 Omelchuk, Serhii, 449 P Pashko, Anatolii, 507 Pavlov, Sergii, 282 Pavlyshenko, Bohdan M., 479 Peredrii, Olena, 438 Perova, Iryna, 148, 464 Petrosiuk, Denys, 744 Petrushenko, Natalia, 519 Pikh, Iryna, 217 Pinaieva, Olga, 282 Polishchuk, Anatolii, 519 Polyvoda, Oksana, 349 Polyvoda, Vladyslav, 349, 613 Popereshnyak, Svitlana, 377 Popovych, Ihor, 252, 266 Povod, Yakiv, 331 Proskurin, Maksym, 645 R Radivilova, Tamara, 397 Riznyk, Oleg, 201 Rozov, Yuriy, 349 Rudakova, Antonina, 349 Rudakova, Hanna, 230, 349 Rudenko, Myhailo, 54 Ryshkovets, Yuriy, 423 Ryzhanov, Vitalii, 397 S Safonyk, Andrii, 589 Sarnatskyi, Vladyslav, 634 Savina, Nataliia, 729, 762 Semibalamut, Kostiantyn, 82 Senkivska, Nataliia, 217 Senkivskyy, Vsevolod, 217 Shafronenko, Alina, 613 Sharko, Artem, 230 Sharko, Marharyta, 519
Author Index Sharko, Oleksandr, 230 Sherstjuk, Volodymyr, 331 Sikora, Lubomyr, 177 Slonov, Mykhailo, 54 Smailova, Saule, 314 Smotr, Olga, 789 Spivakovsky, Aleksander, 449 Stankevich, Sergey, 548 Stelia, Oleg, 507 Stepanchikov, Dmitry, 230 Svideniuk, Mykhailo, 548 T Tatsij, Roman, 789 Tereshchenkova, Oksana, 94 Terletskyi, Dmytro, 665 Tkach, Mykola, 519 Tkachenko, Pavlo, 685 Tkachenko, Roman, 685, 696 Topolnytskyi, Maksym, 82 Tovstokoryi, Oleh, 266 Trofymchuk, Oleksandr, 3 Tsmots, Ivan, 201 Tupychak, Lyubov, 217 U Uzun, Illia, 128
805 V Vasylenko, Nataliia, 519 Velychko, Olga, 148 Voronenko, Mariia, 714, 729 Vyshemyrska, Svitlana, 423, 714, 729 Vysotska, Victoria, 423 W Wójcik, Waldemar, 297, 507 Y Yaremko, Dmitriy, 762 Yaremko, Svetlana, 762 Yarmolenko, Victor, 282 Yasinska-Damri, Lyudmyla, 69 Yershov, Sergey, 665 Yurchuk, Iryna, 563 Yurzhenko, Alona, 94 Z Zaets, Eva, 39 Zavgorodnii, Igor, 282, 464 Zaytseva, Tatyana, 94 Zharikova, Maryna, 331 Zhernova, Polina, 464 Zhuk, Sergiy, 82 Zinchenko, Serhii, 252, 266 Zorin, Kostiantyn, 602