Smart Innovation, Systems and Technologies 193
Ireneusz Czarnowski Robert J. Howlett Lakhmi C. Jain Editors
Intelligent Decision Technologies Proceedings of the 12th KES International Conference on Intelligent Decision Technologies (KES-IDT 2020)
Smart Innovation, Systems and Technologies Volume 193
Series Editors Robert J. Howlett, Bournemouth University and KES International, Shoreham-by-sea, UK Lakhmi C. Jain, Faculty of Engineering and Information Technology, Centre for Artificial Intelligence, University of Technology Sydney, Sydney, NSW, Australia
The Smart Innovation, Systems and Technologies book series encompasses the topics of knowledge, intelligence, innovation and sustainability. The aim of the series is to make available a platform for the publication of books on all aspects of single and multi-disciplinary research on these themes in order to make the latest results available in a readily-accessible form. Volumes on interdisciplinary research combining two or more of these areas is particularly sought. The series covers systems and paradigms that employ knowledge and intelligence in a broad sense. Its scope is systems having embedded knowledge and intelligence, which may be applied to the solution of world problems in industry, the environment and the community. It also focusses on the knowledge-transfer methodologies and innovation strategies employed to make this happen effectively. The combination of intelligent systems tools and a broad range of applications introduces a need for a synergy of disciplines from science, technology, business and the humanities. The series will include conference proceedings, edited collections, monographs, handbooks, reference books, and other relevant types of book in areas of science and technology where smart systems and technologies can offer innovative solutions. High quality content is an essential feature for all book proposals accepted for the series. It is expected that editors of all accepted volumes will ensure that contributions are subjected to an appropriate level of reviewing process and adhere to KES quality principles. ** Indexing: The books of this series are submitted to ISI Proceedings, EI-Compendex, SCOPUS, Google Scholar and Springerlink **
More information about this series at http://www.springer.com/series/8767
Ireneusz Czarnowski · Robert J. Howlett · Lakhmi C. Jain
Editors
Intelligent Decision Technologies Proceedings of the 12th KES International Conference on Intelligent Decision Technologies (KES-IDT 2020)
Editors Ireneusz Czarnowski Gdynia Maritime University Gdynia, Poland
Robert J. Howlett KES International Research UK
Lakhmi C. Jain Faculty of Engineering and Information Technology Centre for Artificial Intelligence University of Technology Sydney Sydney, NSW, Australia Faculty of Science Liverpool Hope University Liverpool, UK KES International Shoreham-by-sea, UK
ISSN 2190-3018 ISSN 2190-3026 (electronic) Smart Innovation, Systems and Technologies ISBN 978-981-15-5924-2 ISBN 978-981-15-5925-9 (eBook) https://doi.org/10.1007/978-981-15-5925-9 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2020 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore
KES-IDT 2020 Conference Organization
Honorary Chairs Lakhmi C. Jain, Liverpool Hope University, UK and University of Technology Sydney, Australia Gloria Wren-Phillips, Loyola University, USA
General Chair Ireneusz Czarnowski, Gdynia Maritime University, Poland
Executive Chair Robert J. Howlett, KES International & Bournemouth University, UK
Program Chair Jose L. Salmeron, University Pablo de Olavide, Seville, Spain Antonio J. Tallón-Ballesteros, University of Seville, Spain
Publicity Chair Izabela Wierzbowska, Gdynia Maritime University, Poland Alfonso Mateos Caballero, Universidad Politécnica de Madrid, Spain
Special Sessions

Multi-Criteria Decision-Analysis—Theory and Their Applications
Wojciech Sałabun, West Pomeranian University of Technology in Szczecin, Poland

Intelligent Data Processing and its Applications
Margarita N. Favorskaya, Reshetnev Siberian State University of Science and Technology, Russian Federation
Lakhmi C. Jain, University of Technology Sydney, Australia
Mikhail Sergeev, Saint Petersburg State University of Aerospace Instrumentation, Russian Federation

High-Dimensional Data Analysis and Its Applications
Mika Sato-Ilic, University of Tsukuba, Japan

Decision Making Theory for Economics
Takao Ohya, Kokushikan University, Japan
Takafumi Mizuno, Meijo University, Japan

Large-Scale Systems for Intelligent Decision Making and Knowledge Engineering
Sergey Zykov, National Research University, Russia

Decision Technologies and Related Topics in Big Data Analysis of Social and Financial Issues
Mieko Tanaka-Yamawaki, Meiji University, Japan
International Program Committee Jair Minoro Abe, Paulista University and University of Sao Paulo, Brazil Witold Abramowicz, Poznan University of Economics, Poland Piotr Artiemjew, University of Warmia and Mazury, Poland Ahmad Taher Azar, Prince Sultan University, Kingdom of Saudi Arabia Valentina Emilia Balas, Aurel Vlaicu University of Arad, Romania Dariusz Barbucha, Gdynia Maritime University, Poland Andreas Behrend, University of Bonn, Germany Mokhtar Beldjehem, University of Ottawa, Canada Monica Bianchini, University of Siena, Italy Gloria Bordogna, CNR IREA, Italy
Leszek Borzemski, Wrocław University of Technology, Poland Janos Botzheim, Budapest University of Technology and Economics, Hungary Adriana Burlea-Schiopoiu, University of Craiova, Romania Alfonso Mateos Caballero, Universidad Politécnica de Madrid, Spain Frantisek Capkovic, Slovak Academy of Sciences, Slovak Republic Shyi-Ming Chen, National Taiwan University of Science and Technology, Taiwan Chen-Fu Chien, National Tsing Hua University, Taiwan Amine Chohra, Paris-East University (UPEC), France Marco Cococcioni, University of Pisa, Italy Angela Consoli, DST Group, Australia Paulo Cortez, University of Minho, Portugal Paolo Crippa, Università Politecnica delle Marche, Italy Dinu Dragan, University of Novi Sad, Serbia Margarita N. Favorskaya, Reshetnev Siberian State University of Science and Technology, Russia Raquel Florez-Lopez, University Pablo Olavide of Seville, Spain Wojciech Froelich, University of Silesia, Poland Rocco Furferi, University of Florence, Italy Mauro Gaggero, National Research Council, Italy Mauro Gaspari, University of Bologna, Italy Christos Grecos, National College of Ireland, Ireland Foteini Grivokostopoulou, University of Patras, Greece Jerzy W. Grzymala-Busse, University of Kansas, USA Katarzyna Harezlak, Silesian University of Technology, Poland Ioannis Hatzilygeroudis, University of Patras, Greece Dawn E. Holmes, University of California, USA Katsuhiro Honda, Osaka Prefecture University, Japan Tzung-Pei Hong, National University of Kaohsiung, Taiwan Dosam Hwang, Yeungnam University, South Korea Anca Ignat, Alexandru Ioan Cuza University, Romania Yuji Iwahori, Chubu University, Japan Piotr Jedrzejowicz, Gdynia Maritime University, Poland Dragan Jevtic, University of Zagreb, Croatia Ivan Jordanov, University of Portsmouth, UK Frank Klawonn, Ostfalia University, Germany Nikos Karacapilidis, University of Patras, Greece Pawel Kasprowski, Silesian University of Technology, Poland Roumen Kountchev, Technical University Sofia, Bulgaria Kazuhiro Kuwabara, Ritsumeikan University, Japan Aleksandar Kovačević, University of Novi Sad, Serbia Boris Kovalerchuk, Central Washington University, USA Marek Kretowski, Bialystok University of Technology, Poland
Vladimir Kurbalija, University of Novi Sad, Serbia Noriyuki Kushiro, Kyushu Institute of Technology, Japan Birger Lantow, University of Rostock, Germany Pei-Chun Lin, Feng Chia University, Taiwan Ivan Luković, University of Novi Sad, Serbia Mohamed Arezki Mellal, M’Hamed Bougara University, Algeria Mikhail Moshkov, KAUST, Saudi Arabia Tetsuya Murai, Chitose Institute of Science and Technology, Japan Marek Ogiela, AGH University of Science and Technology, Poland Yukio Ohsawa, The University of Tokyo, Japan Takao Ohya, Kokushikan University, Japan Mrutyunjaya Panda, Utkal University, India Georg Peters, Munich University of Applied Sciences, Germany Isidoros Perikos, University of Patras, Greece Petra Perner, Institut of Computer Vision and Applied Computer Sciences, Germany Gloria Phillips-Wren, Loyola University Maryland, USA Anitha S. Pillai, Hindustan Institute of Technology and Science, India Camelia Pintea, Technical University Cluj-Napoca, Romania Dilip Kumar Pratihar, Indian Institute of Technology Kharagpur, India Bhanu Prasad, Florida A&M University, USA Jim Prentzas, Democritus University of Thrace, Greece Radu-Emil Precup, Politehnica University of Timisoara, Romania Małgorzata Przybyła-Kasperek, University of Silesia, Poland Marcos Quiles, UNIFESP, Brazil Milos Radovanovic, University of Novi Sad, Serbia Azizul Azhar Ramli, Universiti Tun Hussein Onn, Malaysia Ewa Ratajczak-Ropel, Gdynia Maritime University, Poland Paolo Remagnino, University of Kingston, UK Ana Respício, University of Lisbon, Portugal Marina Resta, University of Genoa, Italy Alvaro Rocha, University of Coimbra, Portugal John Ronczka, Independent Research Scientist (SCOTTYNCC), Australia Anatoliy Sachenko, Ternopil National Economic University, Ukraine Wojciech Sałabun, West Pomeranian University of Technology in Szczecin, Poland Mika Sato-Ilic, University of Tsukuba, Japan Milos Savic, University of Novi Sad, Serbia Rainer Schmidt, Munich University of Applied Sciences, Germany Ralf Seepold, HTWG Konstanz, Germany Hirosato Seki, Osaka Institute of Technology, Japan
Bharat Singh, Big Data Labs, Hamburg, Germany Margarita Stankova, New Bulgarian University, Bulgaria Aleksander Skakovski, Gdynia Maritime University, Poland Urszula Stanczyk, Silesian University of Technology, Poland Catalin Stoean, University of Craiova, Romania Ruxandra Stoean, University of Craiova, Romania Shing Chiang Tan, Multimedia University, Malaysia Claudio Turchetti, Università Politecnica delle Marche, Italy Mieko Tanaka-Yamawaki, Meiji University, Japan Dilhan J. Thilakarathne, VU University Amsterdam/ING Bank, Netherlands Edmondo Trentin, University of Siena, Italy Jeffrey Tweedale, DST Group, Australia Eiji Uchino, Yamaguchi University, Japan Marco Vannucci, Scuola Superiore Sant’Anna, Italy Zeev Volkovich, ORT Braude College, Israel Fen Wang, Central Washington University, USA Gloria Wren, Loyola University Maryland, USA Beata Konikowska, University of Silesia in Katowice, Poland Yoshiyuki Yabuuchi, Shimonoseki City University, Japan Hiroyuki Yoshida, Harvard Medical School, USA Dmitry Zaitsev, Odessa State Environmental University, Ukraine Lindu Zhao, Southeast University, China Min Zhou, Hunan University of Commerce, China Beata Zielosko, University of Silesia in Katowice, Poland Alfred Zimmermann, Reutlingen University, Germany Sergey Zykov, National Research University, Russia
Preface
This volume contains the proceedings of the 12th International KES Conference on Intelligent Decision Technologies (KES-IDT 2020), held as a virtual conference on June 17–19, 2020. KES-IDT is an annual international conference organized by KES International and a sub-series of the KES conference series. It is an interdisciplinary conference that provides opportunities for the presentation and discussion of new research results, leading to knowledge transfer and the generation of new ideas.

This edition, KES-IDT 2020, attracted researchers and practitioners from all over the world. The KES-IDT 2020 Programme Committee received papers for the main track and six special sessions. Each paper was reviewed by 2–3 members of the International Program Committee and International Reviewer Board. Following the review process, only the highest-quality submissions were accepted for inclusion in the conference, and the 47 best papers were selected for oral presentation and publication in the KES-IDT 2020 proceedings.

We are very satisfied with the quality of the program and would like to thank the authors for choosing KES-IDT as the forum for the presentation of their work. We also gratefully acknowledge the hard work of the KES-IDT International Program Committee members and of the additional reviewers, who took the time to review the submitted papers and to select the best among them for presentation at the conference and inclusion in its proceedings. We hope that KES-IDT 2020 contributes to academic excellence and leads to even greater successes of KES-IDT events in the future.

Gdynia, Poland
Shoreham-by-sea, UK
Sydney, Australia / Liverpool, UK / Shoreham-by-sea, UK
June 2020
Ireneusz Czarnowski Robert J. Howlett Lakhmi C. Jain
Contents
Main Track

Solving Job Shop Scheduling with Parallel Population-Based Optimization and Apache Spark (Piotr Jedrzejowicz and Izabela Wierzbowska)
Predicting Profitability of Peer-to-Peer Loans with Recovery Models for Censored Data (Markus Viljanen, Ajay Byanjankar, and Tapio Pahikkala)
Weighted Network Analysis for Computer-Aided Drug Discovery (Mariko I. Ito and Takaaki Ohnishi)
Manufacturing as a Service in Industry 4.0: A Multi-Objective Optimization Approach (Gabriel H. A. Medeiros, Qiushi Cao, Cecilia Zanni-Merk, and Ahmed Samet)
Conservative Determinization of Translated Automata by Embedded Subset Construction (Michele Dusi and Gianfranco Lamperti)
Explanatory Monitoring of Discrete-Event Systems (Nicola Bertoglio, Gianfranco Lamperti, Marina Zanella, and Xiangfu Zhao)
Artificial Intelligence Technique in Crop Disease Forecasting: A Case Study on Potato Late Blight Prediction (Gianni Fenu and Francesca Maridina Malloci)
Optimal Quality-Based Recycling and Reselling Prices of Returned SEPs in IoT Environment (Siyu Zong, Sijie Li, and You Shang)
On a Novel Representation of Multiple Textual Documents in a Single Graph (Nikolaos Giarelis, Nikos Kanakaris, and Nikos Karacapilidis)
Multi-agent Approach to the DVRP with GLS Improvement Procedure (Dariusz Barbucha)

Intelligent Data Processing and Its Applications

Detecting Relevant Regions for Watermark Embedding in Video Sequences Based on Deep Learning (Margarita N. Favorskaya and Vladimir V. Buryachenko)
Artificial Neural Network in Predicting Cancer Based on Infrared Spectroscopy (Yaniv Cohen, Arkadi Zilberman, Ben Zion Dekel, and Evgenii Krouk)
Evaluation of Shoulder Joint Data Obtained from CON-TREX Medical System (Aleksandr Zotin, Konstantin Simonov, Evgeny Kabaev, Mikhail Kurako, and Alexander Matsulev)
Framework for Intelligent Wildlife Monitoring (Valery Nicheporchuk, Igor Gryazin, and Margarita N. Favorskaya)
Computation the Bridges Earthquake Resistance by the Grid-Characteristic Method (Alena Favorskaya)
Study the Elastic Waves Propagation in Multistory Buildings, Taking into Account Dynamic Destruction (Alena Favorskaya and Vasily Golubev)
Icebergs Explosions for Prevention of Offshore Collision: Computer Simulation and Analysis (Alena Favorskaya and Nikolay Khokhlov)
Genetic Operators Impact on Genetic Algorithms Based Variable Selection (Marco Vannucci, Valentina Colla, and Silvia Cateni)
Symmetry Indices as a Key to Finding Matrices of Cyclic Structure for Noise-Immune Coding (Alexander Sergeev, Mikhail Sergeev, Nikolaj Balonin, and Anton Vostrikov)
Search and Modification of Code Sequences Based on Circulant Quasi-orthogonal Matrices (Alexander Sergeev, Mikhail Sergeev, Vadim Nenashev, and Anton Vostrikov)
Processing of CT Lung Images as a Part of Radiomics (Aleksandr Zotin, Yousif Hamad, Konstantin Simonov, Mikhail Kurako, and Anzhelika Kents)

High-Dimensional Data Analysis and Its Applications

Symbolic Music Text Fingerprinting: Automatic Identification of Musical Scores (Michele Della Ventura)
Optimization of Generalized Cp Criterion for Selecting Ridge Parameters in Generalized Ridge Regression (Mineaki Ohishi, Hirokazu Yanagihara, and Hirofumi Wakaki)
A Fast Optimization Method for Additive Model via Partial Generalized Ridge Regression (Keisuke Fukui, Mineaki Ohishi, Mariko Yamamura, and Hirokazu Yanagihara)
Improvement of the Training Dataset for Supervised Multiclass Classification (Yukako Toko and Mika Sato-Ilic)
A Constrained Cluster Analysis with Homogeneity of External Criterion (Masao Takahashi, Tomoo Asakawa, and Mika Sato-Ilic)
Trust-Region Strategy with Cauchy Point for Nonnegative Tensor Factorization with Beta-Divergence (Rafał Zdunek and Krzysztof Fonał)

Multi-Criteria Decision Analysis—Theory and Their Applications

Digital Twin Technology for Pipeline Inspection (Radda A. Iureva, Artem S. Kremlev, Vladislav Subbotin, Daria V. Kolesnikova, and Yuri S. Andreev)
Application of Hill Climbing Algorithm in Determining the Characteristic Objects Preferences Based on the Reference Set of Alternatives (Jakub Więckowski, Bartłomiej Kizielewicz, and Joanna Kołodziejczyk)
The Search of the Optimal Preference Values of the Characteristic Objects by Using Particle Swarm Optimization in the Uncertain Environment (Jakub Więckowski, Bartłomiej Kizielewicz, and Joanna Kołodziejczyk)
Finding an Approximate Global Optimum of Characteristic Objects Preferences by Using Simulated Annealing (Jakub Więckowski, Bartłomiej Kizielewicz, and Joanna Kołodziejczyk)

Large-Scale Systems for Intelligent Decision-Making and Knowledge Engineering

Digital Systems for eCTD Creation on the Pharmaceutical Market of the Eurasian Economic Union (Konstantin Koshechkin, Georgy Lebedev, and Sergey Zykov)
Scientific Approaches to the Digitalization of Drugs Assortment Monitoring Using Artificial Neural Networks (Tsyndymeyev Arsalan, Konstantin Koshechkin, and Georgy Lebedev)
The Geographic Information System of the Russian Ministry of Health (Georgy Lebedev, Alexander Polikarpov, Nikita Golubev, Elena Tyurina, Alexsey Serikov, Dmitriy Selivanov, and Yuriy Orlov)
Creation of a Medical Decision Support System Using Evidence-Based Medicine (Georgy Lebedev, Eduard Fartushniy, Igor Shaderkin, Herman Klimenko, Pavel Kozhin, Konstantin Koshechkin, Ilya Ryabkov, Vadim Tarasov, Evgeniy Morozov, Irina Fomina, and Gennadiy Sukhikh)
Improve Statistical Reporting Forms in Research Institutions of the Health Ministry of Russia (Georgy Lebedev, Oleg Krylov, Andrey Yuriy, Yuriy Mironov, Valeriy Tkachenko, Eduard Fartushniy, and Sergey Zykov)
Chat-Based Approach Applied to Automatic Livestream Highlight Generation (Pavel Drankou and Sergey Zykov)

Decision Technologies and Related Topics in Big Data Analysis of Social and Financial Issues

What Are the Differences Between Good and Poor User Experience? (Jun Iio)
Two-Component Opinion Dynamics Theory of Official Stance and Real Opinion Including Self-Interaction (Nozomi Okano, Yuki Ohira, and Akira Ishii)
Theory of Opinion Distribution in Human Relations Where Trust and Distrust Mixed (Akira Ishii and Yasuko Kawahata)
Is the Statistical Property of the Arrowhead Price Fluctuation Time Dependent? (Mieko Tanaka-Yamawaki and Masanori Yamanaka)

Decision-Making Theory for Economics

History of Mechanization and Organizational Change in the Life Insurance Industry in Japan (Examples from Dai-ichi Life, Nippon Life, Imperial Life, Meiji Life) (Shunei Norikumo)
A Diagram for Finding Nash Equilibria in Two-Players Strategic Form Games (Takafumi Mizuno)
An AHP Approach for Transform Selection in DHCT-Based Image Compression (Masaki Morita, Yuto Kimura, and Katsu Yamatani)
SPCM with Improved Two-Stage Method for MDAHP Including Hierarchical Criteria (Takao Ohya)
Tournament Method Using a Tree Structure to Resolve Budget Conflicts (Natsumi Oyamaguchi, Hiroyuki Tajima, and Isamu Okada)
Upgrading the Megacity Piloting a Co-design and Decision Support Environment for Urban Development in India (Jörg Rainer Noennig, David Hick, Konstantin Doll, Torsten Holmer, Sebastian Wiesenhütter, Chiranjay Shah, Palak Mahanot, and Chhavi Arya)

Author Index
Main Track
Solving Job Shop Scheduling with Parallel Population-Based Optimization and Apache Spark Piotr Jedrzejowicz and Izabela Wierzbowska
Abstract The paper proposes an architecture for the population-based optimization in which Apache Spark is used as a platform enabling parallelization of the process of search for the best solution. The suggested architecture, based on the A-Team concept, is used to solve the Job Shop Scheduling Problem (JSP) instances. Computational experiment is carried out to compare the results from solving a benchmark set of the problem instances obtained using the proposed approach with other, recently reported, results.
1 Introduction

One of the possible approaches to solving difficult optimization problems is applying population-based metaheuristics. In these methods, the population consists of feasible solutions, parts of solutions, or objects from which a solution can easily be constructed. Such a population, in the course of the search for the best solution, can be transformed, filtered, expanded by adding new solutions, and reduced by removing some less promising ones. When solving real-life hard optimization problems, the search space tends to be huge, which requires that a large number of candidate solutions be evaluated and considered. Population-based metaheuristics use two basic classes of operations when searching for the best solution. These are intensification and diversification, also referred to as exploitation and exploration. Diversification can be described as
generating diverse solutions to explore the search space, and intensification as focusing the search in a local region by exploiting the information that a current good solution has been found in this region. Because of the usually huge size of the solution space, there is a need for selection. Iterative selection of the best-fitted individuals is a strategy expected to assure convergence to optimality. Population-based metaheuristics can be broadly categorized into two main classes: those based on the evolutionary paradigm and those based on the swarm intelligence paradigm. Reviews covering population-based metaheuristics can be found, among others, in [4, 11].

In this paper, we propose applying the A-Team paradigm to construct a multiple-agent system solving instances of the job shop scheduling problem. The A-Team concept proposed in [23] assumes that a number of optimization agents cooperate by sharing access to a common memory containing a population of solutions. Agents work asynchronously and in parallel, trying to improve solutions from the common memory. Each agent works in cycles which consist of three basic steps: acquiring a solution from the common memory, attempting its improvement, and afterward sending it back to the common memory. A special agent called the solution manager implements the user strategy with respect to selection, replacement, and evaluation of solutions in the common memory. To implement the A-Team structure and to carry out the population-based optimization, we use Apache Spark, which is a well-known engine for processing massive data in parallel.

The rest of the paper is organized as follows. Section 2 contains a brief review of the related work, and in Sect. 3 the Job Shop Scheduling Problem (JSP) is defined. Section 4 gives details of the proposed implementation, in which the specialized A-Team solving instances of the job shop scheduling problem is constructed using the Apache Spark environment. Section 5 presents assumptions and results of the computational experiment carried out to validate the approach. Section 6 concludes the paper.
2 Population-Based Approaches to Solving Difficult Optimization Problems

Population-based algorithms are usually classified as metaheuristics. Metaheuristics are algorithms producing approximate solutions to a wide range of complex optimization problems. Metaheuristics are not problem-specific, although their parameters have to be fitted to particular problems. Pioneering work in the area of population-based methods included genetic programming [15], genetic algorithms [8], evolutionary computations [6, 16], ant colony optimization [5], particle swarm optimization [14], and bee colony algorithms [19]. Since the beginning of the current century, research interest in population-based optimization has been growing dynamically. An excellent overview of population-
based metaheuristics can be found in [4]. Current trends in the population-based optimization were discussed in [11]. One of the most promising recent approaches to constructing effective populationbased algorithms is using the paradigm of ensemble and hybrid computations. The main assumption behind these methods is that multiple and different search operators, neighborhood structures, and computation controlling procedures run, possibly, in parallel, may produce a synergetic effect. A review of ensemble methods can be found in [24]. To implement ensemble and hybrid solutions, specialized software frameworks are constructed. They include multi-agent systems allowing for asynchronous and parallel computations. A detailed review of the population-based optimization frameworks and multi-agent systems can be found in [21]. The idea to use a multi-agent structure, where agents represent different optimization procedures and share a common memory where evolving solutions are stored, was previously used in JABAT [3]. Teams of agents, together improving solutions of a problem, may produce a synergetic effect. In fact, such phenomenon was experimentally demonstrated in [12] where the computational experiment of solving the Travelling Salesman Problem (TSP) was reported. Recently, several papers dealing with applying the parallel processing paradigm for implementing population-based metaheuristics have been reported. In [2, 9] reviews of the current research efforts are given. Apache Spark has been previously used in the population-based optimization. In [13] it has been used to process the population of solutions of TSP, each solution improved by a set of agents in a separate process run in parallel using the Apache Spark. In [22] a distributed, cooperative, evolutionary algorithm has been proposed and solved using the Apache Spark platform. In the above approach, optimization procedures are applied in parallel to several sub-populations. In [22] the approach consists of applying particle swarm optimization, genetic algorithm, and local search, applied in this order, in a loop. The solution has been tested on Flexible Job Shop Scheduling. In [18] the distributed simulated annealing with MapReduce was proposed. The current paper presents a parallel computational algorithm for solving the job shop scheduling problem. The approach displays some similarities to the idea used in JABAT [3]. We propose to use the Apache Spark middleware, in a manner resulting in a simpler and more straightforward architecture as compared with the one used in JABAT [3]. The paper is constructed as follows: in Sect. 3 the job shop scheduling problem is formulated. Section 4 contains a description of the Spark-based Parallel Population-Based Algorithm for solving the job shop scheduling problem. Section 5 shows results of the computational experiment. The final section contains conclusions and ideas for future research.
3 Job Shop Scheduling

The job shop scheduling problem (JSP) is a well-known NP-hard optimization problem. There is a set of jobs (J_1, ..., J_n) and a set of machines (m_1, ..., m_m). Each job j consists of a list of operations that have to be processed in a given order. Each operation must be processed on a specific machine, and only after all preceding operations of the job have been completed. The makespan is the total length of a schedule, i.e., the total time needed to process all operations of all jobs. In the JSP the objective is to find a schedule that minimizes the makespan.

A solution may be represented as a sequence of jobs of length n × m. There are m occurrences of each job in this sequence. When examining the sequence from left to right, the ith occurrence of job j refers to the ith operation of this job. Such a representation has been used for the algorithm design and for the experiments validating the proposed approach.
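The encoding can be made concrete with a small sketch. The following Python fragment is not taken from the paper; the helper names and the data layout (each job given as a list of (machine, duration) pairs) are our assumptions. It decodes a job sequence of length n × m into a schedule and returns its makespan.

```python
# Minimal sketch of the permutation-with-repetition encoding: each operation is
# scheduled at the earliest time allowed by its machine and by its job.
def makespan(sequence, operations):
    """operations[j] is a list of (machine, duration) pairs for job j."""
    next_op = {j: 0 for j in operations}       # index of the next operation per job
    job_ready = {j: 0 for j in operations}     # time at which each job becomes free
    machine_ready = {}                         # time at which each machine becomes free
    end = 0
    for j in sequence:
        machine, duration = operations[j][next_op[j]]
        start = max(job_ready[j], machine_ready.get(machine, 0))
        finish = start + duration
        job_ready[j] = finish
        machine_ready[machine] = finish
        next_op[j] += 1
        end = max(end, finish)
    return end

# Example: 2 jobs x 2 machines; job 0 visits machines 0 then 1, job 1 visits 1 then 0.
ops = {0: [(0, 3), (1, 2)], 1: [(1, 2), (0, 4)]}
print(makespan([0, 1, 0, 1], ops))   # one feasible job sequence of length n*m = 4
```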
4 Parallel Population-Based Optimization in Spark

To solve the JSP, the population-based optimization paradigm is used. The idea is to employ autonomous agents improving solutions from the common memory. The common memory contains a population of solutions. Agents belong to a finite set of available agents and act independently. Each agent is designed to be able to read one or two solutions from the population and try to produce a new, improved solution. If such an attempt is successful, the improved solution is added to the common memory. The quality of the solutions stored in the common memory, in terms of the optimization criterion, is expected to gradually grow.

The process of solving the JSP starts with creating the initial set of randomly generated solutions and storing them in the common memory. To parallelize the optimization process, the memory is randomly partitioned into a number of sub-populations containing solutions, which are optimized by agents in separate threads of Apache Spark. The process works in rounds. In each round, in each parallel thread, solutions stored in the corresponding population are improved by optimizing agents. After a given time span, solutions from all sub-populations are grouped together and, eventually, transformed by replacing some less promising solutions with new, random ones. Next, the whole cycle is repeated: the common memory is divided into sub-populations, and the next round of optimization starts, with sub-populations consisting of different subsets of solutions.

The program works in a loop as shown in Algorithm 1. The stopping criterion is defined as no improvement of the best makespan for a given number of iterations.
Algorithm 1: The program
1  solutions = set of random solutions
2  while !stoppingCriterion do
3      solutions = solutions.optimize
4  bestSolution = solution from solutions with the best makespan
5  return bestSolution
The optimization in the optimize procedure is shown as Algorithm 2. Populations are parallelized, and an RDD (Resilient Distributed Dataset in Apache Spark) of populations is created. The RDD is transformed with the function applyOptimizations that is responsible for optimizing a single population of solutions. The function uses optimizing agents for a predefined time span. Then all the solutions from populationsRDD are collected, and a given percentage of the worst ones is replaced by newly generated random solutions.
Algorithm 2: Method optimize
populations = solutions divided to a list of k-element populations populations R D D = populations parallelized in Apache Spark populations R D D = populations R D D.map( p => p.apply O ptimi zations) solutions = solutions collected from populations R D D replace the worst solutions in solutions with random solutions
Optimization procedures are carried out by a number of specially designed simple agents. There are two types of such agents—the single argument type and the doubleargument type. Single argument agents work on a single solution. Double-argument agents require two solutions as an input. They try to improve the worse solution out of the two considered. The resulting improved solution has to be different from the solution used as the second argument. If the agent cannot find a better solution, it returns N one. Agents are called by apply O ptimi zations procedure. They are drawn at random and applied to random solutions from the population. If an agent can produce an improved solution, it replaces a worse solution in the population. It is possible to limit the number of trials by each agent or limit the time span in which the function apply O ptimi zations remains active (in any round). There are two kinds of the single argument agent: – pairwise exchange—the agent creates new solution by interchanging operations in two randomly chosen positions of an old solution; – 3Opt—new solution is the best of solutions selected from all permutations of operations in three randomly chosen positions of the original solution.
There are three kinds of double-argument agents:

– BH—one solution is gradually transformed into the other by locating positions with different operations and swapping operations in such a way that the number of positions with different operations decreases. This is done until the solutions are identical (inspired by [10]).
– HS—a number of positions is drawn at random; at each position an operation is taken from the first or the second argument (with probability 50%). The procedure is supplemented with selecting the missing operations at random (based on the Harmony Search algorithm [7]).
– Relinking—a slice s of one solution is taken; the slice is then supplemented with the missing operations in the order in which they occur in the second solution. This is done for slices s of different lengths.
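The round-based search of Algorithms 1 and 2 together with one of the agents can be outlined as follows. This is an illustrative sketch, not the authors' implementation: the helper names (apply_optimizations, pairwise_exchange, evaluate) follow the text but are assumptions of this sketch, sc is assumed to be an existing pyspark SparkContext, and the improvement loop is reduced to a fixed number of trials instead of a time span.

```python
import random

def pairwise_exchange(solution):
    # single-argument agent: swap the operations at two random positions
    s = list(solution)
    i, j = random.sample(range(len(s)), 2)
    s[i], s[j] = s[j], s[i]
    return s

def apply_optimizations(population, evaluate, trials=100):
    # draw random solutions and agents; keep a new solution only if it improves
    pop = list(population)
    for _ in range(trials):
        idx = random.randrange(len(pop))
        candidate = pairwise_exchange(pop[idx])
        if evaluate(candidate) < evaluate(pop[idx]):
            pop[idx] = candidate
    return pop

def optimize_round(sc, solutions, evaluate, k=8, worst_fraction=0.25):
    # split the common memory into k-element populations, improve them in parallel
    populations = [solutions[i:i + k] for i in range(0, len(solutions), k)]
    improved = (sc.parallelize(populations)
                  .map(lambda p: apply_optimizations(p, evaluate))
                  .collect())
    merged = sorted((s for pop in improved for s in pop), key=evaluate)
    # replace the worst fraction with fresh random solutions before the next round
    n_keep = int(len(merged) * (1 - worst_fraction))
    randoms = [random.sample(merged[0], len(merged[0]))
               for _ in range(len(merged) - n_keep)]
    return merged[:n_keep] + randoms
```

In the paper's Scala-like pseudocode the same step appears as populationsRDD.map(p => p.applyOptimizations); here the per-population work is a plain function mapped over the RDD, and evaluate would be the makespan function of Section 3.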
5 Experiment Results

There are many benchmark instances of the JSP with known optimal solutions (minimum makespan). In the experiments reported in the present section, problem instances introduced in [17] are used. A series of experiments has been carried out to validate the proposed approach. Each problem instance has been solved 20 times. A number of easier instances from [17] have been solved in Spark local mode on a computer with 16 GB RAM and a Core i7-8700 processor, with the use of only one or two parallel threads. Other instances have been solved on a Spark cluster with eight nodes with 32 VCPUs at the Academic Computer Centre in Gdańsk. The stopping criterion has been defined as no change in the best makespan for the given number of iterations. The running time of each iteration has been limited by the given number of seconds.

Some of the settings used in the experiment for each instance are shown in Table 1. The other settings are as follows: each population consists of eight solutions, and in each iteration, after collecting solutions, the worst 25% of them are replaced by random solutions. The table also contains results of the computations. In the second column, the makespan of the best known solution (BKS) is given. The last two columns present the average relative error and the average running time in seconds, obtained in 20 runs of the algorithm with the given settings.

The results of the experiments with the settings in Table 1 are compared (Table 2) to some results published in two recent papers: [20], using a memetic chicken swarm algorithm, and [1], with a PSO algorithm with genetic and neighborhood-based diversity operators. Again, for each instance the makespan of the best known solution (BKS) is given. The third and fourth columns present the average makespan and average error, repeated after Table 1, of the algorithm presented in this paper, denoted as PboS. In the next columns, the average result and relative errors are given for each cited method. Errors for [20] have been calculated from the information available in the paper.
Table 1 Results for some settings used in the computations for la instances

Instance | BKS | Mode | Number of threads | One iteration running time (s) | Stopping criterion | Err (%) | Average running time (s)
la01 | 666 | Local | 1 | 2 | 1 | 0.00 | 4.3
la02 | 655 | Cluster | 10 | 10 | 5 | 0.00 | 64.9
la03 | 597 | Cluster | 50 | 10 | 5 | 0.00 | 73.6
la04 | 590 | Cluster | 10 | 10 | 5 | 0.00 | 72.5
la05 | 593 | Local | 2 | 1 | 1 | 0.00 | 4
la06 | 926 | Local | 1 | 1 | 1 | 0.00 | 2
la07 | 890 | Local | 2 | 5 | 1 | 0.00 | 21.0
la08 | 863 | Local | 2 | 1 | 1 | 0.00 | 5.05
la09 | 951 | Local | 1 | 1 | 1 | 0.00 | 2
la10 | 958 | Local | 1 | 1 | 1 | 0.00 | 2
la11 | 1222 | Local | 1 | 2 | 1 | 0.00 | 4.1
la12 | 1039 | Local | 1 | 1 | 1 | 0.00 | 2.1
la13 | 1150 | Local | 1 | 1 | 1 | 0.00 | 2.35
la14 | 1292 | Local | 1 | 1 | 1 | 0.00 | 2
la15 | 1207 | Cluster | 10 | 10 | 5 | 0.00 | 72.5
la16 | 945 | Cluster | 200 | 10 | 30 | 0.08 | 416
la17 | 784 | Cluster | 200 | 10 | 30 | 0.00 | 392
la18 | 848 | Cluster | 200 | 10 | 30 | 0.00 | 374
la19 | 842 | Cluster | 200 | 10 | 30 | 0.35 | 541
la20 | 902 | Cluster | 200 | 10 | 30 | 0.55 | 334
la21 | 1046 | Cluster | 200 | 10 | 30 | 2.34 | 1037
la22 | 927 | Cluster | 200 | 10 | 30 | 1.78 | 947
la23 | 1032 | Cluster | 200 | 10 | 30 | 0.00 | 504
la24 | 935 | Cluster | 200 | 10 | 30 | 3.34 | 947
la25 | 977 | Cluster | 200 | 10 | 30 | 2.69 | 1085
la26 | 1218 | Cluster | 200 | 10 | 30 | 0.77 | 1917
la27 | 1235 | Cluster | 200 | 10 | 30 | 4.33 | 1915
la28 | 1216 | Cluster | 200 | 10 | 30 | 3.24 | 1460
la29 | 1152 | Cluster | 200 | 10 | 30 | 6.87 | 1522
la30 | 1355 | Cluster | 200 | 10 | 30 | 0.55 | 1566
la31 | 1784 | Cluster | 200 | 10 | 30 | 0.00 | 1049
la32 | 1850 | Cluster | 200 | 10 | 30 | 0.00 | 1504
la33 | 1719 | Cluster | 200 | 10 | 30 | 0.00 | 1403
la34 | 1721 | Cluster | 200 | 10 | 30 | 0.81 | 2555
la35 | 1888 | Cluster | 200 | 10 | 30 | 0.05 | 1770
la36 | 1268 | Cluster | 200 | 10 | 30 | 4.65 | 1404
la37 | 1397 | Cluster | 200 | 10 | 30 | 5.31 | 1311
la38 | 1196 | Cluster | 200 | 10 | 30 | 6.70 | 1619
la39 | 1233 | Cluster | 200 | 10 | 30 | 3.86 | 1398
la40 | 1222 | Cluster | 200 | 10 | 30 | 3.64 | 1546
Table 2 Comparison of the results for la instances

Instance | BKS | PboS | Err (%) | MeCSO [20] | Err (%) | PSO-NGO [1] | Err (%)
la01 | 666 | 666 | 0.00 | 666 | 0.00 | 666 | 0.00
la02 | 655 | 655 | 0.00 | 655 | 0.00 | 655 | 0.00
la03 | 597 | 597 | 0.00 | 599 | 0.34 | 597 | 0.00
la04 | 590 | 590 | 0.00 | 590 | 0.00 | 590 | 0.00
la05 | 593 | 593 | 0.00 | 593 | 0.00 | 593 | 0.00
la06 | 926 | 926 | 0.00 | 926 | 0.00 | 926 | 0.00
la07 | 890 | 890 | 0.00 | 890 | 0.00 | 890 | 0.00
la08 | 863 | 863 | 0.00 | 863 | 0.00 | 862 | 0.00
la09 | 951 | 951 | 0.00 | 951 | 0.00 | 951 | 0.00
la10 | 958 | 958 | 0.00 | 958 | 0.00 | 958 | 0.00
la11 | 1222 | 1222 | 0.00 | 1222 | 0.00 | 1222 | 0.00
la12 | 1039 | 1039 | 0.00 | 1039 | 0.00 | 1039 | 0.00
la13 | 1150 | 1150 | 0.00 | 1150 | 0.00 | 1150 | 0.00
la14 | 1292 | 1292 | 0.00 | 1292 | 0.00 | 1292 | 0.00
la15 | 1207 | 1207 | 0.00 | 1207 | 0.00 | 1207 | 0.00
la16 | 945 | 946 | 0.08 | 950 | 0.53 | 945 | 0.00
la17 | 784 | 784 | 0.00 | 784 | 0.00 | 784 | 0.00
la18 | 848 | 848 | 0.00 | 851 | 0.35 | 848 | 0.00
la19 | 842 | 845 | 0.35 | 850 | 0.95 | 842 | 0.00
la20 | 902 | 907 | 0.55 | 911 | 1.00 | 902 | 0.00
la21 | 1046 | 1070 | 2.34 | 1085 | 3.73 | 1046 | 0.00
la22 | 927 | 943 | 1.78 | na | na | 927 | 0.00
la23 | 1032 | 1032 | 0.00 | na | na | 1032 | 0.00
la24 | 935 | 966 | 3.34 | na | na | 935 | 0.00
la25 | 977 | 1003 | 2.69 | na | na | 977 | 0.00
la26 | 1218 | 1227 | 0.77 | na | na | 1218 | 0.00
la27 | 1235 | 1288 | 4.33 | na | na | 1235 | 0.00
la28 | 1216 | 1255 | 3.24 | na | na | 1216 | 0.00
la29 | 1152 | 1231 | 6.87 | na | na | 1164 | 1.040
la30 | 1355 | 1362 | 0.55 | na | na | 1355 | 0.00
la31 | 1784 | 1784 | 0.00 | na | na | 1784 | 0.00
la32 | 1850 | 1850 | 0.00 | na | na | 1850 | 0.00
la33 | 1719 | 1719 | 0.00 | na | na | 1719 | 0.00
la34 | 1721 | 1735 | 0.81 | na | na | 1719 | 0.00
la35 | 1888 | 1889 | 0.05 | na | na | 1888 | 0.00
la36 | 1268 | 1327 | 4.65 | na | na | 1268 | 0.00
la37 | 1397 | 1471 | 5.31 | na | na | 1397 | 0.00
la38 | 1196 | 1276 | 6.70 | na | na | 1196 | 0.00
la39 | 1233 | 1281 | 3.86 | na | na | 1233 | 0.00
la40 | 1222 | 1266 | 3.64 | na | na | 1224 | 0.16
Fig. 1 The impact of the number of threads and the running time of each iteration on the results
In [1], no information on the time was needed to find the given results. MeCSO algorithm from [20] used very short time for smaller tasks, for example, la01 was solved in 1 s, la02 in 21 s. For bigger tasks, however, it needed more time than PboS presented in this paper. For example, the solution for la20 was produced by MeCSO in 782 s, while the result of PboS for the same task was achieved after 337 s. Figure 1 presents how increasing the number of threads may improve the result and total running time. It has been shown for one instance solved in local mode (la05) and one run on the cluster (la03).
6 Conclusion

The paper proposes application of the Apache Spark middleware for solving instances of the Job Shop Scheduling Problem. The idea is to construct the optimization system using the population-learning and team-of-agents paradigms. It was shown that such a system, consisting of a common memory and a set of autonomous optimization agents working in parallel, can be used to solve instances of the JSP, assuring high-quality results at a competitive computation time. It may be used as a platform in which population-based optimization is used to solve difficult optimization problems in parallel. The proposed architecture is universal and can be easily adapted for solving instances of combinatorial optimization problems.

Future research will focus on investigating the rationale for developing different heuristics of different complexity levels to be used by agents trying to improve solutions. It is also expected that introducing more sophisticated measures to manage the population of solutions in the common memory with respect to the selection and renewal policy could increase the effectiveness of the approach.
References 1. Abdel-Kader, R.F.: An improved pso algorithm with genetic and neighborhood-based diversity operators for the job shop scheduling problem. Appl. Artif. Intell. 32(5), 433–462 (2018). https://doi.org/10.1080/08839514.2018.1481903 2. Alba, E., Luque, G., Nesmachnow, S.: Parallel metaheuristics: recent advances and new trends. Int. Trans. Oper. Res. 20(1), 1–48 (2013) https://doi.org/10.1111/j.1475-3995.2012.00862.x 3. Barbucha, D., Czarnowski, I., Jdrzejowicz, P., Ratajczak-Ropel, E., Wierzbowska, I.: JABAT Middleware as a Tool for Solving Optimization Problems, pp. 181–195. Springer, Berlin, Heidelberg, Berlin, Heidelberg (2010). https://doi.org/10.1007/978-3-642-17155-0_10 4. Boussaid, I., Lepagnot, J., Siarry, P.: A survey on optimization metaheuristics. Inf. Sci. 237, 82 – 117 (2013). Prediction, Control and Diagnosis using Advanced Neural Computations 5. Dorigo, M., Maniezzo, V., Colorni, A.: Ant system: optimization by a colony of cooperating agents. IEEE Trans. Syst. Man Cybern. Part B (Cybernetics) 26(1), 29–41 (1996) 6. Fogel, D.: Evolutionary Computation: Toward a New Philosophy of Machine Intelligence, vol. 1. IEEE Press piscataway NJ (01 1995) 7. Geem, Z.W., Kim, J., Loganathan, G.: A new heuristic optimization algorithm: harmony search. Simul. 76, 60–68 (02 2001) 8. Goldberg, D.E.: Genetic Algorithms in Search Optimization and Machine Learning, 1st edn. Addison-Wesley Longman Publishing Co. Inc., Boston, MA, USA (1989) 9. González, P., Pardo Martínez, X., Doallo, R., Banga, J.: Implementing cloud-based parallel metaheuristics: an overview. J. Comput. Sci. Technol. 18(03), e26 (2018). http://journal.info. unlp.edu.ar/JCST/article/view/1109 10. Hatamlou, A.: Solving travelling salesman problem using black hole algorithm. Soft Comput. 22 (2017) 11. Jedrzejowicz, P.: Current trends in the population-based optimization. In: Nguyen, N.T., Chbeir, R., Exposito, E., Aniorté, P., Trawi´nski, B. (eds.) Computational Collective Intelligence, pp. 523–534. Springer International Publishing, Cham (2019) 12. Jedrzejowicz, P., Wierzbowska, I.: Experimental investigation of the synergetic effect produced by agents solving together instances of the euclidean planar travelling salesman problem. In: Jedrzejowicz, P., Nguyen, N.T., Howlet, R.J., Jain, L.C. (eds.) Agent and Multi-Agent Systems: Technologies and Applications, pp. 160–169. Springer, Berlin, Heidelberg (2010) 13. Jedrzejowicz, P., Wierzbowska, I.: Apache spark as a tool for parallel population-based optimization. In: Czarnowski, I., Howlett, R.J., Jain, L.C. (eds.) Intelligent Decision Technologies 2019, pp. 181–190. Springer Singapore, Singapore (2020) 14. Kennedy, J., Eberhart, R.: Particle swarm optimization. In: Proceedings of ICNN’95— International Conference on Neural Networks. vol. 4, pp. 1942–1948 (1995) 15. Koza, J.R.: Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, Cambridge, MA (1992) 16. Michalewicz, Z.: Genetic Algorithm+Data Structures=Evolution Programs. Springer, Berlin, Heidelberg (1996) 17. Lawrence, S.R.: Resource constrained project scheduling-a computational comparison of heuristic techniques (1985) 18. Radenski, A.: Distributed simulated annealing with mapreduce. 
In: Di Chio, C., Agapitos, A., Cagnoni, S., Cotta, C., de Vega, F.F., Di Caro, G.A., Drechsler, R., Ekárt, A., EsparciaAlcázar, A.I., Farooq, M., Langdon, W.B., Merelo-Guervós, J.J., Preuss, M., Richter, H., Silva, S., Simões, A., Squillero, G., Tarantino, E., Tettamanzi, A.G.B., Togelius, J., Urquhart, N., Uyar, A.S., ¸ Yannakakis, G.N. (eds.) Applications of Evolutionary Computation, pp. 466–476. Springer, Berlin, Heidelberg (2012) 19. Sato, T., Hagiwara, M.: Bee system: finding solution by a concentrated search. In: 1997 IEEE International Conference on Systems, Man, and Cybernetics. Computational Cybernetics and Simulation. vol. 4, pp. 3954–3959 (1997) 20. Semlali, S., Riffi, M., Chebihi, F.: Memetic chicken swarm algorithm for job shop scheduling problem. Int. J. Electr. Comput. Eng. (IJECE) 9, 2075 (2019)
21. Silva, M.A.L., de Souza, S.R., Souza, M.J.F., de França Filho, M.F.: Hybrid metaheuristics and multi-agent systems for solving optimization problems: a review of frameworks and a comparative analysis. Appl. Soft Comput. 71, 433–459 (2018). http://www.sciencedirect.com/ science/article/pii/S1568494618303867 22. Sun, L., Lin, L., Lib, H., Genc, M.: Large scale flexible scheduling optimization by a distributed evolutionary algorithm. Comput. Ind. Eng. 128 (2018) 23. Talukdar, S., Baerentzen, L., Gove, A., De Souza, P.: Asynchronous teams: cooperation schemes for autonomous agents. J. Heuristics 4(4), 295–321 (1998). https://doi.org/10.1023/ A:1009669824615 24. Wu, G., Mallipeddi, R., Suganthan, P.: Ensemble strategies for population-based optimization algorithms—a survey. Swarm Evolut. Comput. 44, 695–711 (2019)
Predicting Profitability of Peer-to-Peer Loans with Recovery Models for Censored Data Markus Viljanen, Ajay Byanjankar, and Tapio Pahikkala
Abstract Peer-to-peer lending is a new lending approach gaining in popularity. These loans can offer high interest rates, but they are also exposed to credit risk. In fact, high default rates and low recovery rates are the norms. Potential investors want to know the expected profit in these loans, which means they need to model both defaults and recoveries. However, real-world data sets are censored in the sense that they have many ongoing loans, where future payments are unknown. This makes predicting the exact profit in recent loans particularly difficult. In this paper, we present a model that works for censored loans based on monthly default and recovery rates. We use the Bondora data set, which has a large amount of censored and defaulted loans. We show that loan characteristics predicting lower defaults and higher recoveries are usually, but not always, similar. Our predictions have some correlation with the platform’s model, but they are substantially different. Using a more accurate model, it is possible to select loans that are expected to be more profitable. Our model is unbiased, with a relatively low prediction error. Experiments in selecting portfolios of loans with lower or higher Loss Given Default (LGD) demonstrate that our model is useful, whereas predictions based on the platform’s model or credit ratings are not better than random.
1 Introduction

1.1 What Is Peer-to-Peer Lending?

Peer-to-peer (P2P) lending is a practice of lending money from potentially many individuals to a borrower, usually through an online platform that connects potential
lenders and borrowers. As the practice is gaining popularity, many different platforms have been set up by private companies. The borrowers apply for a loan by filling in a loan application, and lenders then bid for the loans by offering an interest rate. Model-based decisions are a key factor in this process, because the lenders review the loans available and use this information to decide whom to offer loans to and at what price.

Many benefits of P2P lending have been advertised. For the borrower, they center around easy access to credit. The digitized loan process has been claimed to be faster and easier, and the loans are claimed to offer competitive interest rates and credit for individuals that may not qualify for traditional loans [1, 2]. For the lender, the attraction is possibly high investment returns and diversification through investing in many loans. However, a major drawback is that the loans typically have no collateral and defaults are quite common. This means that the loans have credit risk, which the lender needs to compensate for when setting the interest rate. The interest rates are determined by the supply and demand of loans, and implicit in these rates are assumptions about the default risk and the loss given default (LGD).

Many of the lenders are not professional investors, and offering the appropriate interest rate is a challenging problem [3]. The platforms often help by providing a rating model that classifies loans by credit risk. However, in practice the lenders would like to estimate the expected profit of a loan. The profit can be calculated with discounted cashflow (DCF) analysis if we know two quantities: the default risk and the LGD. This is a challenging problem because many loans are censored, which means that they were issued recently and are still ongoing. How can we calculate the profit when we don't know the future payments in the training set? Excluding ongoing loans would create bias, because it removes the loans that are more likely to survive. Assuming future payments are made in full or not at all creates an optimistic or pessimistic bias. It is possible to avoid bias by limiting the analysis to old loans that have had the possibility to be observed in full. For example, if the maximum loan duration is 5 years, it is possible to create an unbiased data set by taking only loans that were issued over 5 years ago. However, models trained on such old data may not accurately predict the current profits, because the marketplace and the quality of borrowers could have changed.

In this paper, we present a novel model for loss given default (LGD), which is unbiased by censored data. We do this by modeling the monthly recovery rates, which we discount to present value using DCF analysis. This extends our previous work [11], which assumed a constant default rate and relied on Bondora's own LGD calculation.
2 Literature Review 2.1 Predicting the Profitability Peer-to-Peer Loans The growth in P2P lending has seen an increasing number of academic contributions, with approaches including statistical and machine learning techniques. The primary motivation has been to analyze the default risk with widely used techniques such as credit scoring. In addition, a limited number of recent studies have presented extensions with applications of profit scoring and LGD. It is essential to be able to model the LGD for profit scoring purposes, and it is an integral part of credit risk analysis. Credit scoring is a widely used technique for evaluating credit risk. Machine learning has been a popular method of developing credit scoring models in P2P lending. Emekter et al. [4] and Lin [5] applied logistic regression to train credit scoring models for default and identify its determinants in P2P lending. Random forest was applied in [6] for identifying good borrowers and it outperformed other classifiers such as logistic regression, support vector machine, and the P2P platform baseline. Byanjankar et al. [7] performed classification of loans in P2P lending with neural network for predicting defaults. In addition, cost-sensitive approach with gradient boosting trees was applied in [8], where evaluation was based on annualized rate of return. Survival analysis has been implemented to study relation between borrowers’ features and their default risk in P2P lending. It was applied in [9, 10] for identifying the features that explain a higher default risk in P2P lending. In extension to credit scoring, some studies have applied profit scoring in P2P lending to provide directly actionable results to lenders. Cinca and Nieto [2] in their study used multivariate linear regression for developing a profit score model, where profit scoring outperformed credit scoring in identifying profitable loans. However, the drawback of their study is that they exclude ongoing loans while developing their model. This problem is addressed in the study by Byanjankar and Viljanen [11], where they apply survival analysis that incorporates all loans, including the ongoing loans, when creating a profit score model. Their model gives monthly default probability of loans and uses it for calculating expected profit given an interest rate. However, loss given default is an integral part of credit risk analysis along with the probability of default. The recovery rate in P2P lending has been found to be very low [12]. The studies in P2P lending have been predominantly focused on predicting defaults, but the loss in the event of a default has been studied less. Zhou et al. [13] models LGD as simply the ratio of total payment to total loan amount and studies the distribution of LGD. In addition, they study the influencing factors to LGD in P2P lending with a multinomial linear regression. Papouskova and Hajek [14] use random forest to build LGD model for P2P lending in a two-stage process. They show that LGD distribution is skewed to large LGD values, and in the first stage they apply random forest classification to identify loans with an extremely large LGDs. In the second stage, they apply random forest regression on the remaining loans to estimate
their LGD. Their method appears more effective than a traditional simple multinomial linear regression.
3 Model

Each peer-to-peer loan is defined by three variables: the loan amount M, the loan duration n, and the interest rate I. When these variables are known, we can calculate the monthly loan payments given by the annuity formula P = M I (1 + I)^n / ((1 + I)^n − 1). However, borrowers may not make all the monthly payments as agreed upon. This is known as a loan default, and we define it as the event of falling behind by the total amount of two consecutive monthly payments. The loan stays on schedule until the default event, and from then on it goes into a process called recovery, where the lender attempts to recover as much of the loan principal as possible, with interest. These recovery payments often do not follow a well-defined schedule, and in fact nothing may be recovered.
However, a major problem in real-world data sets is that many of the loans are censored. This means that a loan duration may be 5 years, but the loan was made 2 years ago and the scheduled payments are still ongoing. We do not know if the loan will default or not, and the default event is therefore censored. Likewise, another loan may have defaulted a year ago but we are still extracting recovery payments from it. We do not know how many payments we can recover before the loan stops repaying, and the payments from then on are censored. This is illustrated in Fig. 1, where the loan observation stops at the current time and the future of these loans becomes unknown.
We present novel models for both defaults and recoveries, which are unbiased by data censoring because they model the loan through monthly payment intervals. We therefore model a given loan with two complementary models. The first model predicts the probability of the loan default event, and the second model predicts the time-discounted principal lost in the event of default, known as the LGD model. When we predict the probability of a loan defaulting in monthly intervals and the expected monthly recoveries thereafter, we can calculate the discounted expected
Fig. 1 Illustration of loan histories (left), the default model (middle), and LGD model (right)
cashflows from monthly payments. We therefore predict the profit of the loan, or the return on investment (ROI). The default model is based on the number of scheduled monthly payments. We define that a loan defaults at a monthly interval t if the loan fell behind by a total of two monthly payments after it. The edges of the monthly intervals are the payments. This is illustrated in the middle of Fig. 1. The outcome of each monthly interval t = 1, 2, ... is a binary trial Y_t of loan default or survival. The default probability P[Y_t = 1] is also the default rate μ_t = E[Y_t] for a given month t. It does not matter that a loan is censored, because the monthly default probabilities are defined in terms of loans that have survived to each month. For example, in the middle figure the third monthly interval has five loans and one of the loans defaults on that interval, making the third month default rate 1/5 = 20%. Given a loan covariate vector x, the monthly default rate μ_t is modeled using logistic regression with coefficients β and a separate intercept η_t for each month. This is a discrete-time analog of the Cox proportional hazards model:

μ_t / (1 − μ_t) = exp(η_t + β^T x)
(1)
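As a minimal sketch, the discrete-time formulation of Eq. (1) can be fit with ordinary logistic regression once the loans are expanded into loan-month records. The person-period table layout and column names below (month, default, covariates) are illustrative assumptions, not taken from the paper.

```python
# Sketch: fitting the discrete-time hazard model of Eq. (1) with scikit-learn.
# Assumes a "person-period" table with one row per loan per observed month.
import pandas as pd
from sklearn.linear_model import LogisticRegression

def fit_default_model(loan_months: pd.DataFrame, covariate_cols):
    # One dummy column per month index plays the role of the intercepts eta_t.
    month_dummies = pd.get_dummies(loan_months["month"], prefix="month")
    X = pd.concat([month_dummies, loan_months[covariate_cols]], axis=1).to_numpy(dtype=float)
    y = loan_months["default"].to_numpy()
    # fit_intercept=False: the month dummies already act as the per-month intercepts.
    model = LogisticRegression(fit_intercept=False, max_iter=1000)
    model.fit(X, y)
    return model, list(month_dummies.columns) + list(covariate_cols)

# Predicted monthly default rates mu_t then follow from predict_proba on the same layout.
```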
The LGD model is based on the recoveries in monthly intervals after a default. The modeling starts when a loan defaults and ends at the current time. The goal is to predict the expected recovery in each subsequent monthly interval as a percentage of the remaining principal, known as the Exposure at Default (EAD). This is illustrated in the right of Fig. 1. The total recoveries in the monthly intervals t = 1, 2, ... are real-valued random variables R_t. The expected recovery γ_t = E[R_t] is the recovery rate for a given month t. Again, it does not matter that a loan is censored, because the recoveries are defined only in terms of loans that are observable at each month. For example, in the right figure the first month recovery interval has five observable loans: two of the loans make some recoveries and three make no recoveries on that interval. The first month recovery rate is their mean value. Given a loan covariate vector x, the monthly recovery rate γ_t is modeled using least squares regression with coefficients α and monthly intercepts θ_t. We use a logarithmic link function to restrict predictions to positive recovery values, a novel model that assumes proportional recoveries:

γ_t = exp(θ_t + α^T x)
(2)
To obtain monthly default and recovery rates in the aggregate loan data set, we can simply calculate the mean value of defaults or recovery amounts in each interval. This corresponds to a model without any covariates. Figure 2 plots the Bondora data set. Over 10% of the loans default in the first interval, but thereafter the default rate begins to decline from 3% to 1% over the span of 60 months. The recoveries are also highest initially, with 1.2% of principal recovered in both the first and second months after default, which then declines linearly from 0.9% to 0.3% over the same time.
Fig. 2 Actual recovery and default rates calculated from the Bondora data set
Table 1 Loan status, interest rate, and Bondora's LGD prediction by loan issue year

Year       2013 (%)   2014 (%)   2015 (%)   2016 (%)   2017 (%)   2018 (%)   2019 (%)
Current        3          5         10         12         24         43         70
Late          20         40         49         50         51         38         22
Repaid        77         55         41         37         25         18          7
Interest      26         29         33         41         47         32         36
LGD           67         76         79         64         59         55         20
4 Data Set

Bondora is a major peer-to-peer lending platform which operates in Estonia, Finland, Spain, and Slovakia. They publish a public data set,1 which is updated daily, so that investors can model the loans for portfolio selection. At the time of writing, the data set had 119 341 loans and 112 columns. The data set includes financial and demographic information, which can be used to create features for modeling. It also describes the current state of the loans, their payment behavior, and Bondora's own predictions. Each loan is either in the state of repaid, current, or late. A loan has defaulted if it is more than 60 days past its due payment. Bondora has its own rating system to model credit risk, which was applied from 2013 onward. Because there are no ratings assigned to loans before 2013, we only model loans after this date.
We first present some descriptive statistics of the data set. Table 1 describes the loan status (Current/Late/Repaid), interest rates, and Bondora's LGD predictions by loan issue year. It can be seen that censoring is a major issue, with 70% of loans issued in 2019 and even 10% of loans issued in 2015 being censored. Defaults are another major issue. Of the 2015 loans, 49% were late and only 41% were repaid. The very high interest rates could compensate for the defaults and losses incurred, 24–47% being the yearly
1 https://www.bondora.com/en/public-reports.
Table 2 Loan status, interest rate, and Bondora's LGD prediction by loan rating

Rating     A (%)   AA (%)   B (%)   C (%)   D (%)   E (%)   F (%)   HR (%)
Current      58      62       54      48      46      49      36       7
Late         18      16       22      27      32      34      47      63
Repaid       25      22       24      25      22      17      18      30
Interest     13      11       16      22      29      36      54      77
LGD          40      35       44      45      43      39      44      75
averages. It is quite peculiar that the LGD predictions start to decrease after the years 2014 and 2015, being only 20% in the year 2019. Bondora does not publish the exact formula for this prediction, but based on their description of profit calculation, we hypothesize that they assume the future payments to be paid in full and do not deal with censoring properly. In fact, for the mostly fully observed 2014 and 2015 loans, their average LGD of around 75–80% is close to what we obtain from our unbiased calculation with censored data using a 10% discount rate.
The Bondora rating and LGD predictions attempt to compensate for the risk. Table 2 describes the same statistics by loan rating. It can be seen that the assigned rating correlates well with risk, since the defaults go up as we move to riskier loans. The interest rates also move up to compensate. However, with the exception of High Risk (HR) loans, the ratings do not correlate with the predicted losses after a default.
5 Experiments

5.1 Interpreting Loan Profitability and Borrower Characteristics

In the first experiment, we fit the model to the entire loan data set to interpret the predicted LGDs. An advantage of using a linear model is that the coefficients exp(α) and exp(β) of the models are directly interpretable. This allows one to infer how different demographic and financial information about borrowers predicts their individual default and recovery rates. We omit the extensive coefficient tables for reasons of space, but the results are very intuitive. It seems that the default model and the LGD model often have similar information that predicts the loan to have a smaller default risk and a larger recovery amount: earlier loan issuance year, better country and credit rating, high education, homeownership, stable job, lower debt load, existing customer, etc.
Our model predicts the expected monthly recoveries as a percentage of the remaining principal after a loan default. Denote these recovery rates, up to 60 months, as γ_1, ..., γ_60. We then compute the loss given default with discounted cashflow (DCF) analysis, where each payment is discounted by the required return on investment.
Fig. 3 LGDs computed from our model compared to those that Bondora reports
Assuming a 10% annual profit requirement, we require approximately 0.8% profit per month, given by the discount rate d = 1.1^(1/12). The LGD value is then defined as

LGD = γ_1 / d^1 + · · · + γ_60 / d^60 − 1
(3)
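Equation (3) translates directly into code. The sketch below is illustrative; the flat 1% recovery profile used in the example call is a placeholder, not a result from the paper.

```python
# Sketch: LGD from predicted monthly recovery rates, as in Eq. (3).
import numpy as np

def lgd_from_recoveries(gammas, annual_rate=0.10):
    d = (1.0 + annual_rate) ** (1.0 / 12.0)           # monthly discount rate, d = 1.1**(1/12)
    months = np.arange(1, len(gammas) + 1)
    present_value = np.sum(np.asarray(gammas) / d ** months)
    return present_value - 1.0                         # LGD as defined in Eq. (3)

# Illustrative call with a flat 1% monthly recovery rate over 60 months:
print(lgd_from_recoveries([0.01] * 60))
```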
We compared the LGD calculated from our model to Bondora's own LGD prediction in the data set. Figure 3 visualizes the results for all loans in the data set. It seems that our model produces smoother estimates and is more pessimistic, at around 0.75–0.80 average LGD, whereas Bondora suggests more discretized LGDs with an average in the 0.50–0.55 range. Otherwise, there is some correlation between the two models. In the special case that the loan's default rate is constant, i.e., μ_t = h, we can compute the profit P with a simple formula, given an interest rate I and LGD value D:

P = (1 − h)I + hD
(4)
In the formula, the loan survives with probability 1 − h, in which case we profit the interest rate I, and defaults with probability h, in which case we incur a loss of D. The expected profit P is the sum of the expected value (1 − h)I in the case of survival and hD in the case of default. We see that a higher default rate and a higher LGD imply a higher expected loss. However, this can be compensated by a higher interest rate. In general, it is not necessary to assume a constant default rate, since we can use the DCF analysis on the cashflows predicted by the two models. We use it here to visualize and understand the LGD model predictions. We refer to this as a simplified model. Figure 4 plots the expected gain from loan survival (x) against the expected loss from default (y). The profit is the sum of these, meaning that lines with P = y + x have a constant profit. We have illustrated the lines with −20%, 0%, and 20% yearly profits. It seems that most real-world loans are priced according to similar models, because the offered interest rate correlates with the losses implied by the default rate and the LGD. However, the correlation is not perfect, which suggests that if the model is correct, there is a possibility to select more profitable loans.
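A small worked example of Eq. (4), with illustrative numbers only (the sign convention for D follows Eq. (3), where the LGD of a defaulted loan is negative):

```python
# Illustrative numbers, not from the paper: constant default rate h = 5%,
# interest rate I = 30%, and D = -0.75 (negative, per the sign convention of Eq. (3)).
h, I, D = 0.05, 0.30, -0.75
P = (1 - h) * I + h * D
print(P)  # 0.2475, i.e. an expected yearly profit of about 24.8%
```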
Fig. 4 Expected loss is the default rate * LGD, but it can be compensated by the interest rate
5.2 Predicting Loan Profitability

To evaluate the prediction ability of our model, we randomly sampled a test set with 25% of the loans and a training set with the remaining 75%. Because we can model censored loans, these two sets are disjoint and together they contain every single loan. We then trained the model with the loans in the training set and predicted the recoveries in each month for every test set loan, should they default. For those test set loans that defaulted, we then compared the predicted recoveries to the actual recoveries in every monthly interval. The resulting mean and mean squared errors are reported in Fig. 5. As claimed previously, the zero mean error implies that the model is unbiased. The mean squared error is also relatively small, though this is to a large degree explained by the small actual recovery values. We cannot directly compare the error in this model to previous models, because our model has the novelty of predicting values in monthly intervals in order to deal with censoring, and other models do not report separate monthly predictions.
Fig. 5 LGD model mean error and mean squared error in the test set
Fig. 6 Actual and predicted LGDs in portfolios of defaulted loans, ordered by predicted losses
However, we devised a direct way of comparing against Bondora's own models as follows. For each test set loan, we get an LGD prediction similar to Bondora's LGD. We created eight different "portfolios" of defaulted loans, from the largest predicted losses (8) to the lowest predicted losses (1). Each bucket had the same number of loans where possible. We then calculated the actual LGD of each portfolio by taking the mean recovery in each interval and discounting these to present value with a 10% annual interest rate. The actual loss is the present value of this portfolio minus one, which is an unbiased estimate of the LGD. The results in Fig. 6 are very good for our model, but not so good for Bondora's model or the approach of sorting by the rating. Our model detects the profitability order correctly and, in fact, predicts the actual LGDs very well. Bondora's model and the rating approach seem quite random, though, indicating that they cannot distinguish higher losses from lower losses very well. In practice, it would be possible to exploit this model to obtain higher profits in two ways. First, it is possible to predict the profit of loans more accurately, given more accurate predictions for the LGD component. This enables the investor to make more profitable new loans in the primary market. Second, it is possible to buy ongoing or defaulted loans directly from the secondary market, bidding for those loans that have higher implicit profits.
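A sketch of this portfolio evaluation follows, under assumed array layouts. For brevity it averages realized recoveries over all loans in a bucket, whereas the paper averages each interval only over the loans still observable in it.

```python
# Sketch: actual discounted LGD per portfolio of defaulted test-set loans.
import numpy as np

def portfolio_lgds(predicted_lgd, realized_recoveries, n_portfolios=8, annual_rate=0.10):
    # realized_recoveries[i, t] is the recovery of defaulted loan i in month t+1
    # as a fraction of EAD (assumed zero where nothing was recovered or observed).
    order = np.argsort(predicted_lgd)                    # lowest to highest predicted loss
    buckets = np.array_split(order, n_portfolios)        # roughly equal-sized portfolios
    d = (1.0 + annual_rate) ** (1.0 / 12.0)
    months = np.arange(1, realized_recoveries.shape[1] + 1)
    lgds = []
    for bucket in buckets:
        mean_recovery = realized_recoveries[bucket].mean(axis=0)   # mean recovery per interval
        lgds.append(np.sum(mean_recovery / d ** months) - 1.0)     # present value minus one
    return lgds
```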
6 Conclusion

In this paper, we presented a novel model that is unbiased by censored loans, i.e., those that are still ongoing. In practice, all real-world data sets like Bondora's have many such loans, and it is essential for investors to be able to estimate the profit of new loans. To deal with censoring, we based our models on the monthly payment intervals and predicted recoveries in each interval. We extended the analysis in our previous paper with a model for recoveries and time-varying default rates. When we
combine these, we can compute an unbiased estimate of the expected profit of each loan. The experiments suggest that borrower information is a useful and quite intuitive predictor of defaults and the resulting losses. It appears that P2P loans are priced according to similar ideas, but perhaps not perfectly efficiently. We also demonstrated that our model outperforms the platform's own LGD model in separating the loans into higher or lower losses. In the future, this analysis can be applied to other lending data sets. It would be interesting to compare the predictions to the profits investors are promised, since simple calculations tend to ignore the censoring problem.
References
1. Wang, Z., Jiang, C., Ding, Y., Lyu, X., Liu, Y.: A novel behavioral scoring model for estimating probability of default over time in peer-to-peer lending. Electron. Commer. Res. Appl. 27, 74–82 (2018)
2. Klafft, M.: Online peer-to-peer lending: a lenders' perspective. In: Proceedings of the International Conference on E-Learning, E-Business, Enterprise Information Systems, and E-Government, pp. 371–375. EEE (2008)
3. Serrano-Cinca, C., Gutierrez-Nieto, B., López-Palacios, L.: Determinants of default in P2P lending. PLoS One 10(10), e0139427 (2015)
4. Emekter, R., Tu, Y., Jirasakuldech, B., Lu, M.: Evaluating credit risk and loan performance in online peer-to-peer (P2P) lending. Appl. Econ. 47(1), 54–70 (2015)
5. Lin, X., Li, X., Zheng, Z.: Evaluating borrower's default risk in peer-to-peer lending: evidence from a lending platform in China. Appl. Econ. 49(35), 3538–3545 (2017)
6. Malekipirbazari, M., Aksakalli, V.: Risk assessment in social lending via random forests. Expert Syst. Appl. 42(10), 4621–4631 (2015)
7. Byanjankar, A., Heikkilä, M., Mezei, J.: Predicting credit risk levels in peer-to-peer lending: a neural network approach. In: IEEE Symposium Series on Computational Intelligence, SSCI, pp. 719–725. Cape Town (2015)
8. Xia, Y., Liu, C., Liu, N.: Cost-sensitive boosted tree for loan evaluation in peer-to-peer lending. Electron. Commer. Res. Appl. 24, 30–49 (2017)
9. Đurović, A.: Estimating probability of default on peer to peer market-survival analysis approach. J. Central Banking Theory Pract. 6(2), 149–167 (2017)
10. Serrano-Cinca, C., Gutiérrez-Nieto, B.: The use of profit scoring as an alternative to credit scoring systems in peer-to-peer (P2P) lending. Decis. Support Syst. 89, 113–122 (2016)
11. Byanjankar, A., Viljanen, M.: Predicting expected profit in ongoing peer-to-peer loans with survival analysis-based profit scoring. In: Intelligent Decision Technologies. Smart Innovation, Systems and Technologies, vol. 142. Springer, Singapore (2019)
12. Mild, A., Waitz, M., Wöckl, J.: How long can you go?—Overcoming the inability of lenders to set proper interest rates on unsecured peer-to-peer lending markets. J. Bus. Res. 68(6), 1291–1305 (2015)
13. Zhou, G., Zhang, Y., Luo, S.: P2P network lending, loss given default and credit risks. Sustainability 10(4) (2018)
14. Papoušková, M., Hajek, P.: Modelling loss given default in peer-to-peer lending using random forests. In: Intelligent Decision Technologies. Smart Innovation, Systems and Technologies, vol. 142. Springer, Singapore (2019)
Weighted Network Analysis for Computer-Aided Drug Discovery
Mariko I. Ito and Takaaki Ohnishi
Abstract Biologically relevant chemical space is a set of bioactive compounds that can be candidates for drugs. In data-driven drug discovery, databases of bioactive compounds are explored. However, the biologically relevant chemical space is quite huge. Understanding the relationship between the structural similarity and the bioactivity closeness of compounds helps the efficient exploration of drug candidates. In these circumstances, network representations of the space of bioactive compounds have been suggested extensively. We define the weighted network where each node represents a bioactive compound, and the weight of each link equals the structural similarity between the compounds (nodes). We investigated the weighted network structure and how the bioactivity of compounds is distributed over the network. We found that compounds with significantly high or low bioactivity are more strongly connected to each other than the nodes of the overall network.
1 Introduction

The chemical space is an abstract concept but is roughly defined as the set of all possible molecules [15, 19]. In cheminformatics, the central idea that structurally similar compounds tend to share similar chemical properties is called the similarity property principle [18]. Based on this idea, the calculation of structural similarity is performed for various purposes, from drug discovery to retrosynthetic analysis [8, 9]. In drug discovery, biologically relevant chemical spaces are primarily explored. Compounds exhibit biological activity in this space. For example, a ligand is a compound that binds to a receptor (the target) and inhibits a biological response. In these circumstances, it has been extensively investigated whether compounds with a similar structure share similar bioactivity [15].
M. I. Ito (B) · T. Ohnishi The University of Tokyo, Bunkyo-ku Tokyo, Japan e-mail: [email protected] © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2020 I. Czarnowski et al. (eds.), Intelligent Decision Technologies, Smart Innovation, Systems and Technologies 193, https://doi.org/10.1007/978-981-15-5925-9_3
In networks, edges represent various kinds of relationships such as interaction, social influence, and correlation between two nodes [1, 3, 7, 13, 14]. Investigating the topology of such networks affords a global view of how the nodes are related to each other. For example, through community detection on networks, we can extract groups of nodes each of which is densely connected. Nodes in a community can be regarded as those that are particularly interacting [12], and having a similar feature or role [6].
"Being similar" is a kind of relationship. Previous studies have suggested certain network representations of biologically relevant chemical spaces, where each node represents a compound and each link shows the similarity relationship between the compounds [8, 9, 18–22]. They investigated the topological features of the chemical subspace and examined how molecules with certain bioactivity are distributed among the network. In these studies, community detection was performed as well. Through community detection on such networks, we obtain groups containing nodes with a similar chemical structure.
Network representation of a chemical space was performed using the circular fingerprint technique and the Tanimoto coefficient [22]. In circular fingerprint representation, a molecule is often represented by a vector called a fingerprint. In the vector, each index denotes a certain chemical substructure, and the entry denotes the count, in the molecule, of the substructure corresponding to the index. The Tanimoto coefficient is the most popular similarity measure between two molecules [10]. It takes a value from 0 to 1, and equals 1 if the two molecules are the same. In previous studies regarding network representation based on the Tanimoto coefficient, two nodes were assumed to be connected by a link if the Tanimoto coefficient between them exceeded a preset threshold. In these studies, a threshold-dependent unweighted network was defined, known as the "threshold network". Furthermore, the value of the preset threshold was tuned such that the edge density was approximately 0.025. Consequently, a well-resolved community structure was obtained. Although the evaluation was not performed in detail, the visualized network demonstrated that compounds with similar bioactivity tend to form a community [22].
However, the topology of the threshold network is affected significantly by the preset threshold. A point of concern is that a threshold network constructed with an artificially preset threshold cannot capture the structure of the chemical space. While constructing the threshold network, much of the structural information of the chemical subspace is discarded. Hence, in the present study, we analyze the weighted network of biologically relevant chemical spaces as follows. Instead of applying a preset threshold to determine the existence of a link, we assume that two nodes (molecules) are connected by a link whose weight is the similarity between them. In particular, we are interested in discovering whether the weighted network topology can facilitate the investigation of compounds with high bioactivity. We evaluate the community structure of the weighted networks and discuss whether nodes that are strongly connected to each other share similar activity.
2 Materials and Methods

To investigate the structure of biologically relevant chemical spaces, we collected data from ChEMBL (version 25), an open bioactivity database [4]. We selected 19 targets based on a previous study [22], as shown in Table 1. For each target, we extracted the data of compounds whose potency has been tested against the target by the measure of Ki, a kind of bioactivity. We regard the pChEMBL value of a compound as an indicator of its bioactivity [4, 16]. The larger the pChEMBL value, the stronger is the bioactivity against the target [16]. The number of compounds corresponding to each target is shown in Table 1. Subsequently, we obtained the "Morgan fingerprints" (circular fingerprints) of these compounds using RDKit, an open-source toolkit for cheminformatics. The circular fingerprint we used in this study is the Morgan fingerprint. Each fingerprint is a 2048-dimensional vector, in which the entry at each index is an integer. We calculated the Tanimoto coefficient for all pairs of compounds that correspond to each target. For two molecules that have fingerprints x_i and x_j, the Tanimoto coefficient T_ij [10] is calculated as

T_ij = (x_i · x_j) / (|x_i|^2 + |x_j|^2 − x_i · x_j)
(1)
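A minimal sketch of Eq. (1) on count-based Morgan fingerprints, assuming the usual RDKit workflow; the fingerprint parameters (radius 2, hashed to 2048 bits) are assumptions, since the paper only states the 2048-dimensional length.

```python
# Sketch: count-based Morgan fingerprints and the Tanimoto coefficient of Eq. (1).
import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem

def morgan_count_fingerprint(smiles: str, n_bits: int = 2048, radius: int = 2) -> np.ndarray:
    mol = Chem.MolFromSmiles(smiles)
    fp = AllChem.GetHashedMorganFingerprint(mol, radius, nBits=n_bits)  # count fingerprint
    vec = np.zeros(n_bits, dtype=float)
    for idx, count in fp.GetNonzeroElements().items():
        vec[idx] = count
    return vec

def tanimoto(x_i: np.ndarray, x_j: np.ndarray) -> float:
    dot = float(np.dot(x_i, x_j))
    return dot / (np.dot(x_i, x_i) + np.dot(x_j, x_j) - dot)   # Eq. (1)

print(tanimoto(morgan_count_fingerprint("CCO"), morgan_count_fingerprint("CCN")))
```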
Subsequently, the similarity matrix T, in which the (i, j) entry is the similarity (Tanimoto coefficient) between molecules i and j, is constructed for each target. We considered the weighted network for each target by regarding the similarity matrix as the adjacency matrix. In weighted network analysis, not only the number of links connected to a node, but also the sum of the weights of those links should be considered [2]. The former is the degree of the node, and the latter is its strength [11]. As the Tanimoto coefficient does not vanish for almost all pairs of compounds, the weighted networks are almost complete and the degrees of the nodes do not vary. Therefore, we examined how strength is distributed in each weighted network. To examine whether the weighted networks exhibit community structure, we applied the Louvain heuristic, an algorithm used to obtain a graph partition that (locally) optimizes the modularity Q [5]. In the case of weighted networks, the modularity Q [6, 11] is defined as

Q = (1 / (2m)) Σ_{i,j} M_ij δ(c(i), c(j)),

(2)

where

M_ij := T_ij − s_i s_j / (2m).

(3)
The strength of node i, Σ_j T_ij, is denoted by s_i. The sum of all weights, Σ_{i,j} T_ij, is 2m, and c(i) denotes the community to which node i belongs. Kronecker's delta is denoted by δ; therefore, δ(x, y) equals 1 when x = y and 0 when x ≠ y. Regarding M_ij, the second term on the right side of Eq. (3) represents the expected strength of the
link between nodes i and j in the null network, which is random except that it has the same strength distribution as the focal network [6]. Therefore, M_ij represents how strongly nodes i and j are connected compared to the null model.
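A sketch of the community-detection step, assuming the python-louvain implementation of the Louvain heuristic on a networkx graph built from the similarity matrix T; the package and function names below are those of that library, not of the authors' code.

```python
# Sketch: Louvain community detection on the weighted Tanimoto similarity network.
import numpy as np
import networkx as nx
import community as community_louvain   # the python-louvain package

def detect_communities(T: np.ndarray):
    W = T.copy()
    np.fill_diagonal(W, 0.0)                          # drop self-similarity
    G = nx.from_numpy_array(W)                        # edge weights taken from the matrix
    partition = community_louvain.best_partition(G, weight="weight")
    Q = community_louvain.modularity(partition, G, weight="weight")
    return partition, Q
```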
3 Results

First, we show the histogram of pChEMBL values in each compound set corresponding to each target. For three example networks, for targets 238, 2001, and 2014, the histograms of pChEMBL values are shown in Fig. 1a. Few compounds have an extremely small or large value. We also observed a similar tendency in the case of the other targets. Subsequently, we examined the strength of the nodes in the weighted networks. In Table 1, we show the mean and standard deviation of the node strength in the weighted network for each target. Figure 1b shows three histograms of the strength normalized by the total strength, s_i / Σ_{i,j} T_ij, in each network. As shown, many nodes share a similar strength, while a few nodes exhibit small strength. No node exhibited extremely large strength in any network.
Finally, we investigated whether nodes connected with high similarity tend to share similar bioactivity. As explained in Sect. 2, we performed community detection. The number of communities and the modularity Q of the graph partition resulting from the Louvain heuristic are shown for each target in Table 1. The values of modularity are low, and the community structure in each network is weak in general. We further inspected the community structure obtained by this community detection. Some detected communities were not sufficiently connected; as such, they could not be called "communities". Therefore, we extracted communities that could
Fig. 1 Histogram of pChEMBL values and strength. a Histograms of pChEMBL values in networks of targets 238, 2001, and 2014. b Histograms of strength normalized by total strength in networks of targets 238, 2001, and 2014. Markers are the same as in (a)
Table 1 Examined compounds. In the first column, Target ID is the ChEMBL ID assigned to each target in the ChEMBL database. The second column shows the name of the target: Hs, Cp, and Rn represent Homo sapiens, Cavia porcellus, and Rattus norvegicus, respectively. The third column shows the number of compounds that correspond to the target. The fourth and fifth columns show the mean and standard deviation of the node strength, respectively. The sixth and seventh columns show the number of communities and the modularity Q of the graph partition resulting from the community detection, respectively

Target ID   Target name                                     Size    Mean    Std    Com    Q
255         Adenosine A2b receptor (Hs)                     1575    723     130    3      0.060
3242        Carbonic anhydrase XII (Hs)                     2392    863     262    3      0.051
269         Delta opioid receptor (Rn)                      1577    719     113    4      0.081
219         Dopamine D4 receptor (Hs)                       2138    1087    183    4      0.031
238         Dopamine transporter (Hs)                       1406    528     86     4      0.065
65338       Dopamine transporter (Rn)                       1624    723     105    3      0.074
339         Dopamine D2 receptor (Rn)                       2555    1119    171    3      0.045
4124        Histamine H3 receptor (Rn)                      1591    637     135    4      0.075
344         Melanin-concentrating hormone receptor 1 (Hs)   1430    771     68     4      0.046
270         Mu opioid receptor (Rn)                         2318    984     178    4      0.091
4354        Mu opioid receptor (Cp)                         654     266     51     3      0.078
2014        Nociceptin receptor (Hs)                        1105    519     97     4      0.070
2001        Purinergic receptor P2Y12 (Hs)                  584     400     74     4      0.029
225         Serotonin 2c (5-HT2c) receptor (Hs)             1980    785     132    4      0.049
273         Serotonin 1a (5-HT1a) receptor (Rn)             3370    1469    249    3      0.052
322         Serotonin 2a (5-HT2a) receptor (Rn)             3076    1278    215    4      0.067
1833        Serotonin 2b (5-HT2b) receptor (Hs)             1121    421     74     5      0.058
3155        Serotonin 7 (5-HT7) receptor (Hs)               1569    775     129    3      0.035
4153        Sigma-1 receptor (Cp)                           1617    717     123    3      0.048
Fig. 2 Histograms of M_ij and pChEMBL values in communities. Histograms of M_ij in the overall network and in each community in the cases of targets 238 (a), 2001 (b), and 2014 (c). Histograms of pChEMBL values in the overall network and in each community in the cases of targets 238 (d), 2001 (e), and 2014 (f). In these figures, only communities with Q_C > 0.2 and size exceeding 20 are shown
be regarded as strongly connected. For a detected community C, we defined the extent to which the nodes are strongly connected within C, Q_C, as

Q_C = (1 / Σ_{i,j∈C} T_ij) Σ_{i,j∈C} M_ij,

(4)

where Σ_{i,j∈C} denotes the sum over all pairs of nodes within community C. Therefore, Q_C measures how strongly the links within C are connected without considering other communities. In Fig. 2a–c, we show the histograms of M_ij for all pairs of nodes within each community and those in the overall network, for three targets. In these figures, only communities with Q_C exceeding 0.2 are shown. Therefore, these communities have larger values of M_ij than the overall network. On the other hand, Fig. 2d–f shows the histograms of pChEMBL values of the nodes in each community included in Fig. 2a–c and in the overall network. In the case of target 238, the histogram of pChEMBL values for Community 4 shifts to the right compared to the overall network (Fig. 2d). Community 5 in the network for target 2014 exhibits the same feature as well (Fig. 2f). Conversely, Community 1 in the network for target 2001 comprises nodes with lower pChEMBL values than those in
the overall network. In Fig. 3, for all targets, we show the mean of the pChEMBL values in each community that satisfies Q_C > 0.2 and consists of more than 20 nodes. Each error bar shows the standard deviation. In some communities, the mean pChEMBL value is located far from that of the overall network. However, in most cases, this value is within the standard deviation range of that of the overall network.
Fig. 3 Mean pChEMBL value in each community. For each target (ordinate), the mean pChEMBL value in the overall network is shown by a circle. For communities with Q_C > 0.2 and size exceeding 20, the mean pChEMBL value in each community is shown by a triangle above that of the overall network. The error bar represents the standard deviation of pChEMBL values
Fig. 4 Mean M_ij as a function of the ratio R. Mean M_ij of all links that connect nodes with high (more than (100 − R)-th percentile)/intermediate (from (50 − R/2)-th to (50 + R/2)-th percentile)/low (less than R-th percentile) pChEMBL values versus the ratio R, in the cases of targets 238 (a), 2001 (b), and 2014 (c)
In summary, although the whole community structure is weak, we observed some communities in which the nodes are connected with large weights. In some of them, the distribution of pChEMBL values is biased compared to that of the overall network. This suggests that certain sets of compounds are similar to each other and share stronger/weaker bioactivity against the target than the compounds in general.
Subsequently, we investigated whether nodes with particularly high (low) pChEMBL values are connected to each other with large weights. First, we collected the 0.01RN nodes whose pChEMBL values exceeded the (100 − R)-th percentile, where N is the number of nodes in the network. Second, we calculated the mean of M_ij for all pairs of these 0.01RN nodes. Similarly, we calculated the mean of M_ij for nodes with low pChEMBL values (lower than the R-th percentile), and for nodes with intermediate pChEMBL values (ranging from the (50 − R/2)-th to the (50 + R/2)-th percentile). The results for these three cases are presented in Fig. 4a–c, where the horizontal axis represents the ratio R and the vertical axis the mean of M_ij. The mean M_ij exceeds 0 when the ratio R is small in the cases of high and low pChEMBL values. The mean M_ij also exceeds 0 for a small ratio R in the case of intermediate pChEMBL values, but it is much lower than the means in the other cases. The mean M_ij decreases with the ratio and approaches the mean of the overall network, which approximately equals 0. Although Fig. 4a–c shows only the targets 238, 2001, and 2014, we observed the same tendency for all other targets. Therefore, the sets of nodes with particularly high/low pChEMBL values are connected with stronger weights than the overall network.
Figure 4a–c shows some consistency with Fig. 2d–f. For target 2001, the nodes with low pChEMBL values are connected strongly (Fig. 4b), and some of them are detected as those included in Community 1 sharing a low pChEMBL value (Fig. 2e). Additionally, consistency is shown between Community 5 in the network of target 2014 (Fig. 2f) and the set of high pChEMBL values (Fig. 4c).
4 Discussion

In this study, we investigated the structure of biologically relevant compounds that share the same target. Previous studies have suggested network representations of this structure. In such a network representation, a node represents a compound, and a link between two nodes is drawn if the similarity between them exceeds a preset threshold. The topology of these threshold networks greatly depends on the preset threshold. Therefore, to understand the true nature of the structure, we considered the weighted network, where the weight of each link is the similarity between the connected compounds. For each target, the corresponding weighted network showed a homogeneous structure, in which nodes exhibiting extremely strong bioactivity, or nodes connected to the others with extreme strength, are rare. This homogeneity is attributable to sample bias: the compounds in each network are those sharing the same target.
In drug discovery, the question of whether structurally similar compounds share similar chemical properties needs to be elucidated: pairs of structurally similar compounds sometimes have different bioactivities, a phenomenon called an activity cliff, and activity cliffs can hinder effective drug discovery [15, 19]. We performed community detection on the weighted networks to investigate whether strongly connected nodes exhibit similar bioactivity. We found that, in general, the community structure was weak in all the weighted networks. However, we observed that nodes with high/low bioactivity against the target were connected more strongly to each other than the nodes of the overall network. Some detected communities reflected this tendency, and each of their nodes exhibited a different range of pChEMBL values compared to those in the overall network. As a practical application, such communities can help us predict whether a novel compound exhibits high bioactivity. If the novel compound is structurally similar to compounds in a community sharing high bioactivity, we can expect it to exhibit high bioactivity. Such a prediction is useful in drug discovery.
In a previous study concerning the threshold network, community detection was performed using modularity as the quality function of the graph partition [22]. The values of modularity were much greater than those in our study, and well-resolved community structures were obtained. In fact, the threshold was set to obtain high modularity without resulting in an extremely sparse community structure. The threshold was set based on network visualization, which appears to be intuitive. Significant link extraction could also be performed by other established methods, such as constructing the minimum spanning tree (MST) or the planar maximally filtered graph (PMFG), in future work [17]. We hope that the results of our basic weighted network analysis can be used for comparing the efficacy of network representations obtained by such link extraction methods.
Acknowledgements This work was supported by the JSPS Grant-in-Aid for Scientific Research on Innovative Areas: JP17H06468.
References
1. Albert, R., Barabási, A.L.: Statistical mechanics of complex networks. Rev. Modern Phys. 74(1), 47 (2002)
2. Barrat, A., Barthelemy, M., Pastor-Satorras, R., Vespignani, A.: The architecture of complex weighted networks. Proc. Natl Acad. Sci. 101(11), 3747–3752 (2004)
3. Bazzi, M., Porter, M.A., Williams, S., McDonald, M., Fenn, D.J., Howison, S.D.: Community detection in temporal multilayer networks, with an application to correlation networks. Multiscale Model. Simul. 14(1), 1–41 (2016)
4. Bento, A.P., Gaulton, A., Hersey, A., Bellis, L.J., Chambers, J., Davies, M., Krüger, F.A., Light, Y., Mak, L., McGlinchey, S., et al.: The ChEMBL bioactivity database: an update. Nucleic Acids Res. 42(D1), D1083–D1090 (2014)
5. Blondel, V.D., Guillaume, J.L., Lambiotte, R., Lefebvre, E.: Fast unfolding of communities in large networks. J. Stat. Mech.: Theory Exp. 2008(10), P10008 (2008)
6. Fortunato, S.: Community detection in graphs. Phys. Reports 486(3–5), 75–174 (2010)
7. Ito, M.I., Ohtsuki, H., Sasaki, A.: Emergence of opinion leaders in reference networks. PloS One 13(3), e0193983 (2018)
8. Kunimoto, R., Bajorath, J.: Combining similarity searching and network analysis for the identification of active compounds. ACS Omega 3(4), 3768–3777 (2018)
9. Kunimoto, R., Vogt, M., Bajorath, J.: Tracing compound pathways using chemical space networks. Med. Chem. Comm. 8(2), 376–384 (2017)
10. Lipkus, A.H.: A proof of the triangle inequality for the Tanimoto distance. J. Math. Chem. 26(1–3), 263–265 (1999)
11. Lü, L., Chen, D., Ren, X.L., Zhang, Q.M., Zhang, Y.C., Zhou, T.: Vital nodes identification in complex networks. Phys. Reports 650, 1–63 (2016)
12. Mizokami, C., Ohnishi, T.: Revealing persistent structure of international trade by nonnegative matrix factorization. In: International Conference on Complex Networks and their Applications, pp. 1088–1099. Springer (2017)
13. Newman, M.E.: The structure and function of complex networks. SIAM Rev. 45(2), 167–256 (2003)
14. Ohnishi, T., Takayasu, H., Takayasu, M.: Network motifs in an inter-firm network. J. Econ. Int. Coord. 5(2), 171–180 (2010)
15. Opassi, G., Gesù, A., Massarotti, A.: The hitchhiker's guide to the chemical-biological galaxy. Drug Discov. Today 23(3), 565–574 (2018)
16. Steinmetz, F.P., Mellor, C.L., Meinl, T., Cronin, M.T.: Screening chemicals for receptor-mediated toxicological and pharmacological endpoints: using public data to build screening tools within a KNIME workflow. Molecular Inf. 34(2–3), 171–178 (2015)
17. Tumminello, M., Aste, T., Di Matteo, T., Mantegna, R.N.: A tool for filtering information in complex systems. Proc. Natl Acad. Sci. 102(30), 10421–10426 (2005)
18. Vogt, M.: Progress with modeling activity landscapes in drug discovery. Expert Opinion Drug Discov. 13(7), 605–615 (2018)
19. Vogt, M., Stumpfe, D., Maggiora, G.M., Bajorath, J.: Lessons learned from the design of chemical space networks and opportunities for new applications. J. Comput. Aided Mol. Des. 30(3), 191–208 (2016)
20. Wu, M., Vogt, M., Maggiora, G.M., Bajorath, J.: Design of chemical space networks on the basis of Tversky similarity. J. Comput. Aided Mol. Des. 30(1), 1–12 (2016)
21. Zhang, B., Vogt, M., Maggiora, G.M., Bajorath, J.: Design of chemical space networks using a Tanimoto similarity variant based upon maximum common substructures. J. Comput. Aided Mol. Des. 29(10), 937–950 (2015)
22. Zwierzyna, M., Vogt, M., Maggiora, G.M., Bajorath, J.: Design and characterization of chemical space networks for different compound data sets. J. Comput. Aided Mol. Des. 29(2), 113–125 (2015)
Manufacturing as a Service in Industry 4.0: A Multi-Objective Optimization Approach
Gabriel H. A. Medeiros, Qiushi Cao, Cecilia Zanni-Merk, and Ahmed Samet
Abstract The unexpected failure of machines or tools has a direct impact on production availability. This gives rise to risks in terms of product quality, profitability, and competitiveness. In order to improve the availability of the companies' own production facilities without having to rely on cost-intensive reserve machines or other means of minimizing downtime, it is also necessary, beyond planning the production smartly, to be able to outsource the production if required (for example, when a stoppage is inevitable). For this purpose, an intelligent machine broker needs to be implemented, which will coordinate the needs of a network of companies working together and the machinery available at a given time. This paper proposes to use a multi-objective optimization approach to manufacturing as a service, to be able to propose the group of the most convenient available machines in the network to user companies that are confronted with unforeseen stoppages in their production, allowing them, therefore, to outsource that part of the production to another company in the same network.
G. H. A. Medeiros (B) Universidade Federal do Ceará, Fortaleza 60020–181, Brazil e-mail: [email protected] G. H. A. Medeiros · Q. Cao · C. Zanni-Merk Normandie Université, INSA Rouen, LITIS, 76000 Rouen, France e-mail: [email protected] C. Zanni-Merk e-mail: [email protected] A. Samet ICUBE/SDC Team (UMR CNRS 7357)-Pole API BP 10413, 67412 Illkirch, France e-mail: [email protected]
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2020 I. Czarnowski et al. (eds.), Intelligent Decision Technologies, Smart Innovation, Systems and Technologies 193, https://doi.org/10.1007/978-981-15-5925-9_4
1 Introduction

The unexpected failure of machines or tools has a direct impact on production availability (a +6% increase in costs in 2015 according to the Association of German Machine Tool Builders (VDW)). This gives rise to risks in terms of product quality, profitability, and competitiveness. The smart planning of preventive maintenance is therefore an essential prerequisite for positively influencing quality and production. The HALFBACK project [8] extracts knowledge from sensor data of production lines to be able to predict failures and to implement optimized maintenance plans.
In order to improve the availability of the companies' own production facilities without having to rely on cost-intensive reserve machines or other means of minimizing downtime, it is also necessary, beyond planning the production smartly, to be able to outsource the production if required (for example, when a stoppage is inevitable). For this purpose, an intelligent machine broker needs to be implemented, which will coordinate the needs of companies working together and the available machinery. Indeed, our proposal is to have Manufacturing as a Service. In our proposal, manufacturing SMEs working together can advertise their machines when they are not producing, so that other SMEs needing to outsource their production can choose them for rental over a certain period of time, according to different criteria (technical characteristics, cost, availability, distance ...). In this way, the HALFBACK project aims at improving the cross-border (between France and Germany) production process among SMEs working as a network.
In this article, we present the design of a High Availability Machine Broker for a network of SMEs, and in particular, the choice and implementation of the multi-objective optimization algorithm for its final development. The paper is organized in the following way: Sect. 2 presents the context of this project and the positioning of the broker in the whole value chain of HALFBACK, Sect. 3 describes the methodology followed for this study, Sect. 4 describes the experiments done on a well-known benchmark from the literature, while Sect. 5 shows the results on a use case. Finally, Sect. 6 presents our conclusions and some ideas for future work.
2 Context

As indicated in the Introduction, the HALFBACK project aims at improving the competitiveness of manufacturing SMEs along the river Rhine by networking them with this innovative approach of Manufacturing as a Service. Within the project, virtual profiles of the machines (footprints) are aggregated in the Cloud and registered at a High Availability Machine Broker. Registering the machine footprint, the machine location, and the machine availability, among other useful data, with the broker allows it to offer the machine as a service to other companies. In case of unavoidable machine failure, the HALFBACK software can then use this
broker to search for an adequate machine replacement to shift production to another factory. Concerning these optimization operations, we advocate the use of semantic evolutionary algorithms. Evolutionary Algorithms (EA) have proven to be very effective in optimizing intractable problems in many areas [7, 11]. There are several other types of efficient optimization algorithms outside our scope, with quite different areas of application, as in [1, 9]. Therefore, the algorithms we chose to explore are those presented in Sect. 4. However, real problems including specific constraints (legal restrictions, specific usages, etc.) are often overlooked by the proposed generic models. To ensure that these constraints are effectively taken into account, we have developed a knowledge engineering methodology based on the structuring of the conceptual model underlying the problem, creating a domain ontology suitable for EA optimization of problems involving a large amount of structured data, as is the case in the HALFBACK project [3].
3 Methodology

The methodology we followed to choose the best approach for optimization in the machine broker is twofold:
1. Retrieve a set of proven robust optimization algorithms from the literature and a benchmark to analyze and select the most efficient ones;
2. Apply the selected methods to an artificial machine database to see if they are suitable for implementation in the broker.
We present here some factors needed for the coherence of the overall process.
3.1 Data Representation

The fingerprint of a machine in the broker includes a group of parameters that are defined by default, such as the company's GPS coordinates, the machine type, the availability dates, the company reputation, the cost of the service, and the machine purchase date, among others. Thus, these machines can be seen as points in an N-dimensional space positioned by their values of each parameter. However, as many different algorithms will be tested, a standard unique data representation form should be defined. For this purpose, the binary representation was chosen because of its effective performance [15, 18]. To obtain this representation, the following group of operations must be performed:
Table 1 Machines in ascending order

Binary label   Integer label   Cost ($)   Distance (km)   Availability (days)   ...
000            0               1000       25              1                     ...
001            1               2000       50              2                     ...
010            2               3000       75              5                     ...
011            3               4000       100             9                     ...
100            4               5000       125             14                    ...
1. First, consider the machines as points in a Euclidean plane;
2. The data is sorted in ascending order according to the distance between each point and the origin;
3. For each position, an integer label representing the position of the machine in the ascending order is assigned, as shown in the second column of Table 1;
4. Finally, each integer label is converted to a binary label, as shown in the first column of Table 1.
In this way, each algorithm works with binary values indicating the real points, as sketched in the code below.
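A minimal sketch of this labelling procedure follows, with an illustrative attribute matrix; it reproduces the binary labels of Table 1 for the five example machines.

```python
# Sketch: sort machines by distance from the origin in attribute space and
# assign integer, then binary, labels (steps 1-4 above).
import numpy as np

def binary_labels(machine_attributes: np.ndarray):
    distances = np.linalg.norm(machine_attributes, axis=1)   # distance of each point to the origin
    order = np.argsort(distances)                             # ascending order
    n_bits = int(np.ceil(np.log2(len(order)))) or 1
    labels = {}
    for integer_label, machine_index in enumerate(order):     # integer label -> binary label
        labels[machine_index] = format(integer_label, f"0{n_bits}b")
    return labels

# Example with the five machines of Table 1 (cost, distance, availability):
print(binary_labels(np.array([[1000, 25, 1], [2000, 50, 2], [3000, 75, 5],
                              [4000, 100, 9], [5000, 125, 14]])))
```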
3.2 Fitness Function and the Pareto Front

Multi-objective optimization algorithms focus on finding the maximum or minimum possible values for different objectives represented by functions, all grouped in what is called a fitness function:

F(x) = [f_1(x), f_2(x), ..., f_n(x)], x ∈ S,
(1)
where n > 1 is the number of objectives and S is the set of candidate solutions to the problem, also called the feasible set. The feasible set is typically defined by some constraint functions. The space to which the objective vector belongs is called the objective space, and the image of the feasible set under F is called the attained set. Such a set will be denoted in the following by C = {y ∈ R^n : y = F(x), x ∈ S} [21]. The scalar concept of "optimality" does not apply directly in the multi-objective framework. Here the notion of Pareto optimality [2] has to be introduced. Essentially, a vector x* ∈ S is said to be Pareto optimal for a multi-objective problem if all other vectors x ∈ S have a higher weighted value for at least one of the objective functions w_i f_i, with i = 1, ..., n, and have the same or a higher weighted value for all the objective functions. A point x* is said to be a Pareto optimum or an efficient solution for the multi-objective problem if and only if there is no x ∈ S such that w_i f_i(x) < w_i f_i(x*) for all i ∈ {1, ..., n} [21]. The image of the efficient set, i.e., the
image of all the efficient solutions, is called the Pareto front (or Pareto curve or surface). The shape of the Pareto surface indicates the nature of the trade-off between the different objective functions. As stated in the previous sections, our problem is not a single-objective optimization problem, but a multi-objective one, because an optimal selection of the machines needs to be made in the presence of trade-offs between two or more, possibly conflicting, objectives. Multi-objective approaches appear clearly as a possibility to solve our problem after a careful analysis of the constraints coming from the underlying data model. Moreover, beyond the classical advantages of these approaches, they are efficient in limiting a concentrated convergence of the solutions into a small subset of the Pareto front, which is very interesting.
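As an illustration, the efficient set of a finite candidate set can be extracted by pairwise dominance checks, which is the idea behind the Simple Cull algorithm used in Sect. 4. The sketch below assumes unit weights (w_i = 1) and objectives formulated for minimization.

```python
# Sketch: Pareto filtering of a finite set of candidate solutions.
# F is an (n_solutions x n_objectives) matrix; every objective is minimized.
import numpy as np

def pareto_front(F: np.ndarray) -> np.ndarray:
    n = F.shape[0]
    efficient = np.ones(n, dtype=bool)
    for i in range(n):
        # Another point dominates i if it is no worse on every objective
        # and strictly better on at least one of them.
        dominated = np.all(F <= F[i], axis=1) & np.any(F < F[i], axis=1)
        if dominated.any():
            efficient[i] = False
    return np.where(efficient)[0]   # indices of the efficient solutions
```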
3.3 A Word About Metrics

It is easy to analyze the Pareto front when there are only two or three criteria to evaluate, but when there are more than three dimensions, some metrics are needed to evaluate whether the optimization algorithm finds the real Pareto front. Two different types of metrics were used in this work: metrics that compare the Pareto front found by the algorithm with the real Pareto front (in the case of specific benchmarks being used), and metrics that compare different found Pareto fronts among each other [21].
1. Comparison with the optimal front:
   i. Convergence: the smaller the value of this indicator is, the closer the front is to the optimal one;
   ii. Diversity: the smaller the value of this indicator is, the better the found front represents the optimal one;
2. Comparison among fronts:
   i. Hyper-volume: in a case of multi-maximization/multi-minimization, the higher/lower the value, the better the front compared to the others.
3.4 Benchmark Data

For our first analysis of possible optimization algorithms that could be used for the broker, we decided to work on some benchmarks published in the literature. One of the most well-known is ZDT1 [19]:

ZDT1(x) = [ f_ZDT1,1(x), f_ZDT1,2(x) ],  with  f_ZDT1,1(x) = x_1  and  f_ZDT1,2(x) = g(x)[1 − sqrt(x_1 / g(x))],
(2)
where x is an n-tuple, ZDT1(x) is a 2-tuple, and

g(x) = 1 + (9 / (n − 1)) Σ_{i=2}^{n} x_i.
(3)
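For reference, a sketch of the ZDT1 evaluation following the standard published definition of the benchmark:

```python
# Sketch: evaluating ZDT1 (Eqs. (2)-(3)) for one candidate vector x in [0, 1]^n.
import numpy as np

def zdt1(x: np.ndarray) -> np.ndarray:
    x = np.asarray(x, dtype=float)
    g = 1.0 + 9.0 * np.sum(x[1:]) / (len(x) - 1)   # Eq. (3)
    f1 = x[0]
    f2 = g * (1.0 - np.sqrt(f1 / g))
    return np.array([f1, f2])

print(zdt1(np.zeros(30)))   # a point on the true Pareto front: [0. 1.]
```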
The Pareto front of this function is calculated in [16], but the input belongs to a continuous space. To adapt ZDT1 to our case (inputs belonging to a discrete space), NSGAII [5] was used to obtain the discrete input space points that are associated with points on the Pareto front in the output space by the ZDT1 function. The points calculated in this way already have convergence and diversity values other than zero, and are almost optimal. Therefore, an interesting question was to see whether the tested algorithms were able to find the same points or even eliminate the ones that make the Pareto front worse in relation to diversity. Randomly generated points (called fake points in this paper) have been added to the input space to increase the difficulty and to ensure that the algorithms are robust enough to continue searching for the right points on the Pareto front. The next section presents our results on a well-known benchmark, applying the methodology just introduced.
4 Experimentation on the ZDT1 Benchmark

Different optimization algorithms from the literature were chosen for testing with two ZDT1 benchmarks, one with 100 points on the Pareto front and 100 fake points and the other with 100 points on the Pareto front and 10000 fake points. This choice was made based on the requirements of the HALFBACK project, which call for experimenting with nature-inspired optimization algorithms for manufacturing as a service. These algorithms are (1) Non-dominated Sorting Genetic Algorithm II (NSGAII) [17], (2) Improving the Strength Pareto Evolutionary Algorithm (SPEA2) [20], (3) Non-dominated Sorting Gravitational Search Algorithm (NSGSA) [12], (4) Pareto Simulated Annealing (PSA) [4], (5) Binary Multi-Objective Tabu Search [14], and (6) the Simple Cull algorithm [6]. As explained in the methodology section, after changing the data representation to binary labels, the algorithms were executed on the ZDT1 database to find the candidate Pareto front, and in the end the metrics were evaluated. Each method was executed 30 times on both benchmarks, followed by a box plot of the results. We observed that the results on both benchmarks were almost the same, but the algorithms with the biggest variation on the smaller dataset had a bigger variation on the larger dataset (also in execution time). As we can see in Fig. 1, the first two methods (NSGAII and SPEA2) had the best results. The comparison between the results enables the selection of the best ones, which will be applied in the next section. Two other methods were tested after the first experimentation: (1) Non-dominated Sorting Genetic Algorithm III (NSGAIII) [13] and (2) Full Non-dominated Sorting and Ranking [17].
Fig. 1 Box plot on ZDT1 with 10000 fake points regarding convergence, diversity, hyper-volume, and execution time
The first one is an evolution of NSGAII, which guarantees better results in larger dimensions (which is our case in practice, but not our case with ZDT1, which has only two dimensions). As expected, the NSGAIII results were close to the ones of NSGAII and SPEA2 on the ZDT1 data. Full non-dominated sorting and ranking is a deterministic algorithm (such as simple cull), but it finds not only the primary Pareto front, but also other fronts (secondary, tertiary, and so on). Unfortunately, its complexity is O(n^2 m), where n is the size of the dataset and m the number of attributes to be considered, which is very costly in terms of execution time. The last two methods obtained results as good as the other two previously chosen, so both were also chosen for the next phase: a case study in Sect. 5.
5 Experimentation on a Case Study
The selected algorithms were then applied to artificially generated machine data. The machine database is composed of 5000 different footprints. For each machine, the following attributes were considered for optimization: availability, supplier company reputation (evaluated from 1 to 5 stars), cost of renting, age of the machine, and distance from the user company to the supplier company. The type of the needed machine (such as milling machine or cooling machine), as well as the lower and upper limits of the values of the attributes, is used to filter the database before running the optimization algorithms, so as to simulate what the client companies do in their search. In our case, we ran the following algorithms on a filtered database composed of 817 machines:
1. Stochastic methods: Non-dominated sorting genetic algorithm II/III and improving the strength Pareto evolutionary algorithm;
2. Deterministic methods: Simple cull and full non-dominated sorting and ranking.
5.1 Temporal Overlap Coefficient
A very important piece of information to be processed by the broker is the availability: the dates when the user company needs the machine will not necessarily be the dates on which the machine will be available. An indicator is therefore needed to evaluate this availability, based on some kind of temporal logic. Allen's logic [10] provides a temporal representation that takes the notion of time intervals as a primitive and gives a method for representing the relationships between two of them. This logic identifies 13 ways in which an ordered pair of time intervals (A, B) can be related (Fig. 2). The Temporal Overlap Coefficient (TOC) is defined as

TOC = min(A_f − A_i, B_f − B_i) / (A_f − A_i)   if A overlaps or meets B,
TOC = −1 / (1/e)^{|B_i − A_f|}                   otherwise.    (4)
Analysis of the values of the TOC yields:
i. TOC = 1: the machine is available as long as the user company needs it;
ii. 0 < TOC < 1: the machine is only available part of the time;
iii. the lower the value of TOC, the further away the machine's availability date is from the date the user company needs it.
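For illustration, Eq. (4) can be transcribed directly into Python as below. This is our own sketch, not the project code: we assume A = [a_i, a_f] is the interval requested by the user company and B = [b_i, b_f] the availability interval of the machine, we approximate "A overlaps or meets B" by the two intervals intersecting or touching, and we read the "otherwise" branch so that the coefficient decreases as the gap between the intervals grows, consistently with point (iii).

import math

def toc(a_i, a_f, b_i, b_f):
    """Temporal Overlap Coefficient of Eq. (4) for intervals
    A = [a_i, a_f] and B = [b_i, b_f], given as numeric dates."""
    if a_i <= b_f and b_i <= a_f:            # A overlaps or meets B
        return min(a_f - a_i, b_f - b_i) / (a_f - a_i)
    # Disjoint intervals: -1 / (1/e)^|b_i - a_f|, i.e. -e^|b_i - a_f|,
    # which gets lower as the two intervals are further apart.
    return -1.0 / (1.0 / math.e) ** abs(b_i - a_f)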
Fig. 2 The 13 possible relationships between time intervals A and B in Allen’s logic
Fig. 3 Scatter plot of the results of NSGAII with TOC and cost as axes
5.2 Formalization of the Problem and Analysis of the Results
As formalized in Sect. 3.2, we define our multi-objective optimization problem in the following way:

min F(x) = min [−TOC(x), −stars(x), cost(x), age(x), distance(x)],    (5)
where x ∈ S, the set of the machines in the filtered database. The results obtained with the deterministic methods (full non-dominated sorting and ranking being the slower of the two) were used as optimal results to calculate the quality metrics (Sect. 3.3) for the stochastic algorithms. As expected, NSGAII and NSGAIII have very close results, and they are much better than those of SPEA2. The results of the optimization are shown to the client company in a scatter plot, such as the one in Fig. 3. The axes to plot can be chosen by the client company (in this case, the TOC and the cost).
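To make Eq. (5) concrete, the objective vector of a single machine record can be assembled as in the sketch below, reusing the toc function given above; the field names are hypothetical, not those of the actual HALFBACK database.

def objectives(machine, need_start, need_end):
    """Objective vector F(x) of Eq. (5) for one machine x; all five
    criteria are cast as minimization (hence the negated TOC and stars)."""
    return [
        -toc(need_start, need_end,
             machine["avail_start"], machine["avail_end"]),
        -machine["stars"],      # supplier reputation, 1 to 5 stars
        machine["cost"],        # renting cost
        machine["age"],         # age of the machine
        machine["distance"],    # distance to the supplier company
    ]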
6 Conclusions
This paper proposes a multi-criteria optimization algorithm as part of the machine broker software, which will propose available machines for taking over the production of user companies in the network who are unable to continue or finalize a production order due to unforeseen difficulties. The major contribution of the paper
is an extensive state of the art¹ of existing multi-criteria optimization methods for the new paradigm of “manufacturing as a service.” The results show that genetic algorithms or similar approaches perform better than the other algorithms. Depending on the size of the machine database, it is preferable to use algorithms such as NSGAII or NSGAIII to obtain optimal or almost optimal results, as well as alternatives to the main Pareto front (if the Pareto front has few points, it is possible to consider points from the secondary, tertiary, or further fronts). It is possible to do the same with full non-dominated sorting and ranking (which additionally does it for the entire database), but the runtime grows much faster with the number of machines, whereas for the stochastic methods the time is almost constant even as the database grows. Simple cull is faster than full non-dominated sorting and ranking, but it provides no alternative fronts to be explored. For future work, there are several possible improvements, such as defining a machine similarity measure, based on their footprints, to obtain other machines that can perform functions similar to those of a machine that could be out of order. New criteria can also be added as attributes in the machine database without radically changing the current implementation. Acknowledgements This work has received funding from INTERREG Upper Rhine (European Regional Development Fund) and the Ministries for Research of Baden-Württemberg and Rheinland-Pfalz (Germany) and from the Grand Est French Region in the framework of the Science Offensive Upper Rhine HALFBACK project.
¹ State-of-the-art source code and evaluations can be found at https://github.com/Gabriel382/multi-objective-optimization-algorithms.
References
1. Barbucha, D., Czarnowski, I., Jedrzejowicz, P., Ratajczak-Ropel, E., Wierzbowska, I.: JABAT middleware as a tool for solving optimization problems. In: Transactions on Computational Collective Intelligence II, pp. 181–195. Springer, Berlin, Heidelberg (2010)
2. Bassi, M., Cursi, E.S.D., Pagnacco, E., Ellaia, R.: Statistics of the Pareto front in multi-objective optimization under uncertainties. Latin Am. J. Solids Struct. 15(11) (2018)
3. Catania, C., Zanni-Merk, C., de Beuvron, F.: Ontologies to lead knowledge intensive evolutionary algorithms. Int. J. Knowl. Syst. Sci. 7(1), 78–100 (2016)
4. Czyżak, P., Jaszkiewicz, A.: Pareto simulated annealing–a metaheuristic technique for multiple-objective combinatorial optimization. J. Multi-Criteria Decis. Anal. 7(1), 34–47 (1998)
5. Deb, K., Pratap, A., Agarwal, S., Meyarivan, T.: A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans. Evol. Comput. 6(2), 182–197 (2002)
6. Geilen, M., Basten, T.: A calculator for Pareto points. In: Proceedings of the Conference on Design, Automation and Test in Europe, pp. 1–6 (2007)
7. Giraldez, R., Aguilar-Ruiz, J.S., Riquelme, J.C.: Knowledge-based fast evaluation for evolutionary learning. IEEE Trans. Syst. Man Cybern. Part C (Appl. Rev.) 35(2), 254–261 (2005)
8. Halfback Website: Halfback Project Description. [ONLINE] (2017) http://halfback.in.hsfurtwangen.de/home/description/. Accessed 27 Dec 2019
9. Jafari, S., Bozorg-Haddad, O., Chu, X.: Cuckoo optimization algorithm (COA). In: Advanced Optimization by Nature-Inspired Algorithms, pp. 39–49. Springer, Singapore (2018)
10. Allen, J.F.: Maintaining knowledge about temporal intervals. In: Readings in Qualitative Reasoning About Physical Systems, pp. 361–372. Elsevier (1990)
11. Lermer, M., Frey, S., Reich, C.: Machine learning in cloud environments considering external information. In: IMMM 2016, Valencia, Spain (2016)
12. Nobahari, H., Nikusokhan, M., Siarry, P.: A MOGSA based on non-dominated sorting. Int. J. Swarm Intell. Res. 3, 32–49
13. NSGA-III. https://deap.readthedocs.io/en/master/examples/nsga3.html. Accessed 04 Jan 2020
14. Pirim, H., Eksioglu, B., Bayraktar, E.: Tabu search: a comparative study, pp. 1–27. INTECH Open Access Publisher (2008)
15. Rothlauf, F.: Binary representations of integers and the performance of selectorecombinative genetic algorithms. In: International Conference on Parallel Problem Solving from Nature, pp. 99–108. Springer, Berlin, Heidelberg (2002)
16. Rostami, S.: Synthetic objective functions and ZDT1. [ONLINE] https://shahinrostami.com/posts/search-and-optimisation/practical-evolutionary-algorithms/synthetic-objective-functions-and-zdt1/. Accessed 30 Dec 2019
17. Yusoff, Y., Ngadiman, M.S., Zain, A.M.: Overview of NSGA-II for optimizing machining process parameters. Procedia Eng. 15, 3978–3983 (2011)
18. Zames, G., Ajlouni, N.M., Ajlouni, N.M., Ajlouni, N.M., Holland, J.H., Hills, W.D., Goldberg, D.E.: Genetic algorithms in search, optimization and machine learning. Inf. Technol. J. 3(1), 301–302 (1981)
19. Zitzler, E., Deb, K., Thiele, L.: Comparison of multiobjective evolutionary algorithms: empirical results. Evol. Comput. 8(2), 173–195 (2000)
20. Zitzler, E., Laumanns, M., Thiele, L.: SPEA2: improving the strength Pareto evolutionary algorithm. TIK-Report 103 (2001)
21. Zhou, A., Qu, B.Y., Li, H., Zhao, S.Z., Suganthan, P.N., Zhang, Q.: Multiobjective evolutionary algorithms: a survey of the state of the art. Swarm Evol. Comput. 1(1), 32–49 (2011)
Conservative Determinization of Translated Automata by Embedded Subset Construction Michele Dusi and Gianfranco Lamperti
Abstract A translated finite automaton (TFA) results from a translation of a deterministic finite automaton (DFA). A translation is based on a mapping from the alphabet of the DFA to a new alphabet, where each symbol in the original alphabet is substituted with a symbol in the new alphabet. When this substitution generates a nondeterministic automaton, the TFA may need to be determinized into an equivalent DFA. Determinization of TFAs may be useful in a variety of domains, specifically in model-based diagnosis of discrete-event systems, where large TFAs constructed by model-based reasoning are processed to perform knowledge compilation. Since, in computation time, the classical Subset Construction determinization algorithm may be less than optimal when applied to large TFAs, a conservative algorithm is proposed, called Embedded Subset Construction. This alternative algorithm updates the TFA based on the mapping of the alphabet rather than building a new DFA from scratch. This way, in contrast with Subset Construction, which performs an exhaustive processing of the TFA to be determinized, the portion of the TFA that does not require determinization is not processed. Embedded Subset Construction is sound and complete, thereby yielding a DFA that is identical to the DFA generated by Subset Construction. The benefit of using Embedded Subset Construction largely depends on the portion of the TFA that actually requires determinization. Experimental results indicate the viability of Embedded Subset Construction, especially so when large TFAs are affected by small portions of nondeterminism.
M. Dusi · G. Lamperti (B) Department of Information Engineering, University of Brescia, Brescia, Italy e-mail: [email protected] M. Dusi e-mail: [email protected] © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2020 I. Czarnowski et al. (eds.), Intelligent Decision Technologies, Smart Innovation, Systems and Technologies 193, https://doi.org/10.1007/978-981-15-5925-9_5
1 Introduction
A finite automaton (FA) is a mathematical computational model that can describe a large variety of tools, such as the recognizer of a regular language [1], a discrete-event system (DES) [6], an active system [11], or a diagnoser [14, 15]. Usually, for performance reasons, a nondeterministic finite automaton (NFA) is transformed into an equivalent deterministic finite automaton (DFA) before being exploited. Equivalence means that the NFA and the DFA share the same regular language. Traditionally, this determinization process is performed by the Subset Construction algorithm [7, 13]. In some circumstances, the NFA to be determinized results from a translation of a DFA. Specifically, given a mapping Σ → Σ′, where Σ is the alphabet of a DFA and Σ′ is a new alphabet (possibly extended with the empty symbol ε), each symbol in Σ is substituted with a corresponding symbol in Σ′, thereby changing the regular language of the resulting FA, called a translated finite automaton (TFA). Since the resulting TFA may be nondeterministic, determinization may be required. To this end, Subset Construction may be used. However, if the TFA is large and nondeterminism is confined to a tiny portion of the automaton, Subset Construction may be less than optimal, because it constructs an equivalent DFA by the exhaustive processing of the TFA, without reusing the major portion of the TFA that remains deterministic after the translation. This is why a conservative determinization algorithm is proposed in this paper, namely, Embedded Subset Construction. Instead of generating an equivalent DFA from scratch, as Subset Construction does, this alternative algorithm focuses solely on the states of the TFA whose transition function has become nondeterministic owing to the translation, thereby preserving the (possibly large) portion of the TFA that does not require determinization. To tackle the design of Embedded Subset Construction, we took inspiration from the determinization of mutating finite automata (MFAs) [8], including previous research [4, 5, 9, 12]. Albeit at a conceptual level TFAs differ from MFAs, we discovered that the determinization algorithm proposed for MFAs, namely, Subset Restructuring, could be adapted to the determinization of TFAs. Hence, even if Embedded Subset Construction differs from Subset Restructuring considerably, the two algorithms share a common profile. The problem of determinization of (large) TFAs stems from extensive research in model-based diagnosis of active systems [11], where large TFAs need to be determinized for knowledge-compilation purposes [2, 3, 10]. This paper presents Embedded Subset Construction, along with relevant experimental results.
2 Determinization of Finite Automata by Subset Construction
An FA can be either deterministic (DFA) or nondeterministic (NFA). Formally, a DFA is a 5-tuple (Σ, D, Td, d0, Fd), where D is the set of states, Σ is a finite set of symbols called the alphabet, Td is the transition function, Td : DΣ → D, where
DΣ ⊆ D × Σ, d0 is the initial state, and Fd ⊆ D is the set of final states. Determinism comes from Td mapping a state–symbol pair into one state. An NFA is a 5-tuple (Σ, N, Tn, n0, Fn), where the fields have the same meaning as in the DFA except that the transition function is nondeterministic, Tn : NΣε → 2^N, where NΣε ⊆ N × (Σ ∪ {ε}), where ε is the empty symbol. Each FA is associated with a regular language, which is the set of strings on Σ generated by a path from the initial state to a final state. Two FAs are equivalent when they are associated with the same regular language. Within an FA, a transition mapping a pair (s, ℓ) is said to be marked by the symbol ℓ and is called an ℓ-transition. For each NFA, there exists at least one equivalent DFA. One such DFA is generated by the Subset Construction algorithm, which, for this reason, is said to be SC-equivalent to the NFA. Determinization is notoriously convenient because processing a DFA is generally more efficient than processing an NFA [1]. A pseudocode of Subset Construction is listed in Algorithm 1, where N is the input NFA and D is the output (SC-equivalent) DFA. By construction, each state d in the DFA is identified by a subset of the states of the NFA, denoted ‖d‖ and called the extension of d. The initial state d0 of the DFA is the ε-closure of the initial state n0 of the NFA. The generation of the DFA is supported by a stack S. Each element in S is a state of the DFA to be processed. Processing a state d in the stack means generating the transition function of d, that is, the set of transitions ⟨d, ℓ, d′⟩ exiting d and entering another state d′. Initially, S contains the initial state d0 only. Then, Subset Construction pops and processes one state d from S at a time, by generating the transitions exiting d, until S becomes empty (lines 9–21), where ℓ-mapping(d, N) in line 12 is the ε-closure of the set of states in N that are entered by a transition exiting a state in ‖d‖ and marked with ℓ. Since the powerset of N (namely, the set of possible states in D) is finite, the algorithm is bound to terminate, with D being the DFA that is SC-equivalent to N. Eventually, the final states of D are those whose extension includes (at least) one final state of N (line 22).
Example 1 Shown on the left side of Fig. 1 is a DFA D, which includes four states (represented by circles), where 0 is the initial state and 3 is the (only) final state, and six transitions (represented by arcs). Next to it (center of Fig. 1) is an NFA N, where state 3 is exited by two transitions marked by the symbol c, namely, ⟨3, c, 1⟩ and ⟨3, c, 2⟩, and state 2 is exited by the ε-transition ⟨2, ε, 3⟩. The DFA that is SC-equivalent to N (generated by Subset Construction) is shown on the right side, where each state d0, …, d4 is identified by a subset of the states in N. Final states are d2, d3, and d4.
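For readers who prefer an executable rendering, the following compact Python sketch mirrors Algorithm 1; it is not the authors' C++ implementation, and the encoding of the NFA as a dictionary (state, symbol) -> set of successor states is our own choice.

from collections import deque

def subset_construction(nfa_trans, n0, finals, eps="eps"):
    """Subset Construction: nfa_trans maps (state, symbol) to a collection of
    successors, with eps as the empty symbol. Returns the DFA transitions,
    the initial DFA state, and the DFA final states (states are frozensets)."""
    def eps_closure(states):
        stack, closure = list(states), set(states)
        while stack:
            n = stack.pop()
            for m in nfa_trans.get((n, eps), ()):
                if m not in closure:
                    closure.add(m)
                    stack.append(m)
        return frozenset(closure)

    def mapping(ext, symbol):          # the ℓ-mapping of line 12
        step = set()
        for n in ext:
            step |= set(nfa_trans.get((n, symbol), ()))
        return eps_closure(step)

    d0 = eps_closure({n0})
    symbols = {s for (_, s) in nfa_trans if s != eps}
    dfa_trans, seen, todo = {}, {d0}, deque([d0])
    while todo:
        d = todo.popleft()
        for s in symbols:
            d2 = mapping(d, s)
            if not d2:
                continue
            dfa_trans[(d, s)] = d2
            if d2 not in seen:
                seen.add(d2)
                todo.append(d2)
    dfa_finals = {d for d in seen if d & set(finals)}
    return dfa_trans, d0, dfa_finals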
3 Translated Finite Automata
A TFA N is an NFA obtained from a DFA D based on a mapping of the alphabet Σ of D. Specifically, each symbol marking a transition in D is replaced with a symbol in Σ′ ∪ {ε}, where Σ′ is a new alphabet and a mapping Σ → Σ′ ∪ {ε} is
Algorithm 1 Subset Construction generates the DFA D SC-equivalent to the NFA N.
1: procedure Subset Construction(N, D)
2:   input
3:     N = (Σ, N, Tn, n0, Fn): an NFA
4:   output
5:     D = (Σ, D, Td, d0, Fd): the DFA that is SC-equivalent to N
6:   begin
7:     Generate the initial state d0 of D, where ‖d0‖ = ε-closure(n0, N)
8:     S ← [d0]
9:     repeat
10:      Pop a state d from S
11:      for all ℓ ∈ Σ such that ⟨n, ℓ, n′⟩ ∈ Tn, n ∈ ‖d‖ do
12:        N′ ← ℓ-mapping(d, N)
13:        if there is no state d′ in D such that ‖d′‖ = N′ then
14:          Insert a new state d′ into D, where ‖d′‖ = N′
15:          Push d′ on S
16:        else
17:          Let d′ be the state in D such that ‖d′‖ = N′
18:        end if
19:        Insert a new transition ⟨d, ℓ, d′⟩ into D
20:      end for
21:    until S is empty
22:    Fd ← {d | d ∈ D, ‖d‖ ∩ Fn ≠ ∅}
23: end procedure
Fig. 1 DFA D (left), NFA N (center), and the relevant SC-equivalent DFA (right)
assumed. Consequently, the language of N is in general different from the language of D, because each string in D is mapped to a (generally) different string in N.
Definition 1 Let Σ be the alphabet of a DFA D, Σ′ a new alphabet, and μ = Σ → Σ′ ∪ {ε} a mapping. The NFA N obtained from D by substituting each symbol in Σ with the corresponding symbol mapped in Σ′ ∪ {ε} is the translated finite automaton of D based on μ, denoted N = μ(D).
Example 2 Consider the DFA D on the left side of Fig. 1, where Σ = {a, b, c, d, e}. Let Σ′ = {a, b, c} and μ = Σ → Σ′ ∪ {ε} be a mapping defined as follows: a → a, b → b, c → c, d → c, and e → ε. The resulting TFA, namely, N = μ(D), is shown in the center of Fig. 1, where nondeterminism is twofold: state 3 is exited by two
Algorithm 2 Embedded Subset Construction generates the DFA D′ that is SC-equivalent to the TFA μ(D).
1: procedure Embedded Subset Construction(D, μ, D′)
2:   input
3:     D: a DFA with alphabet Σ
4:     μ = Σ → Σ′ ∪ {ε}: a mapping, where Σ′ is a new alphabet
5:   output
6:     D′: the DFA that is SC-equivalent to the TFA μ(D)
7:   begin
8:     Automaton Translation(D, μ, N, D′, B)
9:     Bud Processing(N, B, D′)
10: end procedure
transitions marked by the same symbol c, and state 2 is exited by an ε-transition. The DFA that is SC-equivalent to N is displayed on the right side of Fig. 1. In a large TFA, nondeterminism may be confined to a small portion of the automaton. If so, determinizing a TFA by means of Subset Construction is bound to keep a considerable portion of the TFA as is, which nonetheless may in general require substantial processing: the larger the unchanged portion of the TFA after the determinization, the larger the waste of processing performed by Subset Construction. Thus, generating from scratch the DFA that is SC-equivalent to a TFA may be less than optimal in terms of computational time, as no reuse is pursued. This is why we designed a conservative determinization algorithm called Embedded Subset Construction, as detailed in Sect. 4.
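As a toy illustration of Definition 1 (not of the Automaton Translation procedure of Sect. 4.1, which additionally builds the bud list), the following Python sketch applies a mapping μ to a DFA stored as a transition dictionary and reports the states whose transition function becomes nondeterministic; the names and the encoding are ours.

EPS = "eps"  # symbol standing for the empty symbol epsilon

def translate(dfa_trans, mu):
    """dfa_trans: dict (state, symbol) -> successor state of a DFA.
    mu: dict mapping every DFA symbol to a new symbol or EPS.
    Returns the TFA as dict (state, new_symbol) -> set of successors,
    plus the states where nondeterminism arises."""
    tfa = {}
    for (state, symbol), succ in dfa_trans.items():
        tfa.setdefault((state, mu[symbol]), set()).add(succ)
    nondet = {state for (state, sym), succs in tfa.items()
              if len(succs) > 1 or sym == EPS}
    return tfa, nondet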
4 Embedded Subset Construction
Embedded Subset Construction takes a DFA D and a mapping μ as input (cf. Algorithm 2), and generates the DFA D′ that is SC-equivalent to the NFA μ(D). First, it performs the DFA translation by creating the TFA N = μ(D) through the auxiliary procedure Automaton Translation (line 8), which generates D′ and B. Specifically, D′ is a copy of N where each state n is replaced with a singleton {n}. It represents the initial instance of the FA that will be transformed into the actual DFA SC-equivalent to μ(D). B is a list of buds, where a bud is a pair (d, ℓ), with d being a state in D′ and ℓ a symbol in Σ ∪ Σ′ ∪ {ε}. A bud indicates that the transition function mapping the pair (d, ℓ) is to be updated in D′. The buds in B are partially ordered based on the distance of d, denoted δ(d), which is the minimal length of the paths connecting the initial state of D′ with d. This way, D′ is manipulated top-down, starting from the states with shortest distance (to guarantee the termination of the algorithm). Parameters N, B, and D′ are then passed in line 9 to the actual core of the algorithm, namely, Bud Processing.
4.1 Automaton Translation
The pseudocode of Automaton Translation is listed in Algorithm 3. In constructing the FA D′ isomorphic to N, it detects the states giving rise to nondeterminism when the substitution of the symbol ℓ ∈ Σ with the corresponding symbol ℓ′ is performed. Two cases are possible: either ℓ′ = ε (lines 15–22) or ℓ′ ≠ ε (lines 23–26). Notably, when ℓ′ = ε, the original symbol ℓ is kept in the corresponding transition of D′. This may sound counterintuitive, as we would expect ε to replace ℓ. The reason for this is that, for technical reasons, D′ cannot include any ε-transition. Still, the creation of the bud (dnd, ℓ) (line 16) indicates that the transition function of the state dnd needs to be updated in respect of the symbol ℓ, which is based on the transition ⟨nd, ε, nd′⟩ inserted into N in line 14. Moreover, if dnd is the initial state d0′ of D′, then a bud (d0′, ε) is generated, as the extension of d0′ may change (lines 17 and 18); otherwise, buds are created for the parent states of dnd (line 20). If, instead, ℓ′ ≠ ε (lines 22–27), then a transition ⟨dnd, ℓ′, dnd′⟩ is created (line 23). Should this transition raise nondeterminism, a bud (dnd, ℓ′) is inserted into B (lines 24 and 25). Eventually, the bud list B accounts for all the states in D′ that need to be revised in their transition function. As such, B plays somewhat the role of the stack S in Subset Construction (cf. Algorithm 1 in Sect. 2).
4.2 Bud Processing
The pseudocode of Bud Processing is listed in Algorithm 4. It takes as input a TFA N, a bud list B, and the initial configuration of D′, which is eventually transformed into the DFA SC-equivalent to N. Each bud in B is processed based on eight rules, namely, R0, …, R7. Rule R0 (lines 5–7) processes the bud (d0′, ε) by calling the procedure Extension Update, which updates the extension of d0′, possibly generating new buds for d0′. Generally speaking, if the new extension of the state equals the extension of another state, then the two states are merged into a single one, with possible pruning of B. Rules R1, …, R7 are processed within the loop in lines 8–38. Rule R1 (lines 11 and 12) comes into play when a bud (d, ℓ) is such that no transition marked by ℓ exits any state in ‖d‖. If so, a pruning of D′ is performed by the auxiliary procedure Automaton Pruning, which removes the portion of D′ that is no longer reachable once the transition exiting d and marked by ℓ has been removed. Rule R2 (lines 14 and 15) creates a new transition ⟨d, ℓ, d′⟩, where d′ exists already, possibly updating the distances of relevant states. Rule R3 (lines 17 and 18) creates both a new state d′ and a new transition ⟨d, ℓ, d′⟩, along with relevant new buds. Rules R4, …, R7 are applied when d is exited by a transition marked with ℓ. Since several transitions marked by ℓ may exist owing to possible previous merging of states, these rules are applied for each such transition, namely, t = ⟨d, ℓ, d′⟩. Rule R4 (lines 22 and 23) is applied when no other transition enters d′ and d′ ≠ d0′, in which case the extension of d′ is updated. Rule R5 (lines 25 and 26) is applied when
Algorithm 3 Automaton Translation generates the TFA N = μ(D), as well as an FA D′ isomorphic to N (where each state is a singleton containing the corresponding NFA state) and the initial bud list B.
1: procedure Automaton Translation(D, μ, N, D′, B)
2:   input
3:     D = (Σ, D, Td, d0, Fd): a DFA
4:     μ = Σ → Σ′ ∪ {ε}: a mapping, where Σ′ is a new alphabet
5:   output
6:     N = (Σ′, N, Tn, n0, Fn): the TFA μ(D)
7:     D′ = (Σ′, D′, Td′, d0′, Fd′): a finite automaton isomorphic to N
8:     B: the initial bud list
9:   begin
10:    B ← [ ]
11:    For each state d in D, create a state nd in N and a state dnd in D′ where ‖dnd‖ = {nd}
12:    for all state d in D taken in ascending order based on its distance δ(d) do
13:      for all transition ⟨d, ℓ, d′⟩ in D, with ℓ′ being the symbol mapping ℓ in μ do
14:        Insert a transition ⟨nd, ℓ′, nd′⟩ into N
15:        if ℓ′ = ε then
16:          Insert a transition ⟨dnd, ℓ, dnd′⟩ into D′ and a new bud (dnd, ℓ) into B
17:          if dnd is the initial state d0′ of D′ then
18:            Insert the bud (d0′, ε) into B
19:          else
20:            For each transition ⟨d̄, ℓ̄, dnd⟩ in D′ where ℓ̄ ≠ ε, insert the bud (d̄, ℓ̄) into B
21:          end if
22:        else
23:          Insert a transition ⟨dnd, ℓ′, dnd′⟩ into D′
24:          if D′ includes a transition ⟨dnd, ℓ′, d̄⟩ where d̄ ≠ dnd′ then
25:            Insert the bud (dnd, ℓ′) into B
26:          end if
27:        end if
28:      end for
29:    end for
30: end procedure
the redirection of t toward an existing state d does not cause the disconnection of d , a condition guaranteed in line 30. Rule R6 (lines 28 and 29) is grounded on the same condition of R5 , except that the state d is created before the redirection of t. Finally, rule R7 (lines 32–34) comes into play when the redirection of t may provoke a disconnection of a portion of D . To prevent a possible disconnection, each transition other than t that enters d and would become inconsistent with the new extension of d is removed and surrogated with relevant buds. Eventually, the extension of d is updated. The computational state of Bud Processing can be represented by a configuration ¯ B), ¯ where D¯ is the current configuration of D and B¯ is the current conα = (D, figuration of B. The processing of a bud moves from α to a new configuration α . Hence, the algorithm performs a trajectory from the initial configuration α0 to a final configuration αf = (Df , Bf ), where Df is the DFA that is SC-equivalent to N and Bf is empty.
Algorithm 4 Bud Processing processes the buds in B in order to transform D′ into the DFA that is SC-equivalent to the TFA N.
1: procedure Bud Processing(N, B, D′)
2:   input N: a TFA, B: the initial bud list for D′
3:   D′: an FA isomorphic to N, where the extension of each state is a singleton
4:   begin
5:   if (d0′, ε) is the first bud in B, where d0′ is the initial state of D′ then   # Rule R0
6:     Extension Update(d0′, ε-closure(‖d0′‖, N))
7:   end if
8:   while B is not empty do
9:     Remove from B the bud (d, ℓ) in first position
10:    N′ ← ℓ-mapping(d, N)
11:    if N′ = ∅ then   # Rule R1
12:      Automaton Pruning(d, ℓ)
13:    else if no ℓ-transition exits d then
14:      if ∃ d′ ∈ D′, ‖d′‖ = N′ then   # Rule R2
15:        Insert the transition ⟨d, ℓ, d′⟩ into D′ and perform distance relocation
16:      else   # Rule R3
17:        Create a new state d′ in D′, where ‖d′‖ = N′, and relevant buds
18:        Insert the transition ⟨d, ℓ, d′⟩ into D′
19:      end if
20:    else
21:      for all transition t = ⟨d, ℓ, d′⟩ in D′ such that ‖d′‖ ≠ N′ do
22:        if d′ ≠ d0′ and no other transition enters d′ then   # Rule R4
23:          Extension Update(d′, N′)
24:        else if d′ = d0′ or ∃ t′ ∈ D′, t′ ≠ t, t′ = ⟨dp, x, d′⟩, δ(dp) ≤ δ(d) then
25:          if ∃ d′′ ∈ D′, ‖d′′‖ = N′ then   # Rule R5
26:            Redirect t toward d′′ and perform distance relocation
27:          else   # Rule R6
28:            Create a new state d′′ in D′ and relevant buds, where ‖d′′‖ = N′
29:            Redirect t toward d′′
30:          end if
31:        else   # Rule R7
32:          Remove transitions in T = {t′ | t′ ≠ t, t′ = ⟨dp, x, d′⟩, x-closure(dp, N) ≠ N′}
33:          Create new buds for the transitions removed
34:          Extension Update(d′, N′)
35:        end if
36:      end for
37:    end if
38:  end while
39: end procedure
Example 3 With reference to Example 2, consider the DFA D and the TFA N = μ(D) displayed in Fig. 1. Shown in Fig. 2 is the trajectory of Bud Processing, where each configuration αi = (Di , Bi ), i ∈ [0 .. 5], is represented by a decorated FA, where the FA equals Di , whereas the decoration (red symbols marking states) represents Bi . Specifically, if a bud (d, ) ∈ Bi , then is a symbol marking d. The initial value of B, namely, B0 = [(d0 , b), (d2 , e), (d3 , c)], is the result of Automaton Translation (cf Algorithm 3). The first bud (d0 , b) is processed by rule R7 , which, after removing
Fig. 2 Trajectory of Bud Processing for the TFA displayed in the center of Fig. 1
the transition d3 , c, d2 (the bud (d3 , c) exists already), updates d2 to {2, 3}. These actions result in the configuration α1 . The processing of the bud (d2 , c) is performed by rule R3 , which creates the new state d4 and the new transition d2 , c, d4 , along with the new buds (d4 , b) and (d4 , c). This results in the configuration α2 . The processing of (d2 , e) is performed by rule R1 , as N = ∅, resulting in the pruning of d2 , e, d3 (configuration α3 ). The processing of (d3 , c) is carried out by rule R2 , which inserts the transition d3 , c, d4 (configuration α4 ). Rule R2 also processes the bud (d4 , b) by inserting d4 , b, d3 (configuration α5 ). The processing of the last bud (d4 , c) is still performed by R2 , which insert the auto-transition d4 , c, d4 , leading to the final DFA displayed on the right side of Fig. 1, which is in fact the DFA SC-equivalent to μ(D).
5 Experimental Results
Both Subset Construction and Embedded Subset Construction have been implemented in C++17 under Linux (Ubuntu 18.04), on a machine with an Intel(R) Xeon(R) Gold 6140M CPU (2.30 GHz) and 128 GB of working memory. To compare the two algorithms, a procedure for the automatic generation of (large) FAs has been developed. In order to have control on the amount of processing necessary for determinization, FAs are stratified. In a stratified FA, the set of states is partitioned into strata, where each stratum includes the states with the same distance. Moreover, transitions exiting a stratum at distance δ can enter either states of the same stratum or states of the successive stratum (at distance δ + 1). FAs are generated based on a set of parameters, including the number of states, the amount of transitions exiting a state, namely, the
branching, the percentage of ε-transitions, the total number of strata, and the determinism, namely, the portion of the FA (measured by a distance) that is deterministic already and, as such, does not need determinization. This last parameter allows for the control of the impact of the determinization, namely, the number of states that are actually processed by Embedded Subset Construction. Experiments focused on CPU processing time, where each time value of the plot represents the average of 10 different test cases. In addition to the curves representing the processing time of Subset Construction and Embedded Subset Construction, each graph includes a third curve relevant to the gain, namely, a number within the range [−1 .. 1] indicating the percentage of time saved (if the gain is positive) or wasted (if the gain is negative) by Embedded Subset Construction over Subset Construction. Denoting with tSC and tESC the processing time of Subset Construction and Embedded Subset Construction, respectively, the gain γ is defined as

γ = (tSC − tESC) / max(tSC, tESC).    (1)
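As a tiny illustration (ours, not from the paper), Eq. (1) can be computed as follows; the gain is positive when ESC is faster and negative when SC is faster.

def gain(t_sc, t_esc):
    """Gain of Eq. (1), in the range [-1, 1]."""
    return (t_sc - t_esc) / max(t_sc, t_esc)

print(gain(10.0, 4.0))   # 0.6: ESC saves 60% of the slower (SC) time
print(gain(4.0, 10.0))   # -0.6: ESC wastes 60% relative to SC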
Displayed in Fig. 3 are two different experiments where the varying parameter is the number of states (without ε-transitions). The FAs considered in the experiment on the left have 100 strata, whereas those on the right have 1000 strata. Each of the two graphs indicates the processing time (left axis) of Subset Construction (SC), the processing time of Embedded Subset Construction (ESC), and the gain (right axis). As we can see, in the left experiment, ESC outperforms SC (positive gain) up to a certain number of states, beyond which SC becomes more efficient than ESC. Instead, in the experiment on the right, ESC outperforms SC systematically (note how the gain is almost constant). The difference in the time response of the two experiments is caused by the fact that the size of the strata differs, and bigger strata are processed more efficiently by SC than by ESC, because the impact is bound to increase with the size of the strata. A second experiment is displayed in Fig. 4, where the varying parameter is the number of strata (up to 150). What distinguishes the two experiments is that in the
Fig. 3 Results with varying number of states, with 100 strata (left) and 1000 strata (right)
Fig. 4 Results with varying number of strata, with ε-transitions (left) and without (right)
Fig. 5 Results with varying determinism, with ε-transitions (left) and without (right)
left experiment there are 50% of ε-transitions, whereas in the right experiment no ε-transition is included. The presence of ε-transitions makes SC outperform ESC (left graph). By contrast, when there is no ε-transition, ESC starts outperforming SC from a certain number of strata onward. As a result, the gain, which is initially negative, becomes subsequently increasingly positive. In the third experiment, shown in Fig. 5, the only varying parameter is the determinism, namely, the upper portion of the NFA that is deterministic and, as such, is not touched by ESC. In both experiments, the number of strata is 100, and the determinism is defined by a distance. For instance, if the value of determinism is 40, it means that the portion of the NFA up to the 40-th stratum is deterministic. Unlike the experiment on the left, where there are 50% of ε-transitions, the experiment on the right does not involve ε-transitions. In the left experiment, ESC outperforms SC only beyond a certain stratum, while in the experiment on the right, ESC outperforms SC in every instance. All the experiments have also confirmed the correctness of ESC empirically: the DFA obtained by ESC is always identical to the corresponding DFA generated by SC.
6 Conclusion
The determinization of NFAs is grounded on practical reasons, mostly relevant to performance, as processing a DFA is generally more efficient than processing an equivalent NFA. In Subset Construction, the DFA is constructed from scratch, starting from the initial state and generating the transition function of each state created. As such, Subset Construction is context-free, in the sense that it does not consider any information other than the NFA under processing. Still, if the NFA results from a translation of a DFA, in other words, the NFA is a TFA, chances are that the nondeterminism is confined to a tiny portion of the NFA. Consequently, applying Subset Construction to the TFA may result in a large waste of processing because of the regeneration of the same deterministic portion. To cope with this inefficiency, a novel algorithm has been proposed in this paper, called Embedded Subset Construction, which, instead of generating the equivalent DFA from scratch, aims to preserve the part of the TFA that does not require determinization. However, as indicated by the experimental results, Embedded Subset Construction is not always a panacea in terms of computation time. All depends on the impact of the translation, namely, the number of states that need to be created, deleted, or otherwise processed. Roughly, the larger the impact, the smaller the convenience in using Embedded Subset Construction instead of Subset Construction. Whether to adopt Embedded Subset Construction or Subset Construction remains an empirical question, which can be answered based on the actual application domain. Albeit Embedded Subset Construction is proposed in the context of translated automata, it can be easily adapted to the determinization of any NFA, irrespective of the fact that the NFA is the result of a DFA translation. What is needed is the initial instance of the bud list, which is currently generated by the auxiliary procedure Automaton Translation (cf. Algorithm 3). Future research will focus on this topic and relevant applications.
Acknowledgements We would like to thank the anonymous reviewers for their constructive comments, which helped improve the quality of the final paper. This work was supported in part by Regione Lombardia (Smart4CPPS, Linea Accordi per Ricerca, Sviluppo e Innovazione, POR-FESR 2014–2020 Asse I) and by the National Natural Science Foundation of China (grant number 61972360).
References
1. Aho, A., Lam, M., Sethi, R., Ullman, J.: Compilers—Principles, Techniques, and Tools, 2nd edn. Addison-Wesley, Reading, MA (2006)
2. Bertoglio, N., Lamperti, G., Zanella, M.: Intelligent diagnosis of discrete-event systems with preprocessing of critical scenarios. In: Czarnowski, I., Howlett, R., Jain, L. (eds.) Intelligent Decision Technologies 2019, Smart Innovation, Systems and Technologies, vol. 142, pp. 109–121. Springer, Singapore (2020). https://doi.org/10.1007/978-981-13-8311-3_10
3. Bertoglio, N., Lamperti, G., Zanella, M., Zhao, X.: Twin-engined diagnosis of discrete-event systems. Eng. Reports 1, 1–20 (2019). https://doi.org/10.1002/eng2.12060
4. Brognoli, S., Lamperti, G., Scandale, M.: Incremental determinization of expanding automata. Comput. J. 59(12), 1872–1899 (2016). https://doi.org/10.1093/comjnl/bxw044
5. Caniato, G., Lamperti, G.: Online determinization of large mutating automata. In: Howlett, R., Toro, C., Hicks, Y., Jain, L. (eds.) Knowledge-Based and Intelligent Information & Engineering Systems: Proceedings of the 22nd International Conference, KES-2018, Belgrade, Serbia, Procedia Computer Science, vol. 126, pp. 59–68. Elsevier (2018). https://doi.org/10.1016/j.procs.2018.07.209
6. Cassandras, C., Lafortune, S.: Introduction to Discrete Event Systems, 2nd edn. Springer, New York (2008)
7. Hopcroft, J., Motwani, R., Ullman, J.: Introduction to Automata Theory, Languages, and Computation, 3rd edn. Addison-Wesley, Reading, MA (2006)
8. Lamperti, G.: Temporal determinization of mutating finite automata: reconstructing or restructuring. Softw. Pract. Exp., 1–31 (2019). https://doi.org/10.1002/spe.2776
9. Lamperti, G., Zanella, M., Chiodi, G., Chiodi, L.: Incremental determinization of finite automata in model-based diagnosis of active systems. In: Lovrek, I., Howlett, R., Jain, L. (eds.) Knowledge-Based Intelligent Information and Engineering Systems. Lecture Notes in Artificial Intelligence, vol. 5177, pp. 362–374. Springer, Berlin, Heidelberg (2008)
10. Lamperti, G., Zanella, M., Zhao, X.: Abductive diagnosis of complex active systems with compiled knowledge. In: Thielscher, M., Toni, F., Wolter, F. (eds.) Principles of Knowledge Representation and Reasoning: Proceedings of the Sixteenth International Conference (KR2018), pp. 464–473. AAAI Press, Tempe, Arizona (2018)
11. Lamperti, G., Zanella, M., Zhao, X.: Introduction to Diagnosis of Active Systems. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-92733-6
12. Lamperti, G., Zhao, X.: Decremental subset construction. In: Czarnowski, I., Howlett, R., Jain, L. (eds.) Intelligent Decision Technologies 2017, Smart Innovation, Systems and Technologies, vol. 72, pp. 22–36. Springer International Publishing (2018). https://doi.org/10.1007/978-3-319-59421-7_3
13. Rabin, M., Scott, D.: Finite automata and their decision problems. IBM J. Res. Dev. 3(2), 114–125 (1959). https://doi.org/10.1147/rd.32.0114
14. Sampath, M., Sengupta, R., Lafortune, S., Sinnamohideen, K., Teneketzis, D.: Diagnosability of discrete-event systems. IEEE Trans. Autom. Control 40(9), 1555–1575 (1995)
15. Sampath, M., Sengupta, R., Lafortune, S., Sinnamohideen, K., Teneketzis, D.: Failure diagnosis using discrete-event models. IEEE Trans. Control Syst. Technol. 4(2), 105–124 (1996)
Explanatory Monitoring of Discrete-Event Systems Nicola Bertoglio, Gianfranco Lamperti, Marina Zanella, and Xiangfu Zhao
Abstract Model-based diagnosis was first proposed for static systems, where the values of the input and output variables are given at a single time point and the root cause of an observed misbehavior is a set of faults. This set-oriented perspective of the diagnosis results was later adopted also for dynamical systems, although it fits neither the temporal nature of their observations, which are gathered over a time interval, nor the temporal evolution of their behavior. This conceptual mismatch is bound to make diagnosis of discrete-event systems (DESs) poor in explainability. Embedding the reciprocal temporal ordering of faults in diagnosis results may be essential for critical decision-making. To favor explainability, the notions of temporal fault, explanation, and explainer are introduced in diagnosis during monitoring of DESs. To achieve explanatory monitoring, a technique is described, which progressively refines the diagnosis results produced already.
1 Introduction
It is quite common to think that diagnosis is a task aimed at finding the faults that affect a (natural or artificial) system given the collected symptoms, these being observations (e.g., sensor measurements) that denote an abnormal behavior. More precisely, diagnosis is a task that aims at explaining the given symptoms [17], that is, at finding what has happened inside the system that may have caused them. In
63
64
N. Bertoglio et al.
AI, from the mid-’80s the diagnosis task has been faced through the model-based paradigm [11, 19], which exploits a model of the considered system. For diagnosing a dynamical system, a discrete-event system (DES) model can be adopted [9], this being either a Petri net [2, 10] or a net of communicating finite automata, an automaton for each component, like in the current paper. Although the task can be performed by processing automata that represent just the nominal behavior [18], usually each state transition is either normal or abnormal, as in the seminal work by [20, 21]. The input of the diagnosis task for a DES is a sequence of temporally ordered observations, called hereafter a temporal observation. The output is a set of candidates, with each candidate being a set of faults, where a fault is associated with an abnormal state transition. Diagnosing a DES becomes a form of abductive reasoning, inasmuch the candidates are generated based on the trajectories (sequences of state transitions) of the DES that entail the temporal observation. The approach in [20] performs the abduction offline, thus compiling the DES models into a diagnoser, a data structure that is consulted online in order to produce a new set of candidates upon perceiving each observation (diagnosis during monitoring). The rationale behind the activesystem approach [1, 13, 15], instead, is to perform the abduction online, a possibly costly operation that, however, being driven by the temporal observation, can only focus on the trajectories that imply this sequence. The diagnosis output is the set of candidates relevant to the (possibly infinite) set of trajectories of the DES that produce the temporal observation. Since the domain of faults is finite, both the candidates and the diagnosis output are finite and bounded. Still, in both the diagnoser approach and the active-system approach, a candidate is a set of faults. Consequently, the diagnosis output is devoid of any temporal information, while in the real world faults occur in a specific temporal order. One may argue that, since a new set of candidates is output upon the reception of a new observation, it is possible to ascertain whether some additional faults have occurred with respect to the previous observation. However, one cannot ascertain whether a fault occurred previously has occurred again. In a perspective of explainable diagnosis and, more generally, of explainable AI, in this paper, a candidate is not a set of faults; instead, it is a temporal fault, namely, the (possibly unbounded) sequence of faults relevant to a trajectory that produces the temporal observation. Consequently, the diagnosis output is the (possibly infinite) set of temporal faults relevant to the (possibly infinite) set of trajectories of the DES that imply the temporal observation (received so far). Since a set of temporal faults is a regular language, it can be represented by a regular expression defined over the alphabet of faults. The temporal aspect characterizing the temporal faults can help the diagnostician in making critical decisions, e.g., about repair actions. This novel characterization of the DES diagnosis output is the first contribution of the paper. The second one is a theoretical framework for defining the task that generates such an output. A nondeterministic finite automaton (NFA) compiled offline, called explainer, is proposed, which allows for the online computation of the new diagnosis output. 
The explainer acts somewhat as a counterpoint to Sampath’s diagnoser in set-oriented diagnosis of DESs. Building the explainer requires calling an algorithm that is able to compute the regular expressions representing the (fault) languages accepted by each state in a given NFA. This algorithm has been obtained by adapting
one from the literature [8]. The third contribution of the paper is an algorithm, called explanation engine, which, every time a new observation has been perceived, is able not only to generate an updated set of candidates but also to prune the previously output candidates that are no longer consistent with the temporal observation. This progressive refinement of results aims to enhance the explainability of diagnosis during monitoring, thus achieving an explanatory monitoring.
2 DES Modeling
A DES X is a network of components, the behavior of each component being modeled as a communicating automaton [7]. A component is endowed with I/O pins, where each output pin is connected with an input pin of another component by a link. The way a transition is triggered in a component is threefold: (a) spontaneously, formally denoted by the empty event ε, (b) by an external event coming from outside X, or (c) by an internal event coming from another component. When an event occurs, a component may react by performing a transition. In general, when performing a transition, a component consumes the triggering (input) event and possibly generates new events on its output pins, which are bound to trigger the transitions of other components. A transition generating an event on an output pin can occur only if this pin is not occupied by another event. Assuming that only one component transition at a time can occur, the process that moves a DES from the initial state can be represented by a sequence of component transitions, called a trajectory of X. A contiguous subsequence of a trajectory is a trajectory segment. At the occurrence of a component transition, X changes its state, with a state x of X being a pair (C, L), where C is the array of the current component states and L the array of the (possibly empty) current events placed in links. Formally, the (possibly infinite) set of trajectories of X is specified by a deterministic finite automaton (DFA), namely, the space X∗ of X,

X∗ = (Σ, X, τ, x0),    (1)
where Σ (the alphabet) is the set of component transitions, X is the set of states, τ is the deterministic transition function mapping a state and a component transition into a new state, τ : X × Σ → X, and x0 is the initial state. For diagnosis purposes, the model of X needs to be enriched by a mapping table. Let T be the set of component transitions in X, O a finite set of observations, and F a finite set of faults. The mapping table μ of X is a function μ(X) : T → (O ∪ {ε}) × (F ∪ {ε}), where ε is the empty symbol. The table μ(X) can be represented as a finite set of triples (t, o, f), where t ∈ T, o ∈ O ∪ {ε}, and f ∈ F ∪ {ε}. The triple (t, o, f) defines the observability and normality of t: if o ≠ ε, then t is observable, else t is unobservable; likewise, if f ≠ ε, then t is faulty, else t is normal. Based on μ(X), each trajectory T in X∗ can be associated with a temporal observation. The temporal observation of T is the sequence of observations involved in T,
Obs(T) = [ o | t ∈ T, (t, o, f) ∈ μ(X), o ≠ ε ].    (2)
In the literature, a trajectory T is also associated with a diagnosis, namely, the set of faults involved in T. As such, a diagnosis does not indicate the temporal relationships between faults, nor does it account for multiple occurrences of the same fault. On the other hand, treating a diagnosis as a set of faults guarantees that the domain of possible diagnoses is finite, being bounded by the powerset of the domain of faults. In contrast with this classical approach, we introduce the notion of a temporal fault.
Definition 1 (Temporal fault) Let T be a trajectory of a DES. The temporal fault of T is the sequence of faults in T,

Flt(T) = [ f | t ∈ T, (t, o, f) ∈ μ(X), f ≠ ε ].    (3)
Although T is finite, its length is in general unbounded; hence, the length of both Obs(T ) and Flt(T ) is in general unbounded. As for a trajectory, a contiguous subsequence of a temporal fault is called a temporal fault segment. Example 1 Shaded in the middle of the left side of Fig. 1 is a DES called P (protection), which includes two components, a sensor s and a breaker b, and one link connecting the (single) output pin of s with the (single) input pin of b. The model of s (outlined over P) involves two states (denoted by circles) and four transitions (denoted by arcs). The model of b is outlined under P. Each component transition t from a state p to a state p , triggered by an input event e, and generating a set of output events E, is denoted by the angled triple t = p, (e, E), p , as detailed in the table displayed on the right side of Fig. 1. The space of P, namely, P ∗ , is depicted on the left of Fig. 2, where each state is identified by a triple (ss , sb , e), with ss being the state of the sensor, sb the state of the breaker, and e the internal event in the link (ε means no event). To ease referencing, the states of P are renamed 0 · · · 7, where 0 is the initial state. Owing to cycles, the set of possible trajectories of P is infinite, one of them being T = [s3 , b5 , s1 , b3 , s4 , b3 , s2 , b5 ], with the following evolution:
Component transition | Description
s1 = ⟨0, (ko, {op}), 1⟩ | The sensor detects a threatening event ko and generates the open event
s2 = ⟨1, (ok, {cl}), 0⟩ | The sensor detects a liberating event ok and generates the close event
s3 = ⟨0, (ko, {cl}), 0⟩ | The sensor detects a threatening event ko, yet generates the close event
s4 = ⟨1, (ok, {op}), 1⟩ | The sensor detects a liberating event ok, yet generates the open event
b1 = ⟨0, (op, ∅), 1⟩ | The breaker reacts to the open event by opening
b2 = ⟨1, (cl, ∅), 0⟩ | The breaker reacts to the close event by closing
b3 = ⟨0, (op, ∅), 0⟩ | The breaker does not react to the open event and remains closed
b4 = ⟨1, (cl, ∅), 1⟩ | The breaker does not react to the close event and remains open
b5 = ⟨0, (cl, ∅), 0⟩ | The breaker reacts to the close event by remaining closed
b6 = ⟨1, (op, ∅), 1⟩ | The breaker reacts to the open event by remaining open
b7 = ⟨0, (cl, ∅), 1⟩ | The breaker reacts to the close event by opening
b8 = ⟨1, (op, ∅), 0⟩ | The breaker reacts to the open event by closing
Fig. 1 DES P (left) and details of component transitions (right)
t | o | f
s1 | sen | ε
s2 | sen | ε
s3 | ε | f1
s4 | ε | f2
b1 | bre | ε
b2 | bre | ε
b3 | ε | f3
b4 | ε | f4
b5 | bre | ε
b6 | bre | ε
b7 | bre | f5
b8 | bre | f6

o | Observation description
sen | The sensor s performs an action
bre | The breaker b performs an action

f | Fault description
f1 | s sends the cl command instead of op
f2 | s sends the op command instead of cl
f3 | b remains closed on the op command
f4 | b remains open on the cl command
f5 | b opens on the cl command
f6 | b closes on the op command
Fig. 2 Space P ∗ (left), mapping table μ(P ) (center), and symbol description (right)
s detects a threatening event but commands b to close; b remains closed; s detects again a threatening event and commands b to open; b does not open and remains closed; s detects a liberating event, yet commands b to open; b does not open and remains closed; s detects a liberating event and commands b to close; b remains closed. The mapping table μ(P) is displayed on the center of Fig. 2, with observations and faults being described on the right of the figure. Only one observation is provided for both the sensor and the breaker, namely, sen and bre, respectively, each being associated with several transitions. Based on μ(P) and the trajectory T defined above, we have Obs(T ) = [bre, sen, sen, bre] and Flt(T ) = [f1 , f3 , f2 , f3 ]. In a set-oriented approach, the diagnosis of T would be the set {f1 , f2 , f3 }, where neither the temporal ordering of faults nor the double occurrence of f3 is contemplated.
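As a small executable illustration (our encoding, not the authors' code), the mapping table μ(P) of Fig. 2 and the definitions of Obs(T) and Flt(T) in Eqs. (2) and (3) can be transcribed in Python, reproducing the values given above for the trajectory T:

# Mapping table mu(P): component transition -> (observation, fault),
# with None standing for the empty symbol epsilon.
MU_P = {
    "s1": ("sen", None), "s2": ("sen", None), "s3": (None, "f1"), "s4": (None, "f2"),
    "b1": ("bre", None), "b2": ("bre", None), "b3": (None, "f3"), "b4": (None, "f4"),
    "b5": ("bre", None), "b6": ("bre", None), "b7": ("bre", "f5"), "b8": ("bre", "f6"),
}

def obs(trajectory, mu=MU_P):
    """Temporal observation Obs(T) of Eq. (2)."""
    return [mu[t][0] for t in trajectory if mu[t][0] is not None]

def flt(trajectory, mu=MU_P):
    """Temporal fault Flt(T) of Eq. (3)."""
    return [mu[t][1] for t in trajectory if mu[t][1] is not None]

T = ["s3", "b5", "s1", "b3", "s4", "b3", "s2", "b5"]
assert obs(T) == ["bre", "sen", "sen", "bre"]
assert flt(T) == ["f1", "f3", "f2", "f3"]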
3 Explanation by Temporal Faults
The essential goal in diagnosing a DES is generating the set of candidates relevant to a temporal observation O. Here, in contrast with the classical set-oriented approach, a candidate is a temporal fault that is produced by a trajectory that is consistent with O. The (possibly infinite) set of candidates is the explanation of O.
Definition 2 (Explanation) Let O = [o1, . . . , on] be a temporal observation of X and T a trajectory of X such that Obs(T) = O. Let T[i], i ∈ [0 .. n], denote either T, if i = n, or the prefix of T up to the transition preceding the (i + 1)-th observable transition in T, if 0 ≤ i < n. The explanation of O is a sequence

E(O) = [F0, F1, . . . , Fn],    (4)
where each Fi, i ∈ [0 .. n], is the minimal set of temporal faults defined as follows:

∀ T ∈ X∗ such that Obs(T) = O, ∀ i ∈ [0 .. n]:  Fi ⊇ {Flt(T[i])}.    (5)
Example 2 Let O = [bre, sen, sen] be a temporal observation of the DES P defined in Example 1. Based on the space P ∗ and the mapping table μ(P) displayed in Fig. 2, the language of the trajectories generating O is s3 b5 s1 b3 (s4 b3 )∗ s2 ; hence, E(O) = [f1 , f1 , f1 f3 (f2 f3 )∗ , f1 f3 (f2 f3 )∗ ]. Each of the four sets of temporal faults in E(O) is represented by a regular expression.1 Thus, f1 f3 (f2 f3 )∗ indicates that the sensor commands the breaker to close instead of opening (f1 ), still the breaker remains closed instead of opening (f3 ), and possibly an unbounded sequence of pairs of faults f2 f3 occurs, namely, the sensor commands the breaker to open instead of closing, still the breaker remains closed. In a classical set-oriented perspective, the set of candidates corresponding to the fourth regular expression (which equals the third one) would be {{f1 , f3 }, {f1 , f2 , f3 }}, in which case the diagnostician knows that both fault f1 and f3 have certainly occurred, whereas the occurrence of fault f2 is uncertain; he or she has no hint about how many times such faults have manifested themselves and in which temporal order. If, instead, the regular expression is given, the diagnostician knows not only that both faults f1 and f3 have certainly occurred, but also that f1 has occurred just once and it was the first, while f3 was the second. In addition, he or she knows that, if f2 has occurred, it has occurred after them, and it may have occurred several times, every time being followed by f3 . All these details may be essential to understand what has happened inside the system in order to make decisions, such as recovery actions. It can be proven that the sets of temporal faults in Eq. (4) are always regular languages (as is in Example 2). This allows a possibly infinite set of temporal faults to be always represented by a (finite) regular expression.
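Since each set of candidates is a regular language over the fault alphabet, ordinary regular-expression machinery can be used to test whether a given temporal fault is a candidate. A toy illustration with Python's re module is shown below; the whitespace-separated encoding of fault symbols is our own convention.

import re

# The third (and fourth) candidate set of Example 2: f1 f3 (f2 f3)*
candidate = re.compile(r"f1 f3( f2 f3)*$")

print(bool(candidate.match("f1 f3")))              # True: [f1, f3]
print(bool(candidate.match("f1 f3 f2 f3 f2 f3")))  # True: two extra f2 f3 pairs
print(bool(candidate.match("f1 f2 f3")))           # False: not a candidate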
4 Explainer Compilation To support the generation of the explanation of any temporal observation, a data structure called an explainer is exploited. The explainer of a DES X is an NFA resulting from the (offline) compilation of X . The alphabet of the explainer is a set of triples (o, L, f ), where o is an observation of X , L is a language on the faults of X , and f is a (possibly empty) fault. Roughly, each state of the explainer (namely, a fault space) embodies a sort of local explanation defined by languages (regular expressions) on faults. Definition 3 (Fault space) Let X ∗ be the space of a DES X having mapping table μ(X ), F the set of faults of X , and x¯ a state in X . The fault space of x¯ is an NFA regular expression is defined inductively over an alphabet Σ as follows. The empty symbol ε is a regular expression. If a ∈ Σ, then a is a regular expression. If x and y are regular expressions, then the followings are regular expressions: (x) (parentheses may be used), x | y (alternative), x y (concatenation), x? (optionality), x ∗ (repetition zero or more times), and x + (repetition one or more times). When parentheses are missing, the concatenation has precedence over the alternative, while optionality and repetition have highest precedence; for example, ab∗ | cd? denotes (a(b)∗ ) | c(d)?. 1A
Explanatory Monitoring of Discrete-Event Systems
Algorithm 1 Decoration 1: procedure Decoration(X¯ ∗ ) 2: input X¯ ∗ : the unobservable subspace of X ∗ rooted in x¯ 3: side effects: X¯ ∗ becomes the fault space Xx¯∗ 4: N ← X¯ ∗ 5: Substitute each transition x, t, x in N with x, f, x , where (t, o, f ) ∈ μ(X ) 6: Insert into N a new initial state α and an ε-transition α, ε, x
¯ 7: Insert into N a final state ω 8: for all state x in N , x = α, x = ω do 9: Insert an ε-transition x, ε, ω
10: end for 11: while N includes a state that is neither α nor ω or there are several transitions α, r, ω
12: where r is marked with the same state do 13: if there is a list [ x, r1 , x1 , x1 , r2 , x2 , . . ., xk−1 , rk , x ] of transitions, k ≥ 2, where 14: each xi , i ∈ [1 .. (k − 1)], is neither entered nor exited by any other transition then 15: Substitute the transition x, (r1 r2 · · · rk ), x for that list 16: if x = ω then 17: if rk is marked with (x) ˆ then 18: Unmark rk and mark (r1 r2 · · · rk ) with (x) ˆ 19: else 20: Mark (r1 r2 · · · rk ) with (xk−1 ) 21: end if 22: end if 23: else if there is a set { x, r1 , x , x, r2 , x , . . . , x, rk , x } of parallel transitions, 24: from x to x , that are either unmarked or are marked with the same state then 25: Substitute the transition x, (r1 | · · · |rk ), x for that set 26: Unmark r1 , . . . , rk and mark (r1 | · · · |rk ) with the (common) state marking r1 , . . . , rk 27: else 28: Let x be a state of N where x = α and x = ω 29: for all transition x , r , x entering x, x = x do 30: for all transition x, r , x exiting x, x = x do 31: if there is an auto-transition x, r, x for x then 32: Insert a transition x , (r (r )∗ r ), x into N 33: else 34: Insert a transition x , (r r ), x into N 35: end if 36: if x = ω in the newly inserted transition then 37: Mark the relevant regular expression with (x) 38: end if 39: end for 40: end for 41: Remove x and all its entering/exiting transitions 42: end if 43: end while 44: for all transition α, r(x) , ω in N do 45: Mark the state x in X¯ ∗ with the regular expression r 46: end for 47: end procedure
69
70
N. Bertoglio et al.
Xx¯∗ = (Σ, X, τ, x0 ) ,
(6)
where Σ = F ∪ {ε} is the alphabet, X is the subset of states of X ∗ that are reachable by unobservable transitions, x0 = x¯ is the initial state, and τ : X × Σ → 2 X is the transition function, where x1 , f, x2 is an arc in τ iff x1 , t, x2 is a transition in X ∗ and (t, o, f ) ∈ μ(X ). Each state x ∈ X is marked with the language of the temporal fault segments of the trajectory segments in X ∗ from x¯ to x, denoted L(x). What we need is a technique allowing for the automatic decoration of the states within a fault space. To this end, we have exploited and adapted the algorithm proposed in [8] in the context of sequential circuit state diagrams. Essentially, this algorithm takes as input an NFA and generates the regular expression of the language accepted by this NFA. However, in a fault space, all states need to be marked with the relevant regular expressions. Thus, we have extended the algorithm to decorate all the states in one processing of the NFA. The pseudocode of this algorithm, called Decoration, is listed in Algorithm 1. It takes X¯ ∗ , the unobservable subspace of X ∗ rooted in x, ¯ and marks each state x in X¯ ∗ with a regular expression defining the set of temporal fault segments of the trajectory segments connecting x¯ with x, thereby yielding the fault space Xx¯∗ . Example 3 Outlined in Fig. 3 is the genesis of the fault space P1∗ by the Decoration algorithm. Shaded at the left of the figure is the unobservable subspace of P ∗ rooted in state 1. Next to it is the result of the substitutions of the symbols marking the transitions performed in line 5. Shown in third position is the NFA obtained by the insertion of the initial state α, the final state ω, and the relevant ε-transitions (lines 6– 10). The NFA in fourth position is obtained by the removal of state 1 and relevant transitions based on lines 28–41. Likewise, the NFA in fifth position corresponds to the removal of state 4. The DFA in sixth position is obtained by merging the two parallel transitions marked by (1) into a single transition (lines 23–26). Eventually, shaded on the right of the figure is the DFA (in fact, the fault space P1∗ ) obtained by applying the processing in lines 44–46, where states 1 and 4 are marked with the relevant regular expression. Note that the regular expression (f3 f2 )∗ associated with state 1 is equivalent to f3 (f2 f3 )∗ f2 | ε.
Fig. 3 Genesis of the fault space P1∗ by the Decoration algorithm
Explanatory Monitoring of Discrete-Event Systems
71
Fig. 4 Explainer of the DES P , namely, P E
Definition 4 (Explainer) Let X ∗ = (Σ, X, τ, x0 ) be the space of X , O the set of observations of X , F the set of faults of X , and L the set of regular languages on F ∪ {ε}. The explainer of X is an NFA X E = Σ , X , τ , x0 ,
(7)
where Σ ⊆ O × L × (F ∪ {ε}) is the alphabet, X is the set of states, where each state is a fault space of a state in X ∗ , x0 = Xx∗0 is the initial state, and τ is the (nondeterministic) transition function, τ : (X × X ) × Σ → 2(X ×X ) , where (x1 , x1 ), (o, L(x1 ), f ), (x2 , x2 ) is an arc in τ iff x1 is a state of x1 , x1 , t, x2 ∈ τ , (t, o, f ) ∈ μ(X ), and x2 = Xx∗2 . Example 4 With reference to the DES P, shown in Fig. 4 is the explainer P E , where the states (fault spaces) are renamed 0 · · · 7. Unlike component transitions within fault spaces, which are represented with plain arcs, the transitions between states of P E are depicted as dashed arcs that are marked with relevant triples.
5 Monitoring Trace and Explanation Trace Given an explainer X E , the explanation of a temporal observation O is generated by tracing O on X E , thereby yielding a monitoring trace.
72
N. Bertoglio et al.
Fig. 5 Monitoring trace M([bre, sen]) for the DES P
Definition 5 (Monitoring trace) Let O = [o1 , . . . , on ] be a temporal observation of X and X E = (Σ, X, τ, x0 ) the explainer. The monitoring trace of O is a graph M(O) = (M, A, μ0 ),
(8)
where M = {μ0 , μ1 , . . . , μn } is the set of nodes, A the set of arcs, and μ0 the initial node. Each node μi ∈ M, i ∈ [0 .. n], is a subset of X , with μ0 = {x0 }. Each μi = μ0 contains the states of X E that are reached in X E by the states in μi−1 via a transition marked with a triple where the observation is oi . Each arc exiting a state x is marked with a pair (L, L ), where L and L are languages of temporal fault segments. There is an arc from a state x ∈ μi to a state x ∈ μi+1 , i ∈ [0 .. n − 1], marked with (L, L ), iff there is a transition in X E from a space state in x to a space state in x that is marked with (oi+1 , L, f ), where L is defined as follows. Let R be either ε, when i = 0, or (L1 |L2 | . . . |Lk ), when i = 0, where (L j , Lj ) is the pair marking the jth arc entering x ∈ μi , j ∈ [1 .. k]. Then, L = RL f . Example 5 Let O = [bre, sen] be a temporal observation of the DES P. The corresponding monitoring trace M(O) is outlined in Fig. 5. The monitoring trace of O allows for the distillation of an explanation trace, which turns out to equal the explanation of O (Proposition 1). Definition 6 (Explanation trace) Let M(O) be a monitoring trace with nodes μ0 , μ1 , . . . , μn . The explanation trace of M(O) is a sequence E(M(O)) = [L(μ0 ), L(μ1 ), . . . , L(μn )] ,
(9)
where L(μi ), i ∈ [0 .. n], is a language of temporal faults defined as follows. Let {x1 , . . . , xm } be the set of X E states included in μi . Let x j ∈ μi . Let Lin (x j ) be either ε, when i = 0, or the alternative (L1 |L2 | . . . |Lm ), when i = 0, where (Lk , Lk ), k ∈ [1 .. m], is the pair marking the kth arc entering x j . Let Lout (x j ) be either (L1 |L2 | . . . |Lp ), when i = n, where (Lh , Lh ), h ∈ [1 .. p], is the pair marking the hth arc exiting x j , or (L(x j1 )|L(x j2 )| . . . |L(x jr )), when i = n, where x j1 , . . . , x jr are the space states in x j and L(x jv ), v ∈ [1 .. jr ], is the language marking x jv . Then, L(μi ) is the alternative of pairwise concatenated languages L(μi ) = (Lin (x1 )Lout (x1 ) | . . . | Lin (xm )Lout (xm )) . Example 6 With reference to the monitoring trace M(O) displayed in Fig. 5, with O = [bre, sen], the corresponding explanation trace is
Explanatory Monitoring of Discrete-Event Systems
E(M(O)) = [f1 , (f1 |f1 f5 (f1 f4 )∗ ), (f1 ((f3 f2 )∗ |f3 (f2 f3 )∗ )|f1 f5 (f1 f4 )∗ )] = [f1 , f1 (f5 (f1 f4 )∗ )?, f1 (((f3 f2 )∗ |f3 (f2 f3 )∗ )|f5 (f1 f4 )∗ )].
73
(10)
Proposition 1 Let M(O) be a monitoring trace. The explanation trace of M(O) equals the explanation of O, namely, E(M(O)) = E(O).
6 Explanatory Monitoring The notions introduced so far, including temporal fault, explanation, monitoring trace, and explanation trace, all refer to a temporal observation O, which is composed of a sequence of observations, namely, [o1 , . . . , on ]. However, the DES being monitored generates one observation at a time, rather than in one shot. Therefore, the explanation engine (that is, the software system required to explain the temporal observation) is expected to react to each new observation o by updating the current explanation based on o. Albeit the temporal observation grows monotonically, by simple extension of the sequence of observations, the growing of the explanation is in general nonmonotonic. Specifically, if [F0 , F1 , . . . , Fi ] is the explanation of the temporal observation [o1 , . . . , oi ], the explanation of [o1 , . . . , oi , oi+1 ] is not in general [F0 , F1 , . . . , Fi , Fi+1 ], but rather, in the worst case, [F0 , F1 , . . . , Fi , Fi+1 ], where
Algorithm 2 Explanation Engine 1: procedure Explanation Engine(X E , O, M, E , o) 2: input X E : the explainer of X 3: O: a temporal observation of X 4: M: the monitoring trace of O 5: E : the explanation trace of M 6: o: a newly-received observation of X 7: side effects: O, M, and E are updated based on o 8: begin 9: Extend O by the new observation o 10: Let μ denote the last node of M (before the extension) 11: Extend M by a new node μ based on the new observation o 12: Extend E by L(μ ) based on Definition 6 13: X ← the set of states in μ that are not exited by any arc 14: if X = ∅ then 15: repeat 16: Delete from M all states of μ in X and their entering arcs 17: Update L(μ) in E based on Definition 6 18: μ ← the node preceding μ in M 19: X ← the set of states in μ that are not exited by any arc 20: until X = ∅ 21: end if 22: Update L(μ) in E based on Definition 6 23: end procedure
74
N. Bertoglio et al.
F j ⊆ F j , j ∈ [0 .. i]. But, what is the cause of this nonmonotonicity? Intuitively, the extension of O by a new observation o is bound to make some trajectories that were consistent with O, no longer consistent with O = O ∪ [o]. Hence, according to Eq. (5) in Definition 2, only the trajectories T where Obs(T ) = O are considered in defining each language Fi within E(O ). The trajectories that cannot be extended based on o need to be discarded, as well their temporal faults, thereby causing the possible restriction of the languages Fi . The pseudocode of the Explanation Engine is listed in Algorithm 2. It takes as input the explainer X E , a temporal observation O of X , the monitoring trace M of O, the explanation trace E of M, and a new observation o. It updates O, M, and E, thereby providing the explanation of O extended with o. First, O, M, and E are extended in lines 9–12. Then, the set X of states in μ (the last node of M before the extension) that are not exited by any arc is computed (line 13). If X is not empty, a backward pruning of M and E is carried out in lines 15–20. To this end, each state x ∈ X is removed from μ, since x terminates the trajectories that turn out to be inconsistent with the new observation o. Hence, the arcs entering these states are removed from M also (line 16), and L(μ) in E is updated (line 17). Then, the pruning is propagated to the preceding node (lines 18–19) until a node including no state to be removed is found (X = ∅, line 20). Eventually, L(μ) is updated in any case (line 22), since some arcs exiting this node may have been cut. Example 7 Shown in Table 1 is the output of the engine for O = [bre, sen, sen] of the DES P (cf Example 2). For each index i ∈ [0 .. 3] of O, the table shows the monitoring trace M(O[i] ), and the explanation E(O[i] ). When no observation has been generated yet (i = 0), we have E(O[i] ) = [f1 ?], which indicates that the fault f1 may or may not have occurred. After the reception of the third observation (sen), backward pruning is applied to M(O) since no transition marked with a triple Table 1 Incremental output of the explanation engine for O = [bre, sen, sen] of the DES P
Explanatory Monitoring of Discrete-Event Systems
75
involving the observation sen exits the state 6 of the explainer P E . As expected, the eventual explanation E(O) equals the explanation determined in Example 2 based on Definition 2.
7 Conclusion In the literature about model-based diagnosis of DESs, there is a conceptual mismatch between the input of the task, which is a temporally ordered sequence of observed events (called a temporal observation), and the output, which is a set of temporally unrelated faults. To fill this mismatch, the present paper has proposed the notion of a temporal fault, this being a (possibly infinite) sequence of temporally ordered faults. Thus, the diagnosis output is a regular expression representing the (possibly infinite) language of the temporal faults relevant to the trajectories that generate the given temporal observation. In diagnosis during monitoring, the explainability of this output can be further enhanced by performing a backward pruning every time a new observation is perceived. This technique guarantees that all the candidates relevant to any prefix of the temporal observation received so far are consistent with the whole temporal observation received so far. If this technique is applied, the diagnosis task is called explanatory monitoring. Backward pruning takes inspiration from [3, 5]; however, the quoted works adopt a set-oriented perspective. The diagnoser approach [20], besides being set-oriented, does not cope with a possibly infinite number of possibly infinite trajectories that entail the given temporal observation. In fact, the diagnoser approach assumes that both the language of the transitions and the language of the observable events of the DES are live and that the faulty transitions are unobservable, whereas such assumptions are relaxed here. Consequently, while according to the diagnoser approach there does not exist any unobservable behavioral cycle, and hence, there does not exist any cycle of faults, both such cycles are allowed in the current approach. Therefore, in this paper, the regular expressions relevant to the occurrence of faults can represent an unbounded number of iterations, while this is not needed in case the temporal fault characterization was adopted by the diagnoser approach (or by approaches making the same assumptions). Fault detection in DESs was generalized [12] to the recognition of a pattern, this being a DFA that can represent the ordered occurrences of multiple faults, the multiple occurrences of the same fault, etc. It is tempting to speculate that a diagnosis output consisting in a set of temporal faults resembles diagnosis with supervision patterns as a pattern enables the detection of a specific language of transitions and, therefore, the detection of a specific language of faulty transitions also. In other words, the supervision pattern approach can find out whether there exists a trajectory that both implies the given temporal observation and complies with the given (pattern) language. Notice that there may exist several other trajectories that imply the temporal observation while producing sequences of faults that do not belong to the given (pattern) language: the supervision pattern approach does not produce any output about them. The approach described in this paper is not given any automaton upfront recognizing a language;
76
N. Bertoglio et al.
instead, it produces a regular expression representing the language of the faults of all the trajectories that imply the given sequence of observations. Moreover, the output of the supervision pattern approach clarifies whether the pattern has occurred; however, it does not compute the number of its occurrences, nor does it show the reciprocal order of these occurrences and those of individual faults within the trajectories implying the temporal observation. In the view of the current paper, instead, if a fault is associated with a pattern, this can be part of a temporal fault as all other faults are. In other words, temporal faults are orthogonal to the classification of faults, being “simple” or somehow “complex” as in [12, 14, 16]. Finally, one may argue that the generation of an explainer is impractical owing to the exponential space complexity in the number of components and links. Still, similar to the Sampath’s diagnoser, the explainer retains legitimacy as a theoretical reference framework for the definition of temporal fault diagnosis. Future work will focus on the incremental knowledge compilation of the DES [3–6], so as to initially produce a partial (and practical) explainer, to be subsequently extended based on the actual diagnosis problems being coped with. Acknowledgements This work was supported in part by Regione Lombardia (project Smart4CPPS, Linea Accordi per Ricerca, Sviluppo e Innovazione, POR-FESR 2014–2020 Asse I) and by the National Natural Science Foundation of China (grant number 61972360).
References 1. Baroni, P., Lamperti, G., Pogliano, P., Zanella, M.: Diagnosis of large active systems. Artif. Intell. 110(1), 135–183 (1999). https://doi.org/10.1016/S0004-3702(99)00019-3 2. Basile, F.: Overview of fault diagnosis methods based on Petri net models. In: Proceedings of the 2014 European Control Conference, ECC 2014, pp. 2636–2642 (2014). https://doi.org/10. 1109/ECC.2014.6862631 3. Bertoglio, N., Lamperti, G., Zanella, M.: Temporal diagnosis of discrete-event systems with dual knowledge compilation. In: Holzinger, A., Kieseberg, P., Weippl, E., Tjoa, A.M. (eds.) Machine Learning and Knowledge Extraction, Lecture Notes in Computer Science, vol. 11713, pp. 333–352. Springer, Berlin (2019). https://doi.org/10.1007/978-3-030-29726-8_21 4. Bertoglio, N., Lamperti, G., Zanella, M.: Intelligent diagnosis of discrete-event systems with preprocessing of critical scenarios. In: Czarnowski, I., Howlett, R., Jain, L. (eds.) Intelligent Decision Technologies 2019, Smart Innovation, Systems and Technologies, vol. 142, pp. 109– 121. Springer, Singapore (2020). https://doi.org/10.1007/978-981-13-8311-3_10 5. Bertoglio, N., Lamperti, G., Zanella, M., Zhao, X.: Twin-engined diagnosis of discrete-event systems. Eng. Reports 1, 1–20 (2019). https://doi.org/10.1002/eng2.12060 6. Bertoglio, N., Lamperti, G., Zanella, M., Zhao, X.: Escaping diagnosability and entering uncertainty in temporal diagnosis of discrete-event systems. In: Bi, Y., Bhatia, R., Kapoor, S. (eds.) Intelligent Systems and Applications, Advances in Intelligent Systems and Computing, vol. 1038, pp. 835–852. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-29513-4_62 7. Brand, D., Zafiropulo, P.: On communicating finite-state machines. J. ACM 30(2), 323–342 (1983). https://doi.org/10.1145/322374.322380 8. Brzozowski, J., McCluskey, E.: Signal flow graph techniques for sequential circuit state diagrams. IEEE Trans. Electron. Comput. EC-12(2), 67–76 (1963) 9. Cassandras, C., Lafortune, S.: Introduction to Discrete Event Systems, 2nd edn. Springer, New York (2008)
Explanatory Monitoring of Discrete-Event Systems
77
10. Cong, X., Fanti, M., Mangini, A., Li, Z.: Decentralized diagnosis by Petri nets and integer linear programming. IEEE Trans. Syst. Man Cybern.: Syst. 48(10), 1689–1700 (2018) 11. Hamscher, W., Console, L., de Kleer, J. (eds.): Readings in Model-Based Diagnosis. Morgan Kaufmann, San Mateo, CA (1992) 12. Jéron, T., Marchand, H., Pinchinat, S., Cordier, M.: Supervision patterns in discrete event systems diagnosis. In: Workshop on Discrete Event Systems (WODES 2006), pp. 262–268. IEEE Computer Society, Ann Arbor, MI (2006) 13. Lamperti, G., Zanella, M.: Diagnosis of discrete-event systems from uncertain temporal observations. Artif. Intell. 137(1–2), 91–163 (2002). https://doi.org/10.1016/S00043702(02)00123-6 14. Lamperti, G., Zanella, M.: Context-sensitive diagnosis of discrete-event systems. In: Walsh, T. (ed.) Twenty-Second International Joint Conference on Artificial Intelligence (IJCAI 2011), vol. 2, pp. 969–975. AAAI Press, Barcelona, Spain (2011) 15. Lamperti, G., Zanella, M., Zhao, X.: Introduction to Diagnosis of Active Systems. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-92733-6 16. Lamperti, G., Zhao, X.: Diagnosis of active systems by semantic patterns. IEEE Trans. Syst. Man Cybern.: Syst. 44(8), 1028–1043 (2014). https://doi.org/10.1109/TSMC.2013.2296277 17. McIlraith, S.: Explanatory diagnosis: conjecturing actions to explain observations. In: Sixth International Conference on Principles of Knowledge Representation and Reasoning (KR 1998), pp. 167–177. Morgan Kaufmann, S. Francisco, CA, Trento, I (1998) 18. Pencolé, Y., Steinbauer, G., Mühlbacher, C., Travé-Massuyès, L.: Diagnosing discrete event systems using nominal models only. In: 28th International Workshop on Principles of Diagnosis (DX 2017), pp. 169–183. Brescia, Italy (2017) 19. Reiter, R.: A theory of diagnosis from first principles. Artif. Intell. 32(1), 57–95 (1987) 20. Sampath, M., Sengupta, R., Lafortune, S., Sinnamohideen, K., Teneketzis, D.: Diagnosability of discrete-event systems. IEEE Trans. Autom. Control 40(9), 1555–1575 (1995) 21. Sampath, M., Sengupta, R., Lafortune, S., Sinnamohideen, K., Teneketzis, D.: Failure diagnosis using discrete-event models. IEEE Trans. Control Syst. Technol. 4(2), 105–124 (1996)
Artificial Intelligence Technique in Crop Disease Forecasting: A Case Study on Potato Late Blight Prediction Gianni Fenu and Francesca Maridina Malloci
Abstract Crop diseases are strongly affected by weather and environmental factors. Weather fluctuations represent the main factors that lead to potential economic losses. The integration of forecasting models based on weather data can provide a framework for agricultural decision-making able to suggest key information for overcoming these problems. In the present work, we propose a new artificial intelligence approach to forecast potato late blight disease in the Sardinia region and a novel technique to express a crop disease risk. The experiments conducted are based on historical weather data as temperature, humidity, rainfall, speed wind, and solar radiation collected from several locations over 4 year (2016–2019). The tests were aimed to determine the usefulness of the support vector machine classifier to identify crop– weather–disease relations for potato crops and the relative possible outbreak. The results obtained show that temperature, humidity, and speed wind play a key role in the prediction.
1 Introduction The continuous climate changes combined with excessive demographic pressure and unsustainable agricultural practices have exposed the ecosystems to the risk of a progressive deterioration of their production capacity [1]. A proper agriculture monitoring is required to minimize the human footprint and to guarantee food security, environmental sustainability, and land protection. Weather and environmental variables represent the main factors that affect crop growth and relative quality and yield. Decision support systems that integrate forecasting models based on weather data can elaborate useful information to plan a timely disease management practice. G. Fenu (B) · F. M. Malloci Department of Mathematics and Computer Science, University of Cagliari, Cagliari, Italy e-mail: [email protected] F. M. Malloci e-mail: [email protected] © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2020 I. Czarnowski et al. (eds.), Intelligent Decision Technologies, Smart Innovation, Systems and Technologies 193, https://doi.org/10.1007/978-981-15-5925-9_7
79
80
G. Fenu and F. M. Malloci
In this way, the cost production and crop losses may be reduced by optimizing the timing and the frequency of application control measures. The potato crop is one of the most important food crops in the Sardinia region. Late blight potato disease caused by the oomycete Phytophthora infestans is one of the major crop diseases that cause massive yield losses. The disease management of late blight is quite complex because it involves several factors. From these factors, in the literature it is known that potato late blight is affected by three components known as disease triangle which are a susceptible host, a pathogen, and a favorable environment [2]. Previous studies attempted to forecast the late blight outbreak analyzing satellite images [3–5] or weather data. However, the limitation of forecasting models based on image processing is that they can only be utilized when phenotypic symptoms and characteristics emerge, thus such types of systems or models are unable to assist farmers in treating diseases at an early stage [6]. Most of the contributions in the literature are given thought to the use of mathematical models [7]. Recently, thanks to the advent of precision farming, we can collect a huge amount of data from sensors in the fields. These data represent a new opportunity for crop variability analysis. Besides, the heterogeneous data introduce the problem of scalability, integration, data model, and visualization support [8]. Data integration from different sources and several types of format as unstructured, semi-structured, and structured represent a challenge in big data analytics. It is a challenge that requires a specific data model and an architecture able to process and analyze efficiently valuable information in real time, which needs to be displayed appropriately. To this end, the researchers are now investigating novel artificial intelligence and machine learning approaches to address the problem. Related studies have been conducted by means of Artificial Neural Network (ANN) [9, 10], Support Vector Machine (SVM) [11], and SVM Regression and Logistic Regression [12]. However, the resolution of these case studies requires the availability of weather data, satellite images, and monitoring fields in which the studied methods can be validated. As for meteorological data, the construction of predictive models often requires the detection of meteorological variables with punctual intervals of at least 1 hour. Furthermore, experiments and results are strongly influenced by local weather factors, and therefore the models must be trained with a dataset containing detections belonging to the monitoring area. At present, there are more possibilities to find meteorological data for American nations, but it is not equally easy to have meteorological data for Italy, especially for the Sardinia region. To overcome these problems, we have partnered with the LAORE Sardinia Agency and the Regional Agency for the Protection of the Sardinian Environment (ARPAS). LAORE Sardinia Agency deals with providing advisory, education, training, and assistance services in the regional agricultural sector. This collaboration allows us to work closely with agricultural technicians and operators and to take advantage of monitoring fields in which we can validate our studies. Similarly, thanks to the collaboration with the ARPAS Agency, we can access the 50 weather stations scattered throughout the region and train our models with 4-year historical data (2016–2019).
Artificial Intelligence Technique in Crop Disease Forecasting …
81
Nevertheless, the best of our knowledge, there is no report in the literature that tries to identify crop–weather–disease relations for potato crops and the relative possible outbreak through the artificial intelligence approach. A significant contribution has been presented in [13] for rice blast prediction. To forecast potato late blight in Sardinia, we developed the LAORE Architecture Network Development for Sardinia (LANDS) Decision Support System [14] and we performed a test analyzing weather data collected by ARPAS from 50 locations over 4 year (2016–2019) [15, 16]. The DSS is designed to help LAORE technicians and Sardinian farmers in decision-making and not to replace the decision-maker. The main purposes are to (i) optimize the resources management through reduction of certain inputs (e.g., chemicals and naturals resources, etc.), (ii) predict crop risk situations (e.g., diseases, weather alerts, etc.), (iii) increase the quality of decisions for field management, and (iv) reduce environmental impact and production cost. The structure of the present work is as follows. Section 2 describes the dataset and the techniques employed to solve the analyzed problem. Section 3 illustrates the experimental strategy and results, while Sect. 4 presents a discussion of our main findings and future lines of research.
2 The Proposed Method The present paper introduces a forecasting model based on artificial intelligence technique as Support Vector Machine (SVM) originally developed by Vapnik and co-workers at Bell Laboratories [17]. Over the past years in the literature, due to its ability to learn without being strictly programmed in general-purpose supervised predictions, it has been adopted with good results in several sectors [6, 18]. Simultaneously, we employed it to understand the relationship between potato late blight disease severity and its associated environmental conditions. To the best of our knowledge, there is no report on using support vector machine for the present purpose. A similar work illustrated in [13] has been performed only for rice blast prediction. Hence, the present case study, as in the experiment conducted in [13], was aimed to determine the usefulness of the SVM model to predict potato late blight index based on weather parameters. More specifically, our contributions are the following: • We provide an artificial intelligence approach that recommends to the farmers the potato late blight outbreak in the Sardinian region considering weather data registered by regional ARPAS weather stations. • We present, for the first time in the literature, a new technique to label the dataset and express the risk index of crop disease.
82
G. Fenu and F. M. Malloci
• We verified our proposal on a real-world dataset made up of approximately 4 years of data from 50 locations and evaluated the classifier with standard accuracy metrics. • Our solution can be embedded not only in the LANDS Decision Support System developed but also in other Decision Support Systems, thus finding practical and effective applications.
2.1 Dataset This research work is based on data collected by means of the regional weather stations belonging to the Regional Agency for the Protection of the Sardinian Environment (ARPAS). The historical weather data cover a period of 4 years from 2016 to 2019. Seeing as the object was to suggest to the farmer a recommendation for the entire growing season, we considered the period from May to September. Thus, the dataset resulted contains 36034 instances, which come from 50 different stations. Each instance is represented by the following variables: • • • • • •
daily average temperature (C°); daily average humidity (%); daily average rainfall (mm); daily speed wind (km/h); daily solar radiation (W/m2); daily risk index.
The daily risk index represents the label to predict. It expresses the disease severity and it is encoded into four classes through a range from 0 to 3 that identify, respectively, no risk, low risk, medium risk, and high risk. Labels ratings are distributed as described in Fig. 1, where “count” indicates the number of samples having the corresponding rating. To the classes 0,1,2,3 belong 20065, 2186, 1925, 1363 instances, respectively. The labels are calculated through SimCast model originally developed by Fry [19, Fig. 1 Label ratings distribution
Artificial Intelligence Technique in Crop Disease Forecasting …
83
20] and co-authors. The proposed labeling method is described in detail in the 2.2 paragraph. From Fig. 1, which shows graphically the distribution of ratings, we can clearly notice that we encountered two main issues as the data imbalance and the small size of the minority classes, which is due to the reason we are dealing with real-weather data which covered a growing season starting from May to September. Intuitively, the class zero which encodes the non-presence of the disease represents the majority class, because it is registered mainly during the beginning of the growing season when the crop has just been planted. This imbalance could also occur even in the analysis of more than 4 years and more monitoring points.
2.2 Labeling Method To label the dataset we used a risk index produced by a mathematical model known in the literature as SimCast, developed by Fry and co-authors. The technique reported in [19] predicts the occurrence of potato late blight for susceptible, moderately susceptible, and resistant cultivar, but only predictions for resistant cultivars were analyzed in this study. The disease stress to the plant is forecasted using Fungicide Units (FUs) or Blight Units (BUs). The FUs are determined considering daily precipitation (mm) and time elapsed since the last application of the product. The BUs indicate if there are favorable conditions for the disease onset on a given date. They are calculated according to the number of consecutive hours that relative humidity is greater than or equal to 90%, and the average temperature falls within any of six ranges (< 3, 3–7, 8–12, 13–22, 23–27, and > 27 C°). In our case, we have considered the BUs approach in order to generate a disease risk index for each instance of the dataset. For this purpose, based on the results obtained in previous experiments [15], in collaboration with LAORE technicians, we have modified the model to local weather conditions. The modified version is described in Table 1.
2.3 Preprocessing One of the most important aspects when dealing with weather data is how the data from sensors are processed and analyzed. Most artificial intelligence classifiers require all features to be complete and a balanced dataset, because the learning phase of classifiers may be biased by the instances that are frequently present in the dataset or present missing values. To deal with missing values, the literature has suggested several approaches. The more effective envisages in removing the noise or in interpolating the data. Different interpolation methods exist. For the sake of simplicity and due to its effectiveness in our data, we employed the linear interpolation. To deal with an imbalanced dataset, the researchers recommend two main
84
G. Fenu and F. M. Malloci
Table 1 The proposed modified version of SimCast model based on local weather conditions Average temperature
Cultivar resistance
C°
Consecutive hours of relative humidity >=90% that should result in blight units of:2 0
1
2
3
7
8–9
10–24
10–12
13–24
>27
MR1
24
23–27
MR
15
13–22
MR
6
8–12
MR
9
3–7
MR
18
19–24
0 and bk > 0. For the lower quality level of end-of-use SEPs, the increase in price will lead to more decrease in Table 1 Notations E
The maximum amount of available SEPs
lk
Amount of available SEPs in quality level k
ck
Customers’ expected reward of end-of-use SEPs in quality level k
Q
Sales volume of the new SEP
P
Price of the new SEP
k
Quality level of returned SEPs
cd
Disposal cost of end-of-use SEPs
ak
Sales volume of end-of-use SEPs in quality level k, when the reselling price is zero in the secondary market
bk
Absolute slope of OEM’s reselling price and sales volume of end-of-use SEPs in quality level k in the secondary market
π
OEM’s profit
rk
Recycling rate of end-of-use SEPs in quality level k
qk
Amount of returned SEPs in quality level k
qk
Amount of sold SEPs in quality level k in the secondary market
pk
Recycling price of end-of-use SEPs in quality level k
pk p
Reselling price of end-of-use SEPs in quality level k in the secondary market
π
OEM’s optimal profit
OEM’s optimal prices for recycling and reselling end-of-use SEPs
Optimal Quality-Based Recycling and Reselling Prices …
95
the customer’s willingness to buy, i.e., bm > . . . > bn . For unsold returned SEPs, OEM deposes of them with the cost cd in a proper way. According to the above problem description, the objective is to maximize OEM’s profit, which includes the selling profit of new SEPs, the reselling profit of returned SEPs in the secondary market, and the recycling and disposal costs, then we have the following program problem (1): maxπ = Q P −
n
pk (1 − e− pk /ck )lk +
k=1
− cd [
n k=1
(1 − e− pk /ck )lk −
n
pk (ak − bk pk )
k=m n
(ak − bk pk )]
(1)
k=m
s.t. (e− pk /ck − 1)lk ≤ 0, k = 1, 2, . . . m, . . . , n,
(2)
bk pk − ak ≤ 0, k = m, . . . , n
(3)
pk−1 − pk ≤ 0, k = 1, 2, . . . m, . . . n
(4)
pk−1 − pk < 0, k = 1, 2, . . . m, . . . n
(5)
pk − pk ≤ 0, k = m, . . . n,
(6)
ck − pk ≤ 0, k = 1, 2, . . . m, . . . , n,
(7)
pk − P < 0, k = 1, 2, . . . m, . . . n
(8)
pk − P < 0, k = m, . . . , n,
(9)
Constraints (2) and (3) ensure that the amount of returned and resold SEPs is non-negative. Constraint (4) indicates that recycling prices of end-of-use SEPs are positive related to their quality levels. Constraint (5) indicates that the higher the quality level of end-of-use SEPs, the higher the recycling price when collecting from customers. Constraint (6) shows that the recycling price is higher than the reselling price in quality level k. Constraint (7) reflects that the recycling price is higher than the customers’ expected reward in quality level k, then OEM can recycle as much SEPs as possible. Constraints (8) and (9) emphasize that the recycling and reselling prices are less than the selling price of the new SEP.
96
S. Zong et al.
3 Algorithm and Numerical Study 3.1 Algorithm Based on the classic genetic algorithm, we propose an algorithm to solve the above program problem (1). Initial settings refer to Goldberg [11], which are shown in the parameters settings. The initialization procedure and the main process are shown in GA initialization and GA processing, respectively. The proposed algorithm is realized by an open-source evolutionary algorithm toolbox and framework for Python, geatpy. Algorithm Parameters Settings 1:
function SET_PARA()
2:
set Q, l1, l2, l3, c1, c2, c3, p, c, a2, b2, a3, b3, cd, N
3:
set ObjV //set objective function
4:
set MAXGEN //maximum number of generations
5:
set selectStyle = Rws //use roulette wheel selection
6:
set recStyle = {Xovdp, Xovud} //double-point or uniform crossover
7:
set pc = 0.7, pm = 0.05 //recombination probability, mutation probability
8:
set mutStyle = {Mutuni, Mutpolyn} //uniform or polynomial mutation
9:
set Chrom //initialize population for each generation
10:
set obj_trace = np.zeros((MAXGEN, 2)) //recorder for objV
11:
set var_trace = np.zeros((MAXGEN, 5)) //recorder for generations
GA Initialization 1:
function INIT_GA()
2:
set Chrom, CV //initialize population and CV matrix
3:
for all i in N do //for all individuals in population
4:
if constraints (1)-(9) are satisfied do //output feasibility vector
5:
FitnV [i] = 1
6:
else
7:
FitnV [i] = 0
8:
Chrom = crtpc(‘RI’, Nind, FieldDR)//crtpc is a population generation tool in geatpy, ‘RI’ refers to an encoding method that can generate continuous or discrete variables, Nind refers to population size, FieldDRis a variable description tool
9:
ObjV, CV = aimfunc(Chrom, np.ones((Nind, 1)))//calculate ObjV for 1st generation
10:
FitnV = ranking(ObjV, CV ) //calculate fitness value
11:
best_ind = np.argmax(FitnV ) //output the best individual in 1st generation
GA Processing 1:
function PROC_GA()
2:
for all i in Ndo//SelCh refers to chrom after certain process (continued)
Optimal Quality-Based Recycling and Reselling Prices …
97
(continued) 3:
SelCh = Chrom[ga.selecting(selectStyle, FitnV, Nind - 1),:] //selection
4:
SelCh = ga.recombin(recStyle, SelCh, pc) //recombination
5:
SelCh = mutuni(‘RI’, SelCh, FieldDR, 1) //mutation
6:
Chrom = np.vstack([Chrom[best_ind,:], SelCh]) //get new generation
7:
Phen = Chrom//Phen refers to a new chrom
8:
ObjV, CV = aimfunc(Phen, CV ) //calculate objective value
9:
FitnV = ranking(ObjV, CV ) //assign fitness value
10:
best_ind = np.argmax(FitnV ) //find the best individual
11:
obj_trace[i, 0] = np.sum(ObjV ) /ObjV.shape[0]//record average value of ObjV
12:
obj_trace[i, 1] = ObjV [best_ind]//record ObjV from the best individual
13:
var_trace[i,:] = Chrom[best_ind,:]//record the best individual
14:
best_gen = np.argmax(obj_trace[:, [1] ])//output the best generation as final result
3.2 Benchmark Example We present a benchmark example based on the above-proposed algorithm, which is executed with Python 3.7 on a personal computer configured with Intel Core i5, macOS. To verify our model and algorithm, a sensors-embedded dryer is considered, there are 200 such end-of-use dryers, in which critical components are heater, blower, and motor. Parameters in the algorithm are set in Table 2 which are referred to Ondemir and Gupta [2]. Generally, the amount of returned SEPs in different quality levels is assumed to satisfy the normal distribution [12], we assume that dryers’ grades are subjected to N(50,130). The simulation is done by Microsoft Excel, the quality of dryers is graded to calculate the amount of them for each quality level, and optimal decisions are shown in Table 3. The results show that the proposed algorithm is feasible to solve the pricing decision model. Different operator combinations have negligible effect on the results, and the running times are all around 2 s. Therefore, it is reasonable to use Set1 = {Rws, Xovdp, Mutuni} in next section. We also Table 2 The parameters of the benchmark example Parameters
Values
Parameters
Values
E=Q
200
ak
a2 = 94.8, a3 = 19 b2 = 5, b3 = 2
k
{1,2,3}
bk
lk
l1 = 50, l2 = 120, l3 = 30
cd
0.23
ck
c1 = 1.00, c2 = 1.50, c3 = 2.05
P
3.00
Note The unit of cd , ck , and P is $100
98
S. Zong et al.
Table 3 Optimal prices for different operator combinations MAXGEN = 100
p∗
Set1 = {Rws, Xovdp, Mutuni} p1b = 1.000000 p2b p3b p2b p3b
= 1.500000 = 2.050000 = 2.998703 = 2.999584
Set2 = {Rws, Xovdp, Mutpolyn} p1b = 1.000000 p2b p3b p2b p3b
= 1.500000 = 2.050000 = 2.999376 = 2.999618
Set3 = {Rws, Xovud, Mutuni}
Set4 = {Rws, Xo-vud, Mutpolyn}
p1b = 1.000000
p1b = 1.000000
= 1.500000
p2b = 1.500000
= 2.050000
p3b = 2.050000
= 2.999351
p2b = 2.998960
= 2.999816
p3b = 2.999784
p2b p3b p2b p3b
π∗
686.318065
686.361142
686.360792
686.335715
Time (s)
2.046999
1.953429
1.865707
1.972274
Note The superscript “b” denotes the benchmark
Table 4 Comparison with related approaches Approach
Genetic algorithm
Sequential quadratic program
Particle swarm optimization
Time (s)
2.046999
2.382000
4.765613
Accuracy
p1b p2b p3b p2b p3b
= 1.000000 = 1.500000 = 2.050000 = 2.998703 = 2.999584
π ∗ = 686.318065
p1b p2b p3b p2b p3b
= 1.000000
p1b = 1.038844
= 1.500000
p2b = 1.551695
= 2.050000
p3b = 2.051369
= 3.000000
p2b = 2.763782
= 3.000000
p3b = 2.971543
π ∗ = 686.403300
π ∗ = 672.844928
provide comparison with related approaches from two aspects of time complexity and accuracy (Table 4). Through the comprehensive comparison among three approaches, it is easy to conclude that the proposed genetic algorithm is the most suitable approach for the model.
3.3 Sensitive Analysis In the section, we provide sensitive analysis from three aspects of the maximum amount of available dryers, customer’s expected rewards, and the sales amount of dryers.
Optimal Quality-Based Recycling and Reselling Prices …
99
Case1: The effect of the maximum amount of available dryers Assume the maximum amount of available dryers is 1.5, 2, 2.5 times that of the benchmark, other parameters are the same in Table 3. Results are shown in Table 5. From Table 5, we find that (1) p2 , p3 increase in E and close to P, since dryers are single produced, consumers switch to the secondary market; (2) p1 = c1 , p2 = c2 , p3 = c3 , which are because pk ≥ ck and OEM prefers the minimum recycling cost; and (3) OEM’s optimal profit increases in E. Case2: The effect of customer’s expected rewards Compared with the benchmark, c1 , c2 , and c3 have variations of 10% and 30%, respectively. The optimal prices and profits are shown in Table 6. From Table 6, we find that (1) p3 increases in c3 ; (2) ck is negatively related to OEM’s profit. When customers’ expected rewards are too high, OEM will suffer losses because of the higher recycling cost. Overall, customers’ expected rewards have great impacts on OEM’s optimal decisions. If consumers are strategic, they will consider the impact of recycling policy issued by the government, and then offer OEM higher expected rewards. Further, OEM’s profit will be worse, and OEM will even consider giving up production. Case3: The effect of the sales amount of dryers Compared with the benchmark, a2 and a3 have variations of 20% and 40%, respectively. ak also reflects the demand on end-of-use dryers in the second market. The results are shown in Table 7. From Table 7, we find that (1) pk increases in ak (k = 2, 3); (2) OEM’s profit increases in the demand of end-of-use dryers. The reason for the first finding is that when the demand increases, it results in increasing of OEM’s reselling price. The second finding is caused by the increase in both the sales amount and OEM’s reselling price in the end of use dryers.
4 Conclusions and Future Works In this paper, we develop a decision model for the optimal quality-based recycling and reselling prices of returned SEPs in IoT environment, which is formulated by a nonlinear programming problem. Based on the proposed algorithm and numerical studies, OEM’s optimal price decisions are obtained. Through the sensitive analysis, we also find the effects of three key parameters on the recycling price, reselling price, and profit. It is worth noting that customer’s expected rewards have great impact on OEM’s profit. If customers are strategic, it is very likely that they will make good use of government regulations, which require OEM to recycle as much EOUPs as possible. Hence, OEM with insufficient production capacity will withdraw from
π∗
= 2.999584
= 2.998703
= 2.050000
= 1.500000
686.318065
p2b p3b p2b p3b 1025.629727
p3 = 2.999469
p2 = 2.998902
1441.838563
p3 = 2.999655
p2 = 2.999489
p3 = 2.050000
p3 = 2.050000
p1 = 1.000000 p2 = 1.500000
p1 = 1.000000
a2 = 190, a3 = 38
l1 = 100, l2 = 240, l3 = 60
400
p2 = 1.500000
ak
p1b = 1.000000
l1 = 75, l2 = 180, l3 = 45 a2 = 113.8, a3 = 28.4
l1 = 50, l2 = 120, l3 = 30
a2 = 94.8, a3 = 19
lk
p∗
300
200 (benchmark)
E
Table 5 Effects of maximum amount on optimal prices and profit
1818.974945
p3 = 2.999783
p2 = 2.999692
p3 = 2.050000
p2 = 1.500000
p1 = 1.000000
a2 = 228, a3 = 56.9
l1 = 125, l2 = 300, l3 = 75
500
100 S. Zong et al.
Optimal Quality-Based Recycling and Reselling Prices …
101
Table 6 Effects of customer’s expected rewards on optimal prices and profit c1
0.7
0.9
1.0 (benchmark)
p∗
p1 = 0.700000
p1 = 0.900000
p1b p2b p3b p2b p3b
p2 = 1.500000
p2 = 1.500000
p3 = 2.050000
p3 = 2.050000
p2 = 2.997940
π∗
p3 = 2.999844
p3 = 2.999428
695.752951
689.511443
c2 p∗
p2 = 2.999234
1.05
1.35
p1 = 1.000000
p1 = 1.000000
p2 = 1.050000
p2 = 1.350000
p3 = 2.050000
p3 = 2.050000
p2 = 2.999072
p2 = 2.999920
= 1.000000 = 1.500000 = 2.050000
1.1
1.3
p1 = 1.100000
p1 = 1.000000
p2 = 1.500000
p2 = 1.500000
p3 = 2.050000
p3 = 2.050000
p2 = 2.999610
= 2.999584
p3 = 2.999973
p3 = 2.999140
683.217717
676.880260
686.318065
1.5 (benchmark) p1b p2b p3b p2b p3b
= 2.998703
p2 = 2.999051
1.65
1.95
= 1.000000
p1 = 1.000000
p1 = 1.000000
= 1.500000
p2 = 1.650000
p2 = 1.950000
p3 = 2.050000
p3 = 2.050000
= 2.050000
= 2.998703
p2 = 2.998228
= 2.999584
p3 = 2.999465
p3 = 2.999859
p2 = 2.998906
p3 = 2.999838
p3 = 2.999925
π∗
720.477693
697.775962
686.318065
674.908853
652.198246
c3
1.435
1.845
2.05 (benchmark)
2.255
2.665
p∗
c3 < c2
p1 = 1.000000
p1b = 1.000000
p2 = 1.050000
p2b p3b p2b p3b
p3 = 1.845000
p2 = 2.999121
p3 = 2.999536 π∗
null
690.231910
= 1.500000 = 2.050000
p1 = 1.000000
p1 = 1.000000
p2 = 1.050000
p2 = 1.050000
p3 = 2.255000
p3 = 2.665000
= 2.998703
p2 = 2.998857
= 2.999584
p3 = 2.999755
p3 = 2.999891
682.441410
674.708551
686.318065
p2 = 2.999506
the product market, and the government should consider subsidizing them. And the work can be improved in several ways. (1) For different scenarios, different recovery options should be set, while only two options are involved in the paper. (2) Since the SEP is lifecycle monitored, the research can be extended to multiple periods, and the demand varies in different periods. (3) Verify the effect of sorting approach and obtain the optimal solution for OEM. (4) Employ more real-world data to verify the effectiveness of our model and algorithm.
102
S. Zong et al.
Table 7 Effects of sales amount on optimal prices and profit a2
56.88
75.84
p∗
p1 = 1.000000
p1 = 1.000000
p1b = 1.000000
p1 = 1.000000
p1 = 1.000000
p2 = 1.500000
p2 = 1.500000
p2 = 1.500000
p3 = 2.050000
= 1.500000
p2 = 1.500000
p3 = 2.050000
p2b p3b p2b p3b
p3 = 2.050000
p3 = 2.050000
p2 = 2.998536
π∗
p2 = 2.998605
p3 = 2.999510
p3 = 2.999563
563.881508
625.097277
94.8 (benchmark)
= 2.050000
113.76
132.72
= 2.998703
p2 = 2.998782
= 2.999584
p3 = 2.999784
p3 = 2.999979
747.542107
808.873325
686.318065
a3
11.4
15.2
22.8
26.6
p∗
p1 = 1.000000
p1 = 1.000000
p1b = 1.000000
p1 = 1.000000
p1 = 1.000000
p2 = 1.500000
p2 = 1.500000
p2 = 1.500000
p2 = 1.500000
p3 = 2.050000
p3 = 2.050000
p2b = 1.500000
p2 = 2.996038
π∗
p2 = 2.997436
p3 = 2.996045
p3 = 2.998441
661.607218
673.961836
19 (benchmark)
p2 = 2.999887
p3b = 2.050000
p3 = 2.050000
p3 = 2.050000
b
p2 = 2.999206
p3 = 2.999312
p3 = 2.999960
698.619691
710.914521
p2 = 2.998703 p3b = 2.999584 686.318065
p2 = 2.999430
References 1. Fang, C., Liu, X., Pardalos, P.M., Pei, J.: Optimization for a three-stage production system in the Internet of Things: procurement, production and product recovery, and acquisition. Int. J. Adv. Manuf. Technol. 83, 689–710 (2016) 2. Ondemir, O., Gupta, S.M.: Quality management in product recovery using the Internet of Things: an optimization approach. Comput. Ind. 65(3), 491–504 (2014) 3. Minner, S., Kiesmüller, G.P.: Dynamic product acquisition in closed loop supply chains. Int. J. Prod. Res. 50(11), 2836–2851 (2012) 4. Subulan, K., Ta¸san, A.S., Baykaso˘glu, A.: Designing an environmentally conscious tire closed-loop supply chain network with multiple recovery options using interactive fuzzy goal programming. Appl. Math. Model. 39(9), 2661–2702 (2015) 5. Govindan, K., Jha, P.C., Garg, K.: Product recovery optimization in closed-loop supply chain to improve sustainability in manufacturing. Int. J. Prod. Res. 54(5), 463–1486 (2016) 6. Masoudipour, E., Amirian, H., Sahraeian, R.: A novel closed-loop supply chain based on the quality of returned products. J. Clean. Prod. 151, 344–355 (2017) 7. Zhang, Y., Liu, S., Liu, Y., Yang, H., Li, M., Huisingh, D., Wang, L.: The ‘Internet of Things’ enabled real-time scheduling for remanufacturing of automobile engines. J. Clean. Prod. 185, 562–575 (2018) 8. Niknejad, A., Petrovic, D.: Optimisation of integrated reverse logistics networks with different product recovery routes. Eur. J. Oper. Res. 238(1), 143–154 (2014) 9. Xiong, Y., Zhao, P., Xiong, Z., Li, G.: The impact of product upgrading on the decision of entrance to a secondary market. Eur. J. Oper. Res. 252(2), 443–454 (2016) 10. Jun, H.B., Shin, J.H., Kim, Y.S., Kiritsis, D., Xirouchakis, P.: A framework for RFID applications in product lifecycle management. Int. J. Comput. Integr. Manuf. 22(7), 595–615 (2009)
Optimal Quality-Based Recycling and Reselling Prices …
103
11. Goldberg, D.E., Holland, J.H.: Genetic algorithms and machine learning. Mach. Learn. 3(7), 95–99 (1988) 12. Radhi, M., Zhang, G.: Optimal configuration of remanufacturing supply network with return quality decision. Int. J. Prod. Res. 54(5), 1487–1502 (2016)
On a Novel Representation of Multiple Textual Documents in a Single Graph Nikolaos Giarelis , Nikos Kanakaris , and Nikos Karacapilidis
Abstract This paper introduces a novel approach to represent multiple documents as a single graph, namely, the graph-of -docs model, together with an associated novel algorithm for text categorization. The proposed approach enables the investigation of the importance of a term into a whole corpus of documents and supports the inclusion of relationship edges between documents, thus enabling the calculation of important metrics as far as documents are concerned. Compared to well-tried existing solutions, our initial experimentations demonstrate a significant improvement of the accuracy of the text categorization process. For the experimentations reported in this paper, we used a well-known dataset containing about 19,000 documents organized in various subjects.
1 Introduction In recent years, we have witnessed an increase in the adoption of graph-based approaches for the representation of textual documents [3, 26]. Generally speaking, graph-based text representations exploit properties inherited from graph theory (e.g., node centrality and subgraph frequency) to overcome the limitations of the classical bag-of -words representation [1]. Specifically, graph-based models (contrary to the bag-of-words ones) are able to (i) capture structural and semantic information of a text, (ii) mitigate the effects of the “curse-of-dimensionality” phenomenon, (iii) identify the most important terms of a text, and (iv) seamlessly incorporate information coming from external knowledge sources. N. Giarelis · N. Kanakaris · N. Karacapilidis (B) Industrial Management and Information Systems Lab, MEAD, University of Patras, 26504 Rio Patras, Greece e-mail: [email protected] N. Giarelis e-mail: [email protected] N. Kanakaris e-mail: [email protected] © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2020 I. Czarnowski et al. (eds.), Intelligent Decision Technologies, Smart Innovation, Systems and Technologies 193, https://doi.org/10.1007/978-981-15-5925-9_9
105
106
N. Giarelis et al.
However, in cases where a corpus of documents needs to be considered and analyzed, existing graph-based approaches represent each document of the corpus as a single graph. In such cases, the main weaknesses of these approaches are that (i) they are incapable of assessing the importance of a word for the whole set of documents and (ii) they do not allow for representing similarities between these documents. To remedy the above weaknesses, this paper expands the graph-based text representation model proposed by Rousseau et al. [21, 22], i.e., the graph-of -words model, and introduces a novel approach to represent multiple documents as a single graph, namely, the graph-of -docs model, as well as an associated novel algorithm for text categorization. Contrary to existing approaches, the one introduced in this paper (i) enables the investigation of the importance of a term into a whole corpus of documents, (ii) masks the overall complexity by reducing each graph of words to a “document” node, and (iii) supports the inclusion of relationship edges between documents, thus enabling the calculation of important metrics as far as documents are concerned. The proposed approach uses the Neo4j graph database (https://neo4j. com) for the representation of the graph-of-docs model. For the implementation of our experiments, we use the Python programming language and the scikit-learn ML library (https://scikit-learn.org). Compared to well-tried existing solutions, our initial experimental results show a significant improvement of the accuracy of the text categorization process. The remainder of the paper is organized as follows. Section 2 describes the graphof-words representation and its application to classical NLP tasks. Our approach, i.e., graph of docs, is analytically presented in Sect. 3. Section 4 reports on the experiments carried out to evaluate the proposed approach. Finally, limitations of our approach, future work directions, and concluding remarks are outlined in Sect. 5.
2 Background Work 2.1 Graph of Words The graph-of-words textual representation is similar to the bag-of-words representation that is widely used in the NLP field. It enables a more sophisticated keyword extraction and feature engineering process. In a graph of words, each node represents a unique term (i.e., word) of a document and each edge represents the co-occurrence between two terms within a sliding window of text. Nikolentzos et al. [16] propose the utilization of a small sliding window size, due to the fact that the larger ones produce heavily interconnected graphs where the valuable information is cluttered with noise; Rousseau et al. [21] suggest that a window size of four is generally considered to be the appropriate value, since it does not sacrifice either the performance or the accuracy of their approach.
2.2 Graph-Based Keyword Extraction A set of existing approaches performing classical NLP tasks builds on the graph-of-words textual representation. Ohsawa et al. [18] were the first to use the graph-of-words representation in the keyword extraction and text summarization tasks. Their approach segments a graph of words into clusters aiming to identify frequently co-occurring terms. Adopting a similar research direction, the TextRank model implements a graph-based ranking measure to find the most prestigious nodes of a graph (i.e., the nodes with the highest indegree value) and utilizes them for the tasks of keyword and sentence extraction [12]. The utilization of node centrality measures for the keyword and key-phrase extraction tasks can also be found in the literature [4]; these measures include the “degree” centrality, the “closeness” centrality, the “betweenness” centrality, and the “eigenvector” centrality. Bougouin et al. [5] propose a novel graph-based unsupervised topic extraction method, namely, TopicRank. TopicRank clusters key phrases into topics and identifies the most representative ones using a graph-based ranking measure (e.g., TextRank). Finally, Tixier et al. [27] focus on the task of unsupervised single-document keyword extraction, arguing that the most important keywords correspond to the nodes of the k-core subgraph [24].
2.3 Graph-Based Text Categorization As far as graph-based text categorization is concerned, several interesting approaches have already been proposed in the literature. Depending on their methodology, we can classify them into two basic categories: (i) approaches that employ frequent subgraph mining techniques for feature extraction and (ii) approaches that rely on graph kernels. Popular frequent subgraph mining techniques include gSpan [29], Gaston [15], and gBoost [23]. Rousseau et al. [21] propose various combinations and configurations of these techniques, ranging from unsupervised feature mining using gSpan to unsupervised feature selection exploiting the k-core subgraph. In particular, aiming to increase performance, they rely on the concept of the k-core subgraph to reduce the graph representation to its densest part. The experimental results show a significant increase in accuracy compared to common classification approaches. Graph kernel algorithms contribute significantly to recent approaches for graph-based text categorization [17]. A graph kernel is a measure that calculates the similarity between two graphs. For instance, a document similarity algorithm based on shortest path graph kernels has been proposed in [17]; this algorithm can be used as a distance metric for common ML classifiers such as SVM and k-NN. The experimental results show that classifiers that are based on graph kernel algorithms outperform several classical approaches. It is noted that the GraKeL Python library collects and unifies widely used graph kernel libraries into a single framework [25], providing an
easily understandable interface (similar to that of scikit-learn) that enables the user to develop new graph kernels.
2.4 Graph Databases Compared to conventional relational databases, graph databases provide a more convenient and efficient way to natively represent and store highly interlinked data. In addition, they allow the retrieval of multiple relationships and entities with a single operation, avoiding the rigid join operations which are heavily used in relational databases [13]. An in-depth review of graph databases appears in [20]. Our approach builds on top of the Neo4j graph database (https://neo4j.com), a broadly adopted graph database system that uses the highly expressive Cypher Graph Query Language to query and manage data.
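For illustration only, the sketch below shows how document and word nodes and their relationships could be merged into Neo4j from Python using Cypher; the connection settings, node labels, and function name are assumptions of the sketch that mirror the schema introduced later in Sect. 3, not the actual GraphOfDocs code.

from neo4j import GraphDatabase

# Illustrative connection settings; adjust them to the local Neo4j instance.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def store_document(doc_id, terms, window=4):
    # Merge a document node, its word nodes, and 'includes'/'connects' edges.
    with driver.session() as session:
        session.run("MERGE (d:Document {name: $doc})", doc=doc_id)
        for term in set(terms):
            session.run(
                "MATCH (d:Document {name: $doc}) "
                "MERGE (w:Word {name: $word}) "
                "MERGE (d)-[:includes]->(w)",
                doc=doc_id, word=term)
        for i, term in enumerate(terms):
            for other in terms[i + 1:i + window]:
                if other != term:
                    session.run(
                        "MATCH (a:Word {name: $w1}), (b:Word {name: $w2}) "
                        "MERGE (a)-[r:connects]-(b) "
                        "ON CREATE SET r.weight = 1 "
                        "ON MATCH SET r.weight = r.weight + 1",
                        w1=term, w2=other)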
3 Our Approach: Graph of Docs In this paper, we expand the “graph-of-words” model proposed by Rousseau et al. [22] to introduce a “graph-of-docs” model. Contrary to the former model, where a graph corresponds to a single document, the proposed model represents multiple documents in a single graph. Our approach allows diverse types of nodes and edges to co-exist in a graph, ranging from types of nodes such as “document” and “word” to types of edges such as “is_similar,” “connects,” and “includes” (see Fig. 1). This enables us to investigate the importance of a term not only within a single document but also within a whole corpus of documents. Furthermore, the proposed graph-of-docs representation adds an abstraction layer by assigning each graph of words to a document node. Finally, it supports relationship edges between documents, thus enabling the calculation of important metrics as far as the documents are concerned (e.g., identifying cliques or neighborhoods of similar documents, identifying important documents, generating communities of documents without any prior knowledge, etc.).
Fig. 1 The schema of the graph-of-docs representation model
The graph-of-docs representation produces a directed dense graph that contains all the connections between the documents and the words of a corpus (see Fig. 2). Each unique word node is connected to all the document nodes where it belongs using edges of the “includes” type; edges of the “connects” type are only applicable between two word nodes and denote their co-occurrence within a specific sliding text window; finally, edges of the “is_similar” type link a pair of document nodes and indicate their contextual similarity. The above transformation of a set of documents into a graph assists in the reduction of diverse NLP problems to problems that have been well studied through graph theory techniques [21]. Such techniques explore important characteristics of a graph, such as node centrality and frequent subgraphs, which in turn are applied to identify meaningful keywords and find similar documents. We argue that the accuracy of common NLP and text mining tasks can be improved by adopting the proposed graph-of-docs representation. Below, we describe how three
key NLP tasks (namely, “Keyword Extraction,” “Document Similarity,” and “Text Categorization”) can be carried out using our approach.
Fig. 2 The graph-of-docs representation model (relationships between documents are denoted with dotted lines)
3.1 Keyword Extraction To extract the most representative keywords from each document, we apply centrality measures (an in-depth review of them appears in [10]). In general, these measures identify the most influential nodes of a graph, i.e., those that usually have an indegree score higher than a predefined threshold. The main idea is that the words that correspond to the top-N ranked nodes can be considered as semantically more important than others in a specific document. Recent algorithms to calculate the centrality of a graph include PageRank, ArticleRank, Betweenness Centrality, Closeness Centrality, Degree Centrality, and Harmonic Centrality. While also utilizing the above algorithms to calculate centrality measures, our approach differs from the existing ones in that it considers the whole corpus of documents instead of each document separately; hence, we are able to detect a holistic perspective of the importance of each term.
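The snippet below illustrates this corpus-level ranking with networkx instead of the Neo4j procedures used by the actual implementation; the “kind” node attribute and the function name are assumptions made only for the illustration.

import networkx as nx

def top_keywords(graph_of_docs, top_n=10):
    # Rank every node of the corpus-level graph by PageRank and keep the
    # highest-scoring word nodes; 'kind' is an assumed node attribute that
    # distinguishes word nodes from document nodes.
    scores = nx.pagerank(graph_of_docs, weight="weight")
    words = [(node, score) for node, score in scores.items()
             if graph_of_docs.nodes[node].get("kind") == "word"]
    return sorted(words, key=lambda pair: pair[1], reverse=True)[:top_n]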
3.2 Document Similarity Subgraph Typically, graph of words derived from similar documents share common word nodes as well as similar structural characteristics. This enables us to calculate the similarity between two documents either by using typical data mining similarity measures (e.g., the Jaccard or the cosine similarity), or by employing frequent subgraph mining techniques (see Sect. 2.3). In our approach, we produce a similarity subgraph, which consists of document nodes and edges of “is_similar” type (we aim to extend the set of supported edge types in the future). It is clear that the creation of such a subgraph is not feasible in approaches that represent each document as a single graph.
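A minimal sketch of how such “is_similar” edges could be produced from the word sets of the documents is given below; the similarity threshold is an arbitrary illustrative value, not one prescribed by the paper.

from itertools import combinations

def jaccard(terms_a, terms_b):
    # Jaccard similarity between the word sets of two documents.
    a, b = set(terms_a), set(terms_b)
    return len(a & b) / len(a | b) if (a | b) else 0.0

def similarity_edges(doc_terms, threshold=0.2):
    # Yield (doc1, doc2, score) triples that would become 'is_similar' edges;
    # doc_terms maps a document id to its list of terms.
    for d1, d2 in combinations(doc_terms, 2):
        score = jaccard(doc_terms[d1], doc_terms[d2])
        if score >= threshold:
            yield d1, d2, score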
3.3 Text Categorization By exploiting the aforementioned document similarity subgraph, we detect communities of contextually similar documents using the “score” property of the “is_similar” type edges as a distance value. A plethora of community detection algorithms can be found in the literature, including Louvain [11], Label Propagation [19], and Weakly Connected Components [14]; an in-depth review of them can be found in [6, 30].
Since the documents that belong to the same graph community are similar, as far as their context is concerned, we assume that they are also more likely to share the same class when it comes to performing a text categorization task. Therefore, we can easily decide the class of a document either by using the most frequent class in its community of documents or by running a nearest neighbor algorithm (such as k-nearest neighbors) using as input the documents of its community.
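A sketch of the first (majority-class) option is shown below, assuming that a community id has already been assigned to every document (e.g., by Louvain on the similarity subgraph) and that the class labels of the training documents are known; the helper names are illustrative.

from collections import Counter

def classify_by_community(doc_id, community_of, labels):
    # community_of maps each document id to its community id; labels holds
    # the known classes of the already labeled documents.
    peers = [d for d, c in community_of.items()
             if c == community_of[doc_id] and d in labels and d != doc_id]
    if not peers:
        return None  # outlier document: no labeled neighbors in its community
    return Counter(labels[d] for d in peers).most_common(1)[0][0]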
4 Experiments 4.1 Dataset We have tested the proposed model by utilizing an already preprocessed version of the well-known 20 Newsgroups dataset, and specifically the version containing 18,828 documents organized in various subjects (this dataset can be retrieved from http://qwone.com/~jason/20Newsgroups/20news-18828.tar.gz). This version does not contain unnecessary headers or duplicate texts that would require additional work as far as data cleansing is concerned. It is noted that this dataset is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across 20 different newsgroups. It has become a popular dataset for experiments in text applications of ML techniques, such as text classification and text clustering. We claim that this dataset fits well to the purposes of our experimentations (i.e., multi-class classification), given the large volume of different documents on the same subjects.
4.2 Implementation The Neo4j graph database has been utilized for the representation of the proposed graph-of-docs model. Furthermore, we used the Python programming language and the scikit-learn ML library for the implementation of our experiments. The full code and documentation of our approach are freely available at https://github.com/NC0DER/GraphOfDocs. Our approach consists of four major steps that are described in the sequel. Firstly, we execute a preprocessing function that (i) removes stopwords and punctuation marks from the texts and (ii) produces a list of terms for each document. Secondly, we execute a function that creates a graph of words by using the aforementioned terms. More specifically, this function creates unique word nodes from the list of terms and then links them (if needed), while also calculating the co-occurrence score. In this step, the graph of docs is being created through the progressive synthesis of the graphs of words produced for each document. It is noted that by loading the 20 Newsgroups dataset, we generated a graph of docs with 174,347 unique nodes and 4,934,175 unique edges. Thirdly, we execute the PageRank algorithm aiming to
identify the most important word nodes in the entire graph. Finally, we implement a function that calculates the Jaccard similarity measure for all document nodes; this function builds the document similarity subgraph and forms communities of similar documents using the Louvain algorithm. Our implementation is sketched in the following pseudocode:
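(The following is a compact, illustrative Python rendering of these four steps rather than the authors’ own listing; it uses networkx and a greedy modularity routine in place of the Neo4j procedures and the Louvain algorithm of the actual GraphOfDocs implementation, so the stopword list, helper names, and parameters are assumptions.)

import itertools
import string
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it"}  # toy list

def preprocess(text):
    # Step 1: remove punctuation and stopwords, return the list of terms.
    table = str.maketrans("", "", string.punctuation)
    return [t for t in text.lower().translate(table).split() if t not in STOPWORDS]

def add_graph_of_words(graph, doc_id, terms, window=4):
    # Step 2: word nodes, 'includes' edges to the document node, and weighted
    # 'connects' edges for co-occurrence within the sliding window.
    graph.add_node(doc_id, kind="document")
    for i, term in enumerate(terms):
        graph.add_node(term, kind="word")
        graph.add_edge(doc_id, term, kind="includes")
        for other in terms[i + 1:i + window]:
            if other != term:
                weight = graph.get_edge_data(term, other, {}).get("weight", 0)
                graph.add_edge(term, other, kind="connects", weight=weight + 1)

def graph_of_docs_pipeline(documents):
    graph, terms = nx.Graph(), {d: preprocess(t) for d, t in documents.items()}
    for doc_id in terms:                                   # steps 1-2
        add_graph_of_words(graph, doc_id, terms[doc_id])
    ranks = nx.pagerank(graph)                             # step 3
    similarity = nx.Graph()                                # step 4: similarity subgraph
    for d1, d2 in itertools.combinations(terms, 2):
        a, b = set(terms[d1]), set(terms[d2])
        score = len(a & b) / len(a | b) if (a | b) else 0.0
        if score > 0.0:
            similarity.add_edge(d1, d2, score=score)
    communities = (greedy_modularity_communities(similarity, weight="score")
                   if similarity.number_of_edges() else [])
    return graph, ranks, communities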
4.3 Evaluation Aiming to evaluate the performance of our approach, we benchmark the accuracy score of the text classifier described in Sect. 3.3 against those of common domain-agnostic classifiers that use the bag-of-words model for their text representation (see Table 1). Considering the accuracy of each text classifier, we conclude that the proposed graph-of-docs representation significantly increases the accuracy of text classifiers (accuracy: 97.5%). Table 1 Accuracy scores for the existing and the proposed text classifiers
Text classifier                  Accuracy (%)
5-NN                             54.8
2-NN                             61.0
1-NN                             76.0
Naive Bayes                      93.7
Logistic regression              93.9
Neural network (100 × 50)        95.5
Neural network (1000 × 500)      95.9
Graph-of-docs classifier         97.5
5 Conclusions In this paper, we introduced a novel approach for representing multiple textual documents in a single graph, namely, “graph of docs.” To test our approach, we benchmarked the proposed “graph-of-docs”-based classifier against classical text classifiers that use the “bag-of-words” model for text representation. The evaluation outcome was very promising; an accuracy score of 97.5% was achieved, while the second best result was 95.9%. However, our approach has a set of limitations, in that (i) it does not perform equally well with outlier documents (i.e., documents that are not similar to any other document) and (ii) it has performance issues since the generation of a graph of documents requires significant time in a disk-based graph database such as Neo4j [28]. Aiming to address the above limitations as well as to integrate our approach into existing works on knowledge management systems, future work directions include: (i) the experimentation with alternative centrality measures, as well as diverse community detection and graph partitioning algorithms [2]; (ii) the utilization and assessment of an in-memory graph database in combination with Neo4j; (iii) the enrichment of the existing textual corpus through the exploitation of external domainagnostic knowledge graphs (e.g., DBPedia and Wikipedia knowledge); and (iv) the integration of our approach into collaborative environments where the underlying knowledge is structured through semantically rich discourse graphs (e.g., integration with the approaches described in [7–9]). Acknowledgements The work presented in this paper is supported by the OpenBio-C project (www.openbio.eu), which is co-financed by the European Union and Greek national funds through the Operational Program Competitiveness, Entrepreneurship and Innovation, under the call RESEARCH—CREATE—INNOVATE (Project id: T1EDK-05275).
References 1. Aggarwal, C.C.: Machine Learning for Text. Springer (2018) 2. Armenatzoglou, N., Pham, H., Ntranos, V., Papadias, D., Shahabi, C.: Real-time multi-criteria social graph partitioning: a game theoretic approach. In: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, pp. 1617–1628, ACM Press (2015) 3. Blanco, R., Lioma, C.: Graph-based term weighting for information retrieval. Inf. Retr. 15(1), 54–92 (2012) 4. Boudin, F.: A comparison of centrality measures for graph-based keyphrase extraction. In: Proceedings of the Sixth International Joint Conference on Natural Language Processing, pp. 834–838 (2013) 5. Bougouin, A., Boudin, F., Daille, B.: Topicrank: graph-based topic ranking for keyphrase extraction. In: Proceedings of the Sixth International Joint Conference on Natural Language Processing, pp. 543–551 (2013) 6. Fortunato, S.: Community detection in graphs. Phys. Rep. 486(3–5), 75–174 (2010)
7. Kanterakis, A., Iatraki, G., Pityanou, K., Koumakis, L., Kanakaris, N., Karacapilidis, N., Potamias, G.: Towards reproducible bioinformatics: the OpenBio-C scientific workflow environment. In: Proceedings of the 19th IEEE International Conference on Bioinformatics and Bioengineering (BIBE), pp. 221–226, Athens, Greece (2019) 8. Karacapilidis, N., Papadias, D., Gordon, T., Voss, H.: Collaborative environmental planning with GeoMed. Eur. J. Oper. Res. Spec. Issue Environ. Plan. 102(2), 335–346 (1997) 9. Karacapilidis, N., Tzagarakis, M., Karousos, N., Gkotsis, G., Kallistros, V., Christodoulou, S., Mettouris, C., Nousia, D.: Tackling cognitively-complex collaboration with CoPe_it! Int. J. Web-Based Learn Teach. Technol 4(3), 22–38 (2009) 10. Landherr, A., Friedl, B., Heidemann, J.: A critical review of centrality measures in social networks. Bus Inf. Syst. Eng. 2(6), 371–385 (2010) 11. Lu, H., Halappanavar, M., Kalyanaraman, A.: Parallel heuristics for scalable community detection. Parallel Comput. 47, 19–37 (2015) 12. Mihalcea, R., Tarau, P.: Textrank: bringing order into text. In: Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, pp. 404–411 (2004) 13. Miller, J.J.: Graph database applications and concepts with Neo4j. In: Proceedings of the Southern Association for Information Systems Conference, vol. 2324, no. 36, Atlanta, GA, USA (2013) 14. Monge, A., Elkan, C.: An efficient domain-independent algorithm for detecting approximately duplicate database records (1997) 15. Nijssen, S., Kok, J. N.: A quickstart in frequent structure mining can make a difference. In: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 647–652, ACM Press (2004) 16. Nikolentzos, G., Meladianos, P., Rousseau, F., Stavrakas, Y., Vazirgiannis, M.: Shortest-path graph kernels for document similarity. In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pp. 1890–1900 (2017) 17. Nikolentzos, G., Siglidis, G., Vazirgiannis, M.: Graph Kernels: a survey. arXiv preprint arXiv: 1904.12218 (2019) 18. Ohsawa, Y., Benson, N. E., Yachida, M.: KeyGraph: automatic indexing by co-occurrence graph based on building construction metaphor. In: Proceedings IEEE International Forum on Research and Technology Advances in Digital Libraries, pp. 12–18, IEEE Press (1998) 19. Raghavan, U.N., Albert, R., Kumara, S.: Near linear time algorithm to detect community structures in large-scale networks. Phys. Rev. E 76(3) (2007) 20. Rawat, D.S., Kashyap, N.K.: Graph database: a complete GDBMS survey. Int. J. 3, 217–226 (2017) 21. Rousseau, F., Kiagias, E., Vazirgiannis, M.: Text categorization as a graph classification problem. In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, vol. 1, pp. 1702–1712 (2015) 22. Rousseau, F., Vazirgiannis, M.: Graph-of-word and TW-IDF: new approach to ad hoc IR. In: Proceedings of the 22nd ACM International Conference on Information & Knowledge Management, pp. 59–68, ACM (2013) 23. Saigo, H., Nowozin, S., Kadowaki, T., Kudo, T., Tsuda, K.: gBoost: a mathematical programming approach to graph classification and regression. Mach. Learn. 75(1), 69–89 (2009) 24. Seidman, S.B.: Network structure and minimum degree. Soc. Netw. 5(3), 269–287 (1983) 25. 
Siglidis, G., Nikolentzos, G., Limnios, S., Giatsidis, C., Skianis, K., Vazirgianis, M.: Grakel: a graph kernel library in python. arXiv preprint arXiv:1806.02193 (2018) 26. Sonawane, S.S., Kulkarni, P.A.: Graph based representation and analysis of text document: a survey of techniques. Int. J. Comput. Appl. 96(19) (2014) 27. Tixier, A., Malliaros, F., Vazirgiannis, M.: A graph degeneracy-based approach to keyword extraction. In: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pp. 1860–1870 (2016)
28. Wang, W., Wang, C., Zhu, Y., Shi, B., Pei, J., Yan, X., Han, J.: Graphminer: a structural patternmining system for large disk-based graph databases and its applications. In: Proceedings of the 2005 ACM SIGMOD international conference on Management of data, pp. 879–881. ACM Press (2005) 29. Yan, X., Han, J.: gspan: Graph-based substructure pattern mining. In: Proceedings of the IEEE International Conference on Data Mining, pp. 721–724. IEEE Press (2002) 30. Yang, Z., Algesheimer, R., Tessone, C.J.: A comparative analysis of community detection algorithms on artificial networks. Sci. Rep. 6, 30750. https://doi.org/10.1038/srep30750 (2016)
Multi-agent Approach to the DVRP with GLS Improvement Procedure Dariusz Barbucha
Abstract The Vehicle Routing Problem (VRP) class refers to a wide range of transportation problems where a set of vehicles has to deliver (or pick up) goods or persons to (or from) locations situated in a given area. The Dynamic Vehicle Routing Problem (DVRP) class generalizes the VRP by assuming that information about customers is not given a priori to the decision-maker and may change over time. It means that at any moment of time, there may exist customers already being serviced and new customers which need to be serviced. As a consequence, each newly arriving request needs to be incorporated into the existing vehicles’ tours, which means that the current solution may need to be reconfigured to minimize the goal function. The paper presents a multi-agent approach to the DVRP, where a Guided Local Search (GLS) procedure has been applied to the periodic re-optimization of static subproblems including the requests which have already arrived to the system. A computational experiment has been carried out to confirm the efficiency of the proposed approach.
1 Introduction The Vehicle Routing Problem (VRP) is one of the best-known combinatorial optimization problems; it generally consists in designing a set of routes for a fleet of vehicles that have to service a set of geographically distributed customers at minimal cost or with respect to other important desired factors. In its static version, it can be formulated on an undirected graph G = (V, E). V = {0, 1, ..., N} is a set of nodes, where the node denoted by 0 is a central depot with K identical vehicles of capacity Q, and each other node i ∈ V\{0} refers to a customer described by the demand d_i and the service time s_i. E = {(i, j) | i, j ∈ V} is a set of edges, where each edge (i, j) ∈ E denotes the path between customers i and j and is characterized by the cost c_ij of travel from
i to j by the shortest path, which often reflects the distance traveled by a vehicle from i to j (i, j ∈ V). The goal of the problem is to find a feasible set of routes with the minimal total cost (traveled distance) starting and ending at the depot, where each customer i ∈ V\{0} is serviced exactly once by a single vehicle, and the total load of any vehicle associated with a given route does not exceed the vehicle capacity Q. The Dynamic Vehicle Routing Problem (DVRP) is derived from its static counterpart by assuming that some of its features may depend on time [15, 16]. For example, customers may modify their orders by changing demands or even canceling them, the travel time between two customers may change because of unexpected events, like traffic jams or bad weather conditions, etc. An analysis of the available literature referring to the DVRP leads to the conclusion that the most common source of dynamism in the VRP is the online arrival of customers’ requests while the process of servicing already arrived (available) requests is running [15]. The paper focuses on a Multi-agent System (MAS) for the DVRP with this source of dynamism. It extends the author’s work [3, 5–7] by adding a new procedure for improving the partial solution that includes the already received requests. The proposed procedure is based on Guided Local Search (GLS), originally proposed by Voudouris and Tsang [18] as an optimization technique suitable for a wide range of computationally hard optimization problems. Like other metaheuristics, GLS may be considered as a general algorithmic framework intended to extend the capabilities of heuristics by combining one or more heuristic methods using a higher level strategy, with the aim of improving their efficiency or robustness. GLS uses search-related information to effectively guide local search heuristics in vast search spaces. This is achieved by augmenting the objective function of the problem to be minimized with a set of penalty terms which are dynamically manipulated during the search process to control the heuristic being guided [19]. GLS has been applied to a number of difficult optimization problems such as, for example, the Quadratic Assignment Problem (QAP) [10, 19], the Satisfiability (SAT) and MAXSAT problems [14], the 3-D Bin Packing Problem [9], the Traveling Salesman Problem (TSP) [19], and Vehicle Routing Problems [2, 13, 17], and has been found to be an effective tool to solve them. A review of state-of-the-art guided local search methods and their applications can be found, for example, in [1, 20]. It is also worth noting that GLS has also been implemented by the author within a multi-agent system dedicated to solving the static VRP [4]. Partial results from this implementation have also been applied in the proposed approach to the DVRP. The rest of the paper includes a presentation of the proposed multi-agent approach to the DVRP (Sect. 2), details of the computational experiment and an analysis of the results (Sect. 3), and conclusions and directions of future work (Sect. 4).
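For illustration, a minimal sketch of the objective and feasibility check implied by this formulation is given below; Euclidean arc costs and the helper names are assumptions of the sketch, not part of the original implementation.

import math

def route_cost(route, coords, depot=0):
    # Total traveled distance of a single route that starts and ends at the depot.
    path = [depot] + list(route) + [depot]
    return sum(math.dist(coords[a], coords[b]) for a, b in zip(path, path[1:]))

def total_cost(routes, coords):
    # Objective g(s): the sum of route costs over all vehicles.
    return sum(route_cost(route, coords) for route in routes)

def is_feasible(routes, demands, capacity, customers):
    # Every customer is serviced exactly once and no route exceeds capacity Q.
    served = [c for route in routes for c in route]
    return (len(served) == len(set(served))
            and set(served) == set(customers)
            and all(sum(demands[c] for c in route) <= capacity for route in routes))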
2 Multi-agent Approach to the DVRP 2.1 Overview The multi-agent system dedicated to simulating and solving the DVRP includes several agents with different responsibilities:
– GlobalManager (GM)—its role is to initialize all agents,
– RequestGenerator (RG)—focuses on generating (or reading from a file) new customers’ requests,
– Vehicle (Vh) agents—responsible for servicing the customers’ requests,
– RequestManager (RM)—manages the list of customers’ requests and allocates them to the available Vh agents, and
– Optimizing (OPT) agent—solves the static subproblem of the VRP including the already arrived requests.
A general view of the simulation and solving process implemented within the proposed system is presented in Algorithm 1 (MAS-DVRP-GLS). It emphasizes the messages exchanged between the above agents, which are the most important from the point of view of simulating and solving the problem. Let us assume that the process of simulating and solving the DVRP starts at time 0 and ends at time T (T—length of the working day). Let t_i ∈ [0, T] (i = 1, ..., N) denote the time when the i-th customer’s request is submitted (t_i > 0 for each customer request i ∈ V\{0}). We also assume that the working day is divided into n_ts time slices (the idea of time slices has been introduced and/or adapted, for example, in [6, 11, 12]). Each of them has equal length T_ts = T/n_ts. Let T_t be the start of time slice t (t = 0, ..., n_ts − 1). It means that T_0 = 0 and T_{n_ts} = T. Within each time slice [T_t, T_{t+1}], where t = 0, ..., n_ts − 1, the following subprocesses are performed until T has been reached:
1. Receiving and collecting newly arriving requests, and then solving the static subproblem of the DVRP consisting of the collected requests (lines 10–15).
2. Dispatching the cumulated requests to the available vehicles based on the solution of the static subproblem of the DVRP, whenever the end of the time slice has been reached (lines 16–23).
2.2 Collecting the Newly Arriving Requests and Solving the Static Subproblem of the DVRP Because the considered problem is fully dynamic, new requests arrive to the system during the whole working day while the process of simulating and solving the DVRP is running. Whenever a new request has been registered in the system, the RG agent sends the ::newRequest message to the RM agent.
Algorithm 1 MAS-DVRP-GLS
1: Initialize all agents: GM, RG, RM, OPT, and Vh
2: Let P_DVRP(t) be a static subproblem of the DVRP including the requests belonging to the CRequests set at time slice t
3: Let s be a solution of P_DVRP(t) at time slice t
4: T_0 ← 0
5: T_{n_ts} ← T
6: CRequests ← ∅
7: t ← 0
8: while (t < n_ts) do
9:   event ← getEvent()
10:  if (event = ::newRequest) then
11:    {Collecting the newly arriving requests and solving the static subproblem of the DVRP}
12:    CRequests ← CRequests ∪ {o}
13:    s ← GLS_DVRP(s)
14:    Update the routing plan R_s(t), where s is a new solution of P_DVRP(t)
15:  end if
16:  if (event = ::endOfTimeSlice) then
17:    {Dispatching the cumulated requests to the available vehicles based on the global routing plan}
18:    Let R_s(t) = [R_s^{v(1)}(t), R_s^{v(2)}(t), ..., R_s^{v(K)}(t)] be the global plan formed by s at time slice t, where each R_s^{v(k)}(t) is a route associated with vehicle v(k) (k = 1, ..., K) in solution s at time slice t
19:    for k = 1, ..., K do
20:      Send R_s^{v(k)}(t) to vehicle v(k)
21:    end for
22:    CRequests ← ∅
23:  end if
24:  t ← t + 1
25: end while
Let CRequests be the set of cumulated requests which have arrived at time slice t but have not yet been sent to the available Vh agents. Initially, the CRequests set is empty, but after the ::newRequest message has been received by the RM agent, it updates CRequests by adding the newly received request to it. As a consequence, at any moment of time, the requests cumulated in CRequests form a static subproblem of the DVRP—P_DVRP(t)—which has to be solved. The process of (re-)solving the subproblem P_DVRP(t) is managed by the RM. It first inserts the newly arrived requests into one of the routes using the cheapest insertion heuristic, and then uses the dedicated GLS procedure provided by the OPT agent to improve the current best solution of P_DVRP(t) at time slice t. The GLS procedure dedicated to solving the static subproblem of the DVRP—P_DVRP(t) at time slice t—is presented in Algorithm 2 (GLS-DVRP). Let f_i be a solution feature (i = 1, ..., M, where M is the number of features defined over solutions), which characterizes each candidate solution s ∈ S (S—search space). In general, it can be defined as any solution property that satisfies a non-trivial simple constraint.
Let c_i (i = 1, ..., M) be the cost of feature i, defined as information pertaining to the problem which represents the direct or indirect impact of the corresponding solution properties on the solution cost. Let p_i (i = 1, ..., M) be the penalty of feature i used to augment the original cost function g of the problem. The resulting function h (called the augmented cost function) is passed, instead of the original one, for minimization by the local search procedure. It is defined as follows:

h(s) = g(s) + λ · Σ_{i=1}^{M} p_i · I_i(s),    (1)
where I_i is an indicator function defined for each feature f_i (I_i(s) = 1 if solution s ∈ S has property i, and 0 otherwise) and λ is a parameter for controlling the strength of the constraints with respect to the actual solution cost; it represents the relative importance of the penalties with respect to the solution cost and provides a means to control the influence of the above information on the search process. At the beginning of the GLS-DVRP algorithm, all values of the feature penalty vector are set to 0. Next, the procedure of solution improvement is performed in a loop until the stopping criterion is met. In each iteration, the LS() function operating on the augmented function h is called. After each call of the LS() function, the penalties of the features that maximize the utility formula are incremented by 1. If the newly obtained solution is better than the current one (in terms of the original cost function g), it is assigned as the current best solution. The process of solution improvement is repeated until the stopping criterion is met. Finally, the LS() function is called one last time with respect to the original cost function g. In the case of the proposed GLS approach to the static subproblem of the DVRP:
• The arcs (i, j) ∈ E were chosen as the features to penalize (similarly to [13, 19]) and, hence, the cost of a feature is equal to c_ij (i, j = 0, ..., N).
• The penalty factor λ has been arbitrarily set to 0.2.
• Three different local search (LS) methods have been implemented and tested within the proposed approach:
– LS(1)—an implementation of the 3-opt procedure for the TSP operating on a single route, where three randomly selected edges are removed and the remaining segments are then reconnected in all possible ways until a new feasible solution (route) is obtained.
– LS(2)—an implementation of a dedicated local search method based on the interchange or move of at most λ randomly selected customers between two randomly selected routes (the λ-interchange local optimization method, where λ = 2).
– LS(3)—another implementation of a dedicated local search method also operating on two routes, and based on exchanging or moving selected customers between these routes. Opposite to the previous method, here, the selection of
customers to exchange or move is made not randomly but according to their distance to the centroid of their original route.
Algorithm 2 GLS_DVRP(s)
1: Let M be the number of features, p̄ = [p_1, ..., p_M] a vector of feature penalties, Ī = [I_1, ..., I_M] a vector of feature indicator functions, c̄ = [c_1, ..., c_M] a vector of feature costs, g the original objective function, and λ the feature penalty factor used in the augmented cost function.
2: k ← 0
3: s* ← s_k
4: for i = 1 to M do
5:   p_i ← 0
6: end for
7: while (stopping criterion is not met) do
8:   h ← g + λ Σ_{i=1}^{M} p_i · I_i(s_k)
9:   s_{k+1} ← LS(s_k, h)
10:  for i = 1 to M do
11:    util_i ← I_i(s_{k+1}) · c_i / (1 + p_i)
12:  end for
13:  for all i such that util_i is maximum do
14:    p_i ← p_i + 1
15:  end for
16:  if (g(s_{k+1}) < g(s*)) then
17:    s* ← s_{k+1}
18:  end if
19:  k ← k + 1
20: end while
21: s* ← LS(s*, g)  {best solution found with respect to cost function g}
22: return s*
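A compact Python rendering of Algorithm 2, specialized to the arc-based features described above, could look as follows; the helper signatures (features, local_search), the iteration limit, and the default λ are illustrative assumptions of the sketch, not part of the original implementation.

def gls(initial, local_search, g, features, cost, lam=0.2, max_iter=50):
    # Guided Local Search with the arcs of a solution as penalized features:
    # features(s) returns the set of arcs used by solution s, cost[(i, j)] is
    # the arc length c_ij, g is the original objective, and local_search(s, h)
    # improves s with respect to an objective function h.
    penalties = {f: 0 for f in cost}
    best = current = initial

    def h(s):  # augmented cost function, Eq. (1)
        return g(s) + lam * sum(penalties[f] for f in features(s))

    for _ in range(max_iter):
        current = local_search(current, h)
        util = {f: cost[f] / (1 + penalties[f]) for f in features(current)}
        if not util:
            break
        top = max(util.values())
        for f, u in util.items():
            if u == top:
                penalties[f] += 1
        if g(current) < g(best):
            best = current
    return local_search(best, g)  # final descent with the original cost g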
2.3 Dispatching Cumulated Requests to the Available Vehicles The cumulated requests are dispatched to the available vehicles whenever the end of the time slice has been reached (::endOfTimeSlice message received by the RM from the GM). Assuming that s is a solution of P_DVRP(t), let R_s(t) be the global routing plan formed by s at time slice t: R_s(t) = [R_s^{v(1)}(t), R_s^{v(2)}(t), ..., R_s^{v(K)}(t)]. Each R_s^{v(k)}(t) is a route associated with vehicle v(k) (k = 1, ..., K) in solution s at time slice t:
R_s^{v(k)}(t) = [r_1^{v(k)}(t), r_2^{v(k)}(t), ..., r_p^{v(k)}(t), ..., r_{length(R^{v(k)})}^{v(k)}(t)],
where r_j^{v(k)}(t) are the customers already assigned to the route of vehicle v(k) (j = 1, ..., length(R^{v(k)}), k = 1, ..., K). Based on the global routing plan R_s(t) available at the end of the current time slice t, the RM agent gradually incorporates the requests to the available Vh agents. It means that each Vh agent v(k), k = 1, ..., K, receives the requests defined by R_s^{v(k)}(t) available on positions p + 1, ..., length(R^{v(k)}) of the route of v(k) in solution s of the static subproblem P_DVRP(t). Next, all Vh agents start servicing the requests assigned to them. When all requests from the current time slice have been dispatched, the next time slice starts, and the process of collecting newly arrived requests and allocating them to the Vh agents is repeated.
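A minimal sketch of this dispatching step is given below; the assign method stands in for the message sent to a Vh agent and, like the other names, is an assumption of the sketch.

def dispatch(routing_plan, dispatched_up_to, vehicles):
    # Send to each Vh agent only the not-yet-dispatched tail of its route in R_s(t);
    # dispatched_up_to[k] is the position p of the last customer already sent to v(k).
    for k, route in enumerate(routing_plan):
        new_requests = route[dispatched_up_to[k]:]
        if new_requests:
            vehicles[k].assign(new_requests)
            dispatched_up_to[k] = len(route)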
3 Computational Experiment The performance of the proposed multi-agent approach to the DVRP has been evaluated in a computational experiment. It has been carried out on a PC with an Intel Core i5-2540M CPU 2.60 GHz and 8 GB RAM running under MS Windows 7, on the Christofides et al. [8] benchmark VRP dataset. The benchmark dataset includes the original 14 instances containing 50–199 customers, transformed into their dynamic versions and divided into two subsets:
– Set C—instances with only capacity constraints (vrpnc01–vrpnc05, vrpnc11, vrpnc12),
– Set CD—instances with capacity and maximum route length constraints (vrpnc06–vrpnc10, vrpnc13, vrpnc14).
As a quality measure of the obtained results, it has been decided to choose the percentage increase of the cost of allocating all dynamic requests as compared to the best known solution of the static instance. Following the general assumption that the requests may arrive during the whole working day, it has been additionally assumed that requests may arrive with various frequencies. For the purpose of the experiment, arrivals of the requests have been generated using the Poisson distribution, where the λ parameter (mean number of requests occurring in the unit of time, 1 h in the experiment) was set to 5, 10, and 20. It has been also assumed that the vehicle speed was 60 km/h. Each instance was repeatedly solved five times and mean results from these runs were recorded. The experiment results are presented in Tables 1 and 2, separately for datasets C and CD, respectively. Each table includes the name of the instance, the best known solution for the static instance, and the results of the experiment averaged over all runs (total cost and the percentage increase of cost of allocating all requests of
dynamic instance as compared to the best known solution of the static instance) for dynamic instances (three cases with λ = 5, 10, 20). It has been also decided to test the proposed approach on static instances, where all requests have been known in advance. The results obtained for these cases have been included in the last column of each table.

Table 1 Results obtained by the proposed MAS-DVRP-GLS (set C)

Instance   Customers   Best known (static)   MAS λ = 5          MAS λ = 10         MAS λ = 20         MAS (static)
vrpnc01    50          524,61                630,63 (20,21%)    562,15 (7,16%)     539,60 (2,86%)     537,22 (2,40%)
vrpnc02    75          835,26                1031,05 (23,44%)   974,50 (16,67%)    852,76 (2,09%)     846,92 (1,40%)
vrpnc03    100         826,14                961,14 (16,34%)    920,78 (11,46%)    909,91 (10,14%)    849,70 (2,85%)
vrpnc04    150         1028,42               1278,59 (24,33%)   1200,06 (16,69%)   1099,33 (6,90%)    1062,92 (3,35%)
vrpnc05    199         1291,29               1476,73 (14,36%)   1490,09 (15,40%)   1344,56 (4,13%)    1338,38 (3,65%)
vrpnc11    120         1042,11               1176,30 (12,88%)   1217,53 (16,83%)   1151,64 (10,51%)   1062,66 (1,97%)
vrpnc12    100         819,56                1012,21 (23,51%)   943,29 (15,10%)    891,43 (8,77%)     850,88 (3,82%)

Table 2 Results obtained by the proposed MAS-DVRP-GLS (set CD)

Instance   Customers   Best known (static)   MAS λ = 5          MAS λ = 10         MAS λ = 20         MAS (static)
vrpnc06    50          555,43                629,58 (13,35%)    617,50 (11,17%)    598,66 (7,78%)     569,32 (2,50%)
vrpnc07    75          909,68                1054,24 (15,89%)   998,37 (9,75%)     974,06 (7,08%)     911,58 (0,21%)
vrpnc08    100         865,94                1034,32 (19,44%)   971,84 (12,23%)    915,25 (5,69%)     891,92 (3,00%)
vrpnc09    150         1162,55               1333,79 (14,73%)   1285,25 (10,55%)   1193,19 (2,64%)    1188,03 (2,19%)
vrpnc10    199         1395,85               1665,43 (19,31%)   1627,59 (16,60%)   1473,93 (5,59%)    1407,67 (0,85%)
vrpnc13    120         1541,14               1977,31 (28,30%)   1690,90 (9,72%)    1665,50 (8,07%)    1580,63 (2,56%)
vrpnc14    100         866,37                1058,24 (22,15%)   953,35 (10,04%)    913,90 (5,49%)     888,02 (2,50%)
Results presented in both tables suggest a few observations and dependencies. The first observation is that the results obtained for the dynamic instances of the VRP are almost always worse than the results obtained for their static counterparts. Moreover, the observation holds for all levels of the λ parameter and all instances, but with different strength. In the case of relatively rare arrivals of new requests in the unit of time (λ = 5), the obtained results are generally worse than in the case of a moderate (λ = 10) or high frequency of request arrivals (λ = 20). Clearly, the arrival of many requests in the unit of time (λ = 20) increases the probability of obtaining better results when compared to the instances with a lower frequency of new request arrivals (λ = 5 or even 10), where the possibility of re-optimization of the routes may be limited. The second observation relates to the comparison of the results obtained for instances belonging to the datasets C and CD. Generally, no significant differences have been observed between the results obtained for both datasets. As it was mentioned, the experiment also aimed at testing the proposed approach on static instances where all requests have been known in advance. When comparing the results obtained by the proposed approach with the best known results presented in both tables, one can conclude that our approach can be seen as competitive with other approaches to solving the VRP. The mean relative error from the best known solution does not exceed 3–4% for all tested instances. As previously, no significant differences have been observed in the results obtained for the datasets C and CD. And finally, taking into account the fact that a similar MAS framework for solving the DVRP, but with a Variable Neighborhood Search (VNS) improvement procedure implemented within it, has been proposed in the author’s previous paper [7], it has also been decided to compare the results obtained by the proposed MAS approaches to the DVRP with VNS and with GLS. The comparison of the results obtained by both approaches has not shown a clear outperformance of one of them over the other; however, slightly better results have been observed for the approach with the GLS improvement procedure.
4 Conclusions This paper proposes a multi-agent approach to the dynamic vehicle routing problem. It assumes that new customer requests arrive continuously over time, while the system is running. It means that after a new request arrives, it has to be incorporated into the existing vehicles’ routes, which often requires a dedicated procedure for the periodic re-optimization of the subproblem including the requests which have already arrived. A dedicated guided local search procedure has been proposed and implemented within the proposed system in this role. The computational experiment confirmed the effectiveness of the proposed approach. One of the most promising directions of future research seems to be an extension of the proposed multi-agent approach by adding other efficient methods of periodic re-optimization and adapting them to solve the DVRP and other variants of it.
References 1. Alsheddy, A., Voudouris, C., Tsang, E.P.K., Alhindi, A.: Guided local search. In: Marti, R., Pardalos, P.M., Resende, M.G.C. (eds.) Handbook of Heuristics, pp. 261–298. Springer, Cham (2018) 2. Backer, B.D., Furnon, V., Prosser, P., Kilby, P., Shaw, P.: Solving vehicle routing problems using constraint programming and metaheuristics. J. Heuristics 6(4), 501–523 (2000) 3. Barbucha, D., J¸edrzejowicz, P.: Agent-Based approach to the dynamic vehicle routing problem. In: Demazeau, Y., Pavon, J., Corchado, J.M., Bajo, J. (eds.) 7th International Conference on Practical Applications of Agents and Multi-agent Systems (PAAMS 2009). Advances in Intelligent and Soft Computing, vol. 55, pp. 169–178. Springer, Berlin, Heidelberg (2009) 4. Barbucha, D.: Agent-based guided local search. Expert Syst. Appl. 39(15), 12032–12045 (2012) 5. Barbucha, D.: A Multi-agent approach to the dynamic vehicle routing problem with time windows. In: Badica, C., Nguyen, N.T., Brezovan, M. (eds.) Computational Collective Intelligence. Technologies and Applications—5th International Conference, ICCCI 2013. LNCS, vol. 8083, pp. 467–476. Springer, Berlin, Heidelberg (2013) 6. Barbucha, D.: Solving DVRPTW by a Multi-agent system with vertical and horizontal cooperation. In: Nguyen, N.T., Pimenidis, E., Khan, Z., Trawinski, B. (eds) Computational Collective Intelligence. ICCCI 2018. LNCS, vol. 11056, pp. 181–190. Springer, Cham (2018) 7. Barbucha, D.: VNS-based Multi-agent approach to the dynamic vehicle routing problem. In: Nguyen, N.T., Chbeir, R., Exposito, E., Aniorte, P., Trawinski, B.: Computational Collective Intelligence—11th International Conference, ICCCI 2019, Proceedings, Part I. LNCS, vol. 11683, pp. 556-565. Springer, Cham (2019) 8. Christofides, N., Mingozzi, A., Toth, P., Sandi, C. (eds.): Combinatorial Optimization. Wiley, Chichester (1979) 9. Egeblad, J., Nielsen, B., Odgaard, A.: Fast neighbourhood search for two- and threedimensional nesting problems. Eur. J. Oper. Res. 183(3), 1249–1266 (2007) 10. Hani, Y., Amodeo, L., Yalaoui, F., Chen, H.: Ant colony optimization for solving an industrial layout problem. Eur. J. Oper. Res. 183(2), 633–642 (2007) 11. Khouadjia, M.R.: Solving Dynamic Vehicle Routing Problems: From Single-Solution Based Metaheuristics to Parallel Population Based Metaheuristics. Ph.D. Thesis, Lille University, France (2011) 12. Kilby, P., Prosser, P., Shaw, P.: Dynamic VRPs: a study of scenarios. Technical Report APES06-1998, University of Strathclyde, Glasgow, Scotland (1998) 13. Kilby, P., Prosser, P., Shaw, P.: Guided local search for the vehicle routing problem. In: Voss, S., Martello, S., Osman, I.H., Roucairol, C. (eds.) Meta-Heuristics: Advances and Trends in Local Search Paradigms for Optimization, pp. 473–486. Kluwer Academic Publishers (1999) 14. Mills, P., Tsang, E.P.K.: Guided local search for solving SAT and weighted MAXSAT problems. J. Autom. Reason. 24, 205–223 (2000) 15. Pillac, V., Gendreau, M., Gueret, C., Medaglia, A.L.: A review of dynamic vehicle routing problems. Eur. J. Oper. Res. 225, 1–11 (2013) 16. Psaraftis, H.N., Wen, M., Kontovas, C.A.: Dynamic vehicle routing problems: three decades and counting. Networks 67(1), 3–31 (2016) 17. Tarantilis, C.D., Zachariadis, E.E., Kiranoudis, C.T.: A hybrid guided local search for the vehicle-routing problem with intermediate replenishment facilities. INFORMS J Comput. 20(1), 154–168 (2008) 18. Voudouris, C., Tsang, E.: Partial constraint satisfaction problems and guided local search. 
In: Proceedings of 2nd International Conference on Practical Application of Constraint Technology (PACT’96), London, pp. 337–356 (1996) 19. Voudouris, C., Tsang, E.: Guided local search and its application to the traveling salesman problem. Eur. J. Oper. Res. 113, 469–499 (1999) 20. Voudouris, C., Tsang, E.P.K., Alsheddy, A.: Guided local search. In: Gendreau, M., Potvin, J.-Y. (eds.) Handbook of Metaheuristics, pp. 321–361. Springer, Berlin, Heidelberg (2010)
Intelligent Data Processing and Its Applications
Detecting Relevant Regions for Watermark Embedding in Video Sequences Based on Deep Learning Margarita N. Favorskaya
and Vladimir V. Buryachenko
Abstract In this paper, we propose the original idea for searching the best regions for watermark embedding in the uncompressed and compressed video sequences using a deep neural network. If video sequence is uncompressed, then a huge amount of information can be successfully embedded in the textural regions in each frame or 3D textural volume. The codecs, from MPEG-2 to H.265/HEVC, impose the strict restrictions on a watermarking process due to the standards to transmit any motion in a scene. The basic coding unit is a Group Of Pictures (GOP) including I-frame, P-frame/frames, and B-frame/frames. Among these types of frames, I-frame as a spatial intra-picture prediction from neighboring regions is available for watermarking process. Thus, our goal is to find such frames, which will be I-frames with a high probability, and then detect the textural regions for embedding. The task is complicated by a necessity to detect the scene changes in videos. We use non-end-toend Siamese LiteFlowNet to detect the frames with low optical flow (non-significant background motion), high optical flow (object motion in a scene), or surveillance failure (scene change).
1 Introduction Watermark embedding in video sequences is a weakly studied issue compared with the watermarking of still images, grayscale and colored. At the same time, the additional dimension not only extends the volume of embedded information but also fundamentally changes the types of Internet attacks and complicates watermark extraction after lossy compression during Internet transmission [1].
We are interested in the blind, invisible, and frequency-oriented watermarking process, which possesses undeniable advantages with respect to other watermarking techniques but, at the same time, has a high computational cost that grows non-linearly with the volume of embedded information. The watermarking paradigms are widely highlighted in the literature. At present, the algorithms for frequency-oriented watermark embedding are well developed. They use different frequency transforms like the discrete Fourier transform, discrete cosine transform, wavelets, contourlets, and shearlets, or the moments’ representation of an image patch followed by singular value decomposition according to, for example, the Koch algorithm. Also, some hybrid techniques were presented [2]. The methodology is the same, but frequency transforms provide different numbers of coefficients for embedding, which determines the visibility of the watermarked frames. Also, we can mention the attempts to use algorithms inspired by nature [3] and deep learning models for watermark embedding in images in order to optimize the solution. A CNN auto-encoder for watermark embedding [4] or WMNet with residual units accounting for possible attacks [5] provides better results than frequency-domain methods, but only for the single watermark on which the deep network was trained. It is evident that uniform embedding in a frame is not an adequate strategy due to the different contents of frames. Therefore, special attention is paid to choosing the best regions for embedding under the criterion of watermark invisibility [6]. Recently, several algorithms were developed for the so-called adaptive watermarking process. The cost of invisibility is a more complex control of the watermarking process and an extension of the secret key. The main idea of this paper is to draw on the achievements of deep learning in order to solve these problems effectively. The remainder of this paper is organized as follows. Section 2 includes a short review of the related work in the adaptive watermarking process. Section 3 describes the proposed method for detecting the relevant structures for watermark embedding in uncompressed and compressed videos. Section 4 provides the experimental results. Finally, Sect. 5 concludes the paper.
2 Related Work It is well known that recognition methods are roughly separated into two main categories, such as the generative and discriminative models. Applying to the discussed task, the first category is directed on the extraction of global features throughout video sequences (for example, Gauss–Markov recognition model, expectation–maximization algorithm, and linear dynamical systems), while the second category studies the local, spatiotemporal features to describe a moving texture in complex scenes. The generative models are more precise; however, it is very difficult to use them for realistic videos. Thus, our attention is paid to the methods from second category. In the case of uncompressed videos, we can search dynamic textures. Dynamic texture is often studied to reach one from two goals, analysis and synthesis, or the both goals together. A watermarking process implies the analysis of 3D structures
suitable for embedding. At first, dynamic texture was explored as a temporal regularity represented by a multidimensional signal. Its analysis relied on spatiotemporal autoregressive identification techniques, 3D wavelet decomposition, Laplacian of Gaussian filters, or tensor decomposition. A subsequent approach to detecting dynamic textures in video sequences was based on optical flow analysis and prediction. With the appearance of Local Binary Patterns (LBP), the consecutive frames were represented as easily computed 3D textural structures. A hybrid approach based on LBP, the Weber local descriptor, and optical flow was suggested in [7]. An intermediate solution was proposed in [8], where a handcrafted LBP-flow descriptor with Fisher encoding was initially used to capture low-level texture dynamics, and a neural network was trained to extract a higher level representation scheme for dynamic texture recognition. The development of deep neural networks led to another view on dynamic texture analysis. Nowadays, object segmentation and recognition, as well as optical flow prediction, are implemented by deep convolutional and recurrent neural networks more effectively than by all previous techniques based on handcrafted feature extraction. In [9], the static PCANet was extended to the spatiotemporal domain (PCANet-TOP) similarly to the LBP-TOP approach. PCANet-TOP could not be declared a deep network because it analyzed the spatial (XY), spatiotemporal (XT), and spatiotemporal (YT) images as a set of patches with sizes L_1 = L_2 = 8 pixels, extracting eigenvectors and using two layers only. PCANet-TOP operated on grayscale frames of the UCLA and DynTex datasets and slightly outperformed the LBP-TOP classifier in precision. In [10], a two-stream model for dynamic texture synthesis was based on a ConvNet, which encapsulated statistics of filter responses from texture recognition (spatial component) and from optical flow (temporal component). The architecture included five scales of input representation in each stream. The input samples were selected from the DynTex database and collected in the wild. The main goal of this research was to generate entirely novel dynamic textures as an image style transfer task. In the case of compressed videos [1], we pay attention to the motion domain in order to extract frames that can be I-frames with high probability, considering not only the shots’ analysis but also the scene changes. Motion estimation has a great variety of applications in computer vision. As a result, many methods, simple and complicated, have been developed since the 1980s. Optical flow, in the sense of obtaining a 2D displacement field that describes the motion between two successive frames, is one of the most applicable techniques. At present, five branches exist, which are mentioned below [11]:
– Differential technique, referred to as the gradient-based approach, which computes the velocity from spatial and temporal derivatives of the image brightness and is divided into global and local differential methods.
– Region-based technique, based on searching for matching patches in two consecutive frames, which determines the shift of the patches; a rough but noise-robust technique.
– Feature-based technique, which searches for sparse but discriminative features in successive frames over time.
– Frequency-based technique, which uses velocity-tuned filters in the Fourier domain in order to calculate the optical flow by energy-based or phase-based methods.
– CNN-based technique, which extracts deep features from frames, replacing the extraction of handcrafted features. End-to-end methods provide matching of these features, while non-end-to-end methods are employed for feature extraction to update and refine the optical flow estimates.
The best-known deep architectures for optical flow estimation are FlowNet, FlowNet 2.0, Spatial Pyramid Network (SpyNet), LikeNet, PWC-Net, Siamese CNN, EpicFlow, and RecSPy. CNNs for optical flow estimation have pros and cons. The main advantage is flexibility in multiscale feature extraction under the complex motion of multiple objects and non-linear transformations in a scene. The trained optical flow CNNs are faster than variational methods. The drawbacks are the need for ground-truth labelings of real scenes, overfitting, and hyperparameter tuning. In video analysis, motion estimation is of paramount importance. First, we apply LiteFlowNet [12] and build a set of residual fields for all frames. Second, we find the best regions for watermark embedding using gradient evaluation and the statistical and model-based approaches as a spatial frame analysis. Here, we need an approximate evaluation of textural regions.
3 The Proposed Method

The proposed method concatenates two types of information: optical flow and highly textured regions. In the case of uncompressed videos, we may use any frame for watermark embedding, reconstructing the textural map when the scene changes. In the case of compressed videos, we ought to select the frames that are likely to be I-frames. Hereinafter, Sect. 3.1 provides the optical flow estimates, while Sect. 3.2 discusses the textural estimates. The concatenation procedure is presented in Sect. 3.3.
3.1 Optical Flow Estimates Based on Deep Learning

The most representative motion features in videos are provided by optical flow, which for a long time was computed using feature motion trackers. FlowNet [13] was the first CNN that predicted a dense optical flow. At present, a family of such deep networks has been developed, resulting in FlowNet 2.0 [14] and LiteFlowNet [12]. LiteFlowNet is a lightweight cascaded CNN with encoder and decoder parts. The encoder extracts the multiscale pyramidal features from consecutive frames, and the
decoder estimates coarse-to-fine flow fields using the cascaded flow inference and regularization module. The final optical flow image is visualized through the flow field. Our goal is to find I-frames, in other words, frames with very low optical flow. If we detect several consecutive frames with low optical flow, then we use the first frame from this set as a potential frame for embedding in the case of compressed video. However, it would be desirable to avoid two deep networks estimating optical flow for the two cases—compressed and uncompressed videos. Thus, we constructed a Siamese network based on the LiteFlowNet architecture, which is applicable for finding both I-frames and scene changes. A watermarking process for compressed video is a special case of the common watermarking process for uncompressed video. The output of our Siamese LiteFlowNet is a residual field between two flow fields. Analyzing the pixel displacements in a residual field, we make a decision about low optical flow (no motion), significant optical flow (motion in a scene), or surveillance failure (scene change). The proposed network structure of Siamese LiteFlowNet is depicted in Fig. 1. As a result, we can build a graph of the motion type in each frame (see Fig. 2). This graph allows us to detect a scene change and potential I-frames in the shots. If the shifts in the optical flow field are large in magnitude and chaotic, this indicates a scene change. Usually a scene change is expressed by explicit peaks, but in the case of a slow scene change these moments are difficult to detect. If normalized
Fig. 1 Architecture of the proposed Siamese LiteFlowNet
Fig. 2 Motion analysis of Drone_Videos.mp4 [15]. The outliers on the graph correspond to the frame numbers with scene changes (frames 346, 477, 601, and 789)
motion shifts, which are calculated based on the residual fields, have relatively stable values, then these frames are analyzed in detail as the candidates for I-frames.
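A minimal decision rule over a residual field might look as follows; the thresholds are illustrative tuning parameters, not values reported in the paper.

```python
# Hedged sketch of the motion-type decision from a residual field.
import numpy as np

def motion_type(residual, t_low=0.5, t_high=5.0):
    """residual: HxWx2 array of per-pixel displacement differences."""
    mag = np.linalg.norm(residual, axis=2)           # magnitude of the shift per pixel
    if mag.mean() < t_low:
        return "low motion (I-frame candidate)"
    if mag.mean() > t_high and mag.std() > t_high:   # large and chaotic shifts
        return "scene change"
    return "significant motion"
```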
3.2 Textural Estimates

The following task is to find the relevant textural regions for embedding in I-frames for videos under compression or in all frames for uncompressed videos. Texture analysis has a long history dating back to the 1970s, beginning with Tamura and Haralick descriptors. Later, more complex approaches were proposed, including the gray-level co-occurrence matrix, fractal dimension, structure tensor, LBP, Laws' texture energy, Hermite transform, Markov random fields, and conventional and deep networks. However, for the purpose of embedding we are interested in only one or two texture properties, namely homogeneity and color. As shown in [16], a combination of gradient evaluation with statistical and model-based approaches provides a successful selection of relevant regions for watermark embedding. For this purpose, we decided to use LBP modifications. LBP provides a unique encoding of the central pixel at position c with respect to its local neighborhood (number of neighbors P) using a predefined radius R. LBP is computed by Eq. 1, where g(·) is the grayscale value of a pixel, g(·) ∈ [0…255]:

LBP_{P,R} = \sum_{p=0}^{P-1} s(g_p - g_c)\, 2^p, \quad s(x) = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0 \end{cases}   (1)
Our goal is to find high gradient regions; in other words, we analyze the uniformity of LBP. The uniformity measure U returns the number of bitwise 0/1 and 1/0 transitions. An LBP is called uniform if U ≤ 2. Thus, in an (8, R) neighborhood one can find 58 possible
uniform patterns. In order to take the neighbors' gray values into account, the rotation-invariant variance measure VAR was introduced in [17]:

VAR_{P,R} = \frac{1}{P} \sum_{p=0}^{P-1} (g_p - \mu)^2, \quad \text{where } \mu = \frac{1}{P} \sum_{p=0}^{P-1} g_p.   (2)
The measure G(LBP_{P,R}) can be used for gradient magnitude estimation [18]:

G(LBP_{P,R}) = \begin{cases} \sum_{r=R_1}^{R_n} VAR_{P,r}, & \text{if } LBP_{P,R} \text{ is uniform} \\ 0, & \text{else} \end{cases}   (3)
The weak edge patterns are well preserved using the NI-LBP_{P,R} measure, an LBP-like descriptor proposed by Liu et al. [19]:

NI\text{-}LBP_{P,R} = \sum_{p=0}^{P-1} s(g_p - \mu_R)\, 2^p, \quad s(x) = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0 \end{cases}   (4)

where \mu_R is the neighborhood mean from Eq. 2 computed at radius R.
Experiments show that the two measures G(LBP_{P,R}) and NI-LBP_{P,R} successfully find highly textured regions in a frame as potential regions for watermark embedding. In complex cases, we can additionally employ fractal descriptors for more accurate estimates.
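A single-scale sketch of these texture measures (P = 8, R = 1) is given below; boundary handling uses wrap-around for brevity and the multi-radius sum in Eq. (3) is reduced to one radius, so it only illustrates Eqs. (1)–(4) rather than reproducing the authors' implementation.

```python
# Simplified single-scale texture measures (P = 8, R = 1) following Eqs. (1)-(4).
import numpy as np

OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def texture_maps(gray):
    g = gray.astype(np.float64)
    neigh = np.stack([np.roll(np.roll(g, dy, 0), dx, 1) for dy, dx in OFFSETS])  # 8xHxW
    bits = (neigh - g >= 0).astype(np.uint8)
    lbp = sum(bits[p].astype(np.int32) << p for p in range(8))         # Eq. (1)
    transitions = sum(bits[p] != bits[(p + 1) % 8] for p in range(8))  # uniformity U
    uniform = transitions <= 2
    mu = neigh.mean(axis=0)
    var = ((neigh - mu) ** 2).mean(axis=0)                             # Eq. (2)
    grad = np.where(uniform, var, 0.0)                                 # Eq. (3), one radius
    ni_lbp = sum(((neigh[p] - mu) >= 0).astype(np.int32) << p for p in range(8))  # Eq. (4)
    return lbp, grad, ni_lbp
```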
3.3 Concatenation

Concatenation of the optical flow map and the textural map provides the final set of relevant regions for watermark embedding. Additionally, when possible, we can choose regions with a blue tone because they are less perceptible to human vision. It is interesting that the optical flow maps are well correlated with the textural maps, hinting at the location of textural regions in a frame, as depicted in Fig. 3. The frequency of I-frame selection is also an unpredictable parameter; it is partially defined by the motion intensity in a scene. Therefore, finding 3D textural structures for
Fig. 3 Concatenation process: a candidate of I-frame, b optical flow, c optical mask, d concatenation of optical mask and textural mask
embedding, we ought to estimate the gaps from which I-frames will be chosen by the codec. Nevertheless, we embed some service information in each frame, for example, the frame number and a fragile watermark, bearing in mind typical attacks on videos. We prepare 3D structures for watermark embedding according to the following criteria (a sketch combining them is given after this list):
– Middle or high degree of textural properties.
– Low degree of motion properties.
– Limitation of the structures' sizes for the embedding algorithm, for example, to a rectangular area with a finite number of 8 × 8 pixel patches in a frame.
– Selection of blue-colored structures when possible.
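The sketch below combines these criteria into a per-patch mask; the thresholds and the blue-tone test are illustrative assumptions rather than values from the paper.

```python
# Hedged sketch: combine texture and motion maps into an 8x8-patch embedding mask.
import numpy as np

def embedding_mask(texture_map, motion_map, frame_bgr, tex_thr=0.5, mot_thr=0.2):
    """texture_map, motion_map: HxW arrays in [0, 1]; frame_bgr: HxWx3 image."""
    mask = (texture_map >= tex_thr) & (motion_map <= mot_thr)
    b, g, r = frame_bgr[..., 0], frame_bgr[..., 1], frame_bgr[..., 2]
    blue = (b > g) & (b > r)
    if (mask & blue).any():              # "when possible", prefer blue-toned regions
        mask = mask & blue
    h, w = mask.shape                    # keep only whole 8 x 8 patches
    patch = mask[:h - h % 8, :w - w % 8].reshape(h // 8, 8, w // 8, 8)
    return patch.mean(axis=(1, 3)) > 0.9
```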
4 Experimental Results

The experiments were done using the Middlebury Dataset [20] and the MPI Sintel Dataset [21] containing videos with different scenes, as well as the corresponding optical flow maps. For Siamese LiteFlowNet learning, the available dataset containing several thousands of frames and ground-truth optical flow maps was divided into the training, validation, and test sets in the ratio of 70%, 15%, and 15%, respectively. The proposed Siamese LiteFlowNet accepts the input frames and outputs a map of dense optical flow at a 1:4 scale with respect to the original frame resolution. The Middlebury Dataset [20] includes more than 15 good-quality video sequences containing moving foreground objects. The good resolution makes it easy to evaluate the quality of optical flow detection during motion analysis. Also, high-quality video sequences obtained from quadrocopters—Drone_Videos.mp4 [15], containing natural objects and foreground movement—were used to analyze the algorithm's efficiency. Several video sequences were combined into one to check the scene change detection. A visual example of such a change at the level of optical flow fields and residual fields is depicted in Fig. 4. As we see, at a scene change the textural parameters and the foreground objects are no longer detected (Fig. 4b). The residual fields (Fig. 4c) show low residuals or low texture complexity in stable scenes. However, when a scene change occurs, these values become sharp and increase unpredictably. The following actions are to obtain the concatenation of the optical flow maps with the corresponding textural maps and to classify the type of motion as low motion, high motion, or scene change. Some frame estimates are shown in Table 1. The normalized optical complexity is estimated from the residual fields built by Siamese LiteFlowNet. The normalized textural complexity is calculated using the LBP estimates. The class of motion is a weighted coefficient defining the motion complexity. The frame quality for watermark embedding is an empirical estimate proportional to the optical and textural complexity and inversely proportional to the class of motion.
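The paper does not state the formula for this estimate explicitly, but the simple ratio below reproduces the values in Table 1 (e.g., (0.18 + 0.21)/1 = 0.39 and (0.43 + 0.71)/3 = 0.38), so it is offered here only as a plausible reading.

```python
# Assumed form of the empirical frame-quality estimate: proportional to the
# optical and textural complexity and inversely proportional to the motion class.
def frame_quality(optical, textural, motion_class):
    return (optical + textural) / motion_class
```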
Fig. 4 Detection of explicit scene change based on optical flow neural evaluation: a frames 471, 473, 475, 477, 479, and 481 from left to right, b optical flow fields, c residual fields

Table 1 Examples of frame estimates

Video sequence               | Optical complexity (0–1) | Textural complexity (0–1) | Class of motion* | Frame quality for watermark embedding
DJI_0501.mp4, frame 10       | 0.18 | 0.21 | 1 | 0.39
DJI_0501.mp4, frame 15       | 0.19 | 0.23 | 1 | 0.42
DJI_0574.mp4, frame 10       | 0.66 | 0.59 | 2 | 0.63
DJI_0574.mp4, frame 15       | 0.69 | 0.62 | 2 | 0.65
DJI_0596.mp4, frame 10       | 0.13 | 0.11 | 1 | 0.24
DJI_0596.mp4, frame 15       | 0.15 | 0.12 | 1 | 0.27
MSI_SIntel_Ambush, frame 10  | 0.23 | 0.31 | 1 | 0.54
MSI_SIntel_Ambush, frame 12  | 0.23 | 0.31 | 1 | 0.54
Middlebury-Walking, frame 10 | 0.15 | 0.67 | 2 | 0.41
Middlebury-Walking, frame 15 | 0.16 | 0.65 | 2 | 0.40
Drone_Videos.mp4, frame 345  | 0.43 | 0.71 | 3 | 0.38
Drone_Videos.mp4, frame 347  | 0.21 | 0.17 | 3 | 0.13
*1—Low motion, 2—high motion, 3—scene change
A scene change is usually declared when significant changes between adjacent frames are detected (Drone_Videos.mp4, frames 345–347, etc.). In other cases, the regions suitable for watermark embedding are evaluated from the optical and textural maps.
5 Conclusions

We follow the assumption that any video sequence is automatically compressed before its transmission over the Internet. The type of codec cannot be predicted either. Thus, we focus on the most stable frames, i.e., frames without significant motion, as the main candidates for I-frames (without compression). However, we keep a gap including a small set of consecutive frames—candidates for I-frames—and duplicate the hidden information in each frame. The proposed Siamese LiteFlowNet helps to obtain the residual fields for the sequential frames. The following concatenation of the optical flow maps with the corresponding textural maps allows us to classify the type of motion as low motion, high motion, or scene change, which indicates the possibility of detecting relevant regions for watermark embedding in compressed video sequences.

Acknowledgments The reported study was funded by the Russian Foundation for Basic Research according to the research project No. 19-07-00047.
References 1. Favorskaya, M.N., Buryachenko, V.V.: Authentication and copyright protection of videos under transmitting specifications. In: Favorskaya, M.N., Jain, L.C. (eds.) Computer Vision in Advanced Control Systems-5, ISRL, vol. 175, pp. 119–160. Springer, Cham (2020) 2. Zhu, H., Liu, M., Li, Y.: The Rotation Scale Translation (RST) invariant digital image watermarking using Radon transform and complex moments. Digit. Signal Proc. 20(6), 1612–1628 (2010) 3. Abdelhakim, A.M., Saleh, H.I., Nassar, A.M.: A quality guaranteed robust image watermarking optimization with artificial bee colony. Expert Syst. Appl. 72, 317–326 (2017) 4. Kandi, H., Mishra, D., Gorthi, S.R.S.: Exploring the learning capabilities of convolutional neural networks for robust image watermarking. Comput. Secur. 65, 247–268 (2017) 5. Mun, S.-M., Nam, S.-H., Jang, H., Kim, D., Lee, H.-K.: Finding robust domain from attacks: a learning framework for blind watermarking. Neurocomputing 337, 191–202 (2019) 6. Favorskaya, M.N., Jain, L.C., Savchina, E.I.: Perceptually tuned watermarking using nonsubsampled shearlet transform. In: Favorskaya M.N., Jain L.C. (eds.) Computer Vision in Control Systems-4, ISRL, vol. 136, pp. 41–69. Springer, Cham (2018) 7. Chen, J., Zhao, G., Salo, M., Rahtu, E., Pietikäinen, M.: Automatic dynamic texture segmentation using local descriptors and optical flow. IEEE Trans. Image Process. 22(1), 326–339 (2013) 8. Kaltsa, V., Avgerinakis, K., Briassouli, A., Kompatsiaris, I., Strintzis, M.G.: Dynamic texture recognition and localization in machine vision for outdoor environments. Comput. Ind. 98, 1–13 (2018)
9. Arashloo, S.R., Amirani, M.C., Noroozi, A.: Dynamic texture representation using a deep multi-scale convolutional network. J. Vis. Commun. Image R. 43, 89–97 (2017) 10. Tesfaldet, M., Brubaker, M.A., Derpanis, K.G.: Two-stream convolutional networks for dynamic texture synthesis. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6703–6712. Salt Lake City, UT, USA (2018) 11. Tu, Z., Xie, W., Zhang, D., Poppe, R., Veltkamp, R.C., Li, B., Yuan, J.: A survey of variational and CNN-based optical flow techniques. Sig. Process. Image Commun. 72, 9–24 (2019) 12. Hui, T.-W., Tang, X., Loy, C.-C.: LiteFlowNet: a lightweight convolutional neural network for optical flow estimation. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 8981–8989. Salt Lake City, Utah, USA (2018) 13. Dosovitskiy, A., Fischer, P., Ilg, E., Häusser, P., Hazırbas, C., Golkov, V., van der Smagt, P., Cremers, D., Brox, T.: FlowNet: learning optical flow with convolutional networks. In: IEEE International Conference on Computer Vision, pp. 2758–2766. Santiago, Chile (2015) 14. Ilg, E., Mayer, N., Saikia, T., Keuper, M., Dosovitskiy, A., Brox, T.: FlowNet 2.0: evolution of optical flow estimation with deep networks. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 2462–2470. Honolulu, HI, USA (2017) 15. Drone Videos DJI Mavic Pro Footage in Switzerland. https://www.kaggle.com/kmader/dronevideos. Last accessed 3 Jan 2020 16. Favorskaya, M., Pyataeva, A., Popov, A.: Texture analysis in watermarking paradigms. Proc. Comput. Sci. 112, 1460–1469 (2017) 17. Ojala, T., Pietikäinen, M., Mäenpää, T.: Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans. Pattern Anal. Mach. Intell. 24(7), 971–987 (2002) 18. Teutsch, M., Beyerer, J.: Noise resistant gradient calculation and edge detection using local binary patterns. In: Park, J.I., Kim, J. (eds.) Computer Vision—ACCV 2012 Workshops. LNCS, vol. 7728, pp. 1–14. Springer, Berlin, Heidelberg (2013) 19. Liu, L., Zhao, L., Long, Y., Kuang, G., Fieguth, P.: Extended local binary patterns for texture classification. Image Vis. Comput. 30(2), 86–99 (2012) 20. Middlebury Dataset. http://vision.middlebury.edu/flow/data/. Last accessed 3 Jan 2020 21. MPI Sintel Dataset. http://sintel.is.tue.mpg.de/downloads. Last accessed 3 Jan 2020
Artificial Neural Network in Predicting Cancer Based on Infrared Spectroscopy Yaniv Cohen , Arkadi Zilberman, Ben Zion Dekel, and Evgenii Krouk
Abstract In this work, we present a Real-Time (RT), on-site, machine-learning-based methodology for identifying human cancers. The presented approach is a reliable, effective, cost-effective, and non-invasive method based on Fourier Transform Infrared (FTIR) spectroscopy—a vibrational technique able to detect changes in molecular vibrational bonds in human tissues and cells using Infrared (IR) radiation. The medical IR Optical System (IROS) is a tabletop device for real-time tissue diagnosis that utilizes FTIR spectroscopy and the Attenuated Total Reflectance (ATR) principle to accurately diagnose the tissue. The combined device and method were used for RT diagnosis and characterization of normal and pathological tissues ex vivo/in vitro. The solution methodology is to apply a Machine Learning (ML) classifier that can differentiate between cancer, normal tissue, and other pathologies. Excellent results were achieved by applying a feedforward backpropagation Artificial Neural Network (ANN) with supervised learning classification to 76 wet samples. The ANN shows high classification performance: overall, 98.7% (75/76 biopsies) of the predictions are correctly classified and 1.3% (1/76 biopsies) are misclassified.
Y. Cohen (B) · E. Krouk National Research University Higher School of Economics, 20 Myasnitskaya ul, Moscow 101000, Russian Federation e-mail: [email protected] E. Krouk e-mail: [email protected] A. Zilberman Ben Gurion University of the Negev, 8410501 Beer-Sheva, Israel e-mail: [email protected] B. Z. Dekel Ruppin Academic Center, 4025000 Emek Hefer, Israel e-mail: [email protected] © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2020 I. Czarnowski et al. (eds.), Intelligent Decision Technologies, Smart Innovation, Systems and Technologies 193, https://doi.org/10.1007/978-981-15-5925-9_12
1 Introduction

Tumor detection at initial stages is a major concern in cancer diagnosis [1–9]. Cancer screening involves costly and lengthy procedures for evaluating and validating cancer biomarkers. A rapid or one-step method, preferably non-invasive, sensitive, specific, and affordable, is required to reduce the long diagnostic processes. IR spectroscopy is a technique routinely used by biochemists, material scientists, etc., as a standard analysis method. The observed spectroscopic signals are caused by the absorption of IR radiation that is specific to functional groups of the molecule. These absorption frequencies are associated with the vibrational motions of the nuclei of a functional group and show distinct changes when the chemical environment of the functional group is modified [2, 3]. IR spectroscopy essentially provides a molecular fingerprint, and IR spectra contain a wealth of information on the molecule. In particular, they are used for the identification and quantification of molecular species, the interactions between neighboring molecules, their overall shape, etc. IR spectra can be used as a sensitive marker of structural changes of cells and of reorganization occurring in cells [2–9], and most biomolecules give rise to IR absorption bands between 1800 and 700 cm−1, which are known as the "fingerprint region" or primary absorption region. The medical IROS device [2] relates to methods employing Evanescent Wave FTIR (EW-FTIR) spectroscopy using optical elements and sensors operated in the ATR regime in the MIR region of the spectrum. As recently shown, Fourier transform IR (FTIR) spectroscopy coupled with computational methods can provide fingerprint spectra of benign tissues and their counterpart malignant tumors with a high rate of accuracy [3]. Our aim was to use FTIR spectroscopy combined with machine learning methods for the primary evaluation of the characteristic spectra of colon and gastric tissue from patients with healthy and cancerous tissue, thus creating a novel platform for the application of FTIR spectroscopy for real-time, on-site early diagnosis of colon cancer. Figure 1 presents the circle of data transfer of patients' data to the medical IROS [2] for machine learning, whereas Fig. 2 presents the fully coupled system of data transfer from each patient in different hospitals to the information collection center, where the decision is made by medical personnel after analyzing the machine learning results. The remainder of this paper is organized as follows. Section 2 presents a short summary of the medical IROS. Basic definitions of ANN are discussed in Sect. 3. Section 4 provides a description of network training algorithms. Section 5 includes preliminary practical results. Section 6 concludes the paper.
Fig. 1 Circle of data transfer
Fig. 2 Transfer of data from each patient and each hospital to the center of final decision
2 Short Summary of Medical IROS

The aim is to develop a dedicated combined apparatus suitable for biological tissue characterization via FTIR spectroscopic measurement during clinical practice [2]. The device relates to a combined device and method for the in vitro analysis of tissue and biological cells, which may be carried out in a simple and, preferably, automated manner. The device and method produce results rapidly (within minutes) and permit the detection of structural changes between a biological specimen and a reference sample. In accordance with the design of the medical IROS, the human tissue is applied to an unclad optical element (crystal, etc.) working in the ATR regime. A beam of mid-IR (infrared) radiation is passed through a low-loss optical element and interacts with the tissue via the ATR effect. In this process, the absorbing tissue is placed in direct
contact with the optical element. The novel combined apparatus (FTIR spectrometer with opto-mechanical elements and software) adopts an integrative design in appearance, and it is a bench top device.
3 ANN Concept—Basic Definitions

An ANN [10] is a mathematical structure that consists of interconnected artificial neurons mimicking the way a biological neural network (or brain) works. An ANN has the ability to "learn" from data, either in a supervised or an unsupervised mode, and can be used in classification tasks [11, 12]. In Multi-layer Feedforward (MLFF) networks, the neurons (nodes) are arranged in layers with connectivity between the neurons of different layers. Figure 3 is the schematic representation of a simple artificial neural network model. The artificial neurons have input values, which are the outputs of other neurons or, at the initial level, the input variables (inputs p = 1, 2, …, n). These values are multiplied by weights W, and the sum of all these products (Σ) is fed to an activation function F. The activation function alters the signal accordingly and passes it to the next neuron(s) until the output of the model is reached. Each node is connected by a link with numerical weights; these weights are stored in the neural network and updated through the learning process.
Fig. 3 Multi-layer feedforward network, p1, …, pn measured spectral signatures
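As a minimal illustration of this forward pass, the sketch below uses the 55–8–3 architecture reported later in the paper, with random placeholder weights and a tanh hidden activation (the actual transfer functions and trained weights are not reproduced here).

```python
# Minimal MLFF forward pass: weighted sums fed to an activation function F.
import numpy as np

def forward(p, W1, b1, W2, b2):
    """p: (n_inputs,) spectral signatures; returns raw class scores."""
    hidden = np.tanh(p @ W1 + b1)   # hidden layer: weighted sum -> activation F
    return hidden @ W2 + b2         # linear output layer

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 55, 8, 3       # architecture reported in Sect. 6
W1, b1 = 0.1 * rng.normal(size=(n_in, n_hid)), np.zeros(n_hid)
W2, b2 = 0.1 * rng.normal(size=(n_hid, n_out)), np.zeros(n_out)
scores = forward(rng.normal(size=n_in), W1, b1, W2, b2)
```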
4 Network Training Algorithms

The Levenberg–Marquardt (LM) backpropagation method is a network training function that updates weight and bias values according to LM optimization. It is often the fastest backpropagation algorithm and is highly recommended as a first-choice supervised algorithm, although it requires more memory than other algorithms. The LM algorithm is an iterative technique that locates a local minimum of a multivariate function expressed as the sum of squares of several non-linear, real-valued functions. The algorithm changes the current weights of the network iteratively such that the objective function F(w) is minimized, as shown in Eq. 1 or, equivalently, Eq. 2:

F(w) = \sum_{i=1}^{M} \sum_{j=1}^{P} (d_{ij} - o_{ij})^2   (1)

F(w) = E E^T   (2)

where w = [w_1, w_2, …, w_N]^T is the vector of all weights, N is the number of weights, P is the number of observations or inputs (signatures), M is the number of output neurons, d_{ij} and o_{ij} are the desired value ("target value") and the actual value ("predicted value") of the ith output neuron for the jth observation, and E is the vector of all errors d_{ij} − o_{ij}. The LM method is very sensitive to the initial network weights. Also, it does not consider outliers in the data, which may lead to overfitting to noise. To avoid these situations, the Bayesian regularization technique can be used.
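A minimal sketch of one damped Gauss-Newton/LM update for the objective in Eqs. 1–2 is shown below; it uses a finite-difference Jacobian and a fixed damping factor, so it illustrates the update rule rather than MATLAB's trainlm or the authors' exact implementation.

```python
# One LM-style update: solve (J^T J + mu I) dw = J^T e for the flat weight vector w.
import numpy as np

def lm_step(w, model, X, d, mu=1e-2, eps=1e-6):
    """model(w, X) -> predictions for inputs X; d holds the target values."""
    out = model(w, X).ravel()
    e = d.ravel() - out                              # residuals; F(w) = e @ e
    J = np.empty((out.size, w.size))
    for k in range(w.size):                          # finite-difference Jacobian
        w_k = w.copy(); w_k[k] += eps
        J[:, k] = (model(w_k, X).ravel() - out) / eps
    dw = np.linalg.solve(J.T @ J + mu * np.eye(w.size), J.T @ e)
    return w + dw                                    # mu is adapted in practice
```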
5 Preliminary Practical Results

Acknowledgment: The database presented in this paper is used with special permission from P.I.M.S (PIMS LTD, Beer Sheva, Israel).

The goal is to analyze the influence of the ANN structure on the classification results. After choosing the best structure, the performance of different ANN training methods was compared. The spectral data used for analysis are presented in Table 1. Hereinafter, pre-processing, number of inputs selection, ANN design, training, testing, validation, and selection of training algorithms and the optimal number of neurons in the hidden layer are discussed in Sects. 5.1–5.5, respectively.

Table 1 Spectral data used for analysis (~5.7–11 µm waveband)

Spectral interval, cm−1 | Resolution, cm−1 | Number of spectral signatures
950–1750 | 4 | 200
Fig. 4 Typical molecular absorption positions (molecular bonds and spectral signatures), where 1—protein Amide I, 2—protein Amide II, 3—lipids and protein (CH3), 4—phospholipids and Amide III, 5—PO2 phospholipids and nucleic acids. The strength of spectral signatures is changed depending on the tissue features/pathologies [2–4]
5.1 Pre-processing

The data is extracted and formatted in accordance with the ANN requirements:

(1) The measured FTIR-ATR signal is converted to a spectral absorbance A(λ) defined by Eq. 3:

A_\lambda = -\log_{10}\frac{I_\lambda - I_{dark,\lambda}}{I_{ref,\lambda} - I_{dark,\lambda}}   (3)

where I_λ is the spectral intensity measured with the sample, I_{ref,λ} is the reference signal (without sample) for source correction, I_{dark,λ} is the dark counts, and λ is the wavenumber, cm−1.

(2) Peak normalization (Fig. 4). The absorption spectrum A(λ) is normalized by the maximal value at 1640 cm−1 (Amide I absorption) as provided by Eq. 4:

Y(\lambda) = A_\lambda / A(1640\ \text{cm}^{-1}).   (4)
(3) First derivative of the spectral absorbance is depicted in Fig. 5.
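A compact sketch of these three steps is given below, assuming all spectra share a common wavenumber grid `wn` in cm−1; this is an illustration of Eqs. 3–4, not the device software.

```python
# Pre-processing sketch: absorbance (Eq. 3), Amide I normalization (Eq. 4),
# and the first derivative of the normalized spectrum.
import numpy as np

def preprocess(I, I_ref, I_dark, wn):
    A = -np.log10((I - I_dark) / (I_ref - I_dark))   # Eq. (3)
    Y = A / A[np.argmin(np.abs(wn - 1640.0))]        # Eq. (4), Amide I at 1640 cm^-1
    dY = np.gradient(Y, wn)                          # first derivative
    return Y, dY
```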
5.2 Number of Inputs Selection

Measured spectral signatures at given wavelengths, p_n(λ), n = 200, are used as the inputs (input layer) to the ANN. To reduce the number of inputs, the criterion of "min
Fig. 5 The graph of first derivative of the spectral absorbance
Fig. 6 The graph of CV
variance" was used. The variance and Coefficient of Variation (CV) were calculated at each wavelength for the data matrix [76 × 200]. Then the threshold was applied to the CV vector (Fig. 6):

CV = \frac{\sqrt{var(p)}}{\bar{p}} \ge threshold.   (5)
The spectral signatures at appropriate CV values (CV ≥ threshold) were used as the inputs to ANN.
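The selection step can be sketched as follows for a 76 × 200 data matrix; the threshold itself is a tunable parameter.

```python
# CV-based input selection (Eq. 5): keep wavelengths with high variability.
import numpy as np

def select_inputs(spectra, threshold):
    """spectra: (n_samples, n_wavelengths) matrix, e.g. 76 x 200."""
    cv = spectra.std(axis=0) / np.abs(spectra.mean(axis=0))   # sqrt(var(p)) / p_bar
    keep = cv >= threshold
    return spectra[:, keep], keep
```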
Table 2 Dataset for ANN training and validation, dataset = 76 samples

Class labels (biopsy) | Count | Percent
Norm   | 72 | 94.74
Polyp  | 2  | 2.63
Cancer | 2  | 2.63
Fig. 7 Example of ANN structure for classification of the dataset with 10 hidden layer neurons and 3 output neurons: 55 spectral signatures (Input); 3 outputs: Cancer, Normal, Polyp
5.3 ANN Design

The data partitioning is the following: training set 60%, testing set 20%, and validation set 20%. The experimental data used for ANN model development are given in Table 2. 45 samples were used for the training set and the rest (31 samples) were used for testing and validation. The selection of data for training and testing was made in such a way that at least one sample of polyp and one sample of cancer would be in the training and testing sets. The selected ANN structure is a three-layer feedforward, fully connected hierarchical network consisting of one input layer, one hidden layer, and one output layer. Different iterative backpropagation algorithms have been implemented to determine errors for the hidden layer neurons and subsequent weight modification. To define the number of neurons in the hidden layer of the network, the Mean Square Error (MSE) and R² were analyzed. In order to avoid an undesirably long training time, a termination criterion has been adopted. This criterion may be either completion of a maximum number of epochs (training cycles) or achievement of the error goal (Fig. 7).
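One way to realize such a partition is sketched below: a per-class 60/20/20 split that forces at least one sample of each rare class into the training and testing sets (the random seed and rounding are illustrative; with the 72/2/2 class counts of Table 2 it yields 45 training samples).

```python
# Hedged sketch of a stratified 60/20/20 split with rare classes in train and test.
import numpy as np

def partition(labels, seed=0):
    rng = np.random.default_rng(seed)
    idx = {"train": [], "test": [], "val": []}
    for cls in np.unique(labels):
        members = rng.permutation(np.where(labels == cls)[0])
        n_tr = max(1, int(round(0.6 * len(members))))   # >= 1 rare sample in training
        n_te = max(1, int(round(0.2 * len(members))))   # >= 1 rare sample in testing
        idx["train"] += list(members[:n_tr])
        idx["test"] += list(members[n_tr:n_tr + n_te])
        idx["val"] += list(members[n_tr + n_te:])
    return {k: np.asarray(v) for k, v in idx.items()}
```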
5.4 Training, Testing, and Validation

The training stopped when the validation error started to increase, which occurred at training cycle (epoch) 15, as presented in Fig. 8. To evaluate the performance of the network and indicate the error rate of the presented model, statistical error estimation methods are used. The basic error estimation method is MSE, provided by Eq. 6:

MSE = \frac{1}{n} \sum_{t=1}^{n} (X_t - X_t^{o})^2.   (6)
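The early-stopping rule and the MSE criterion can be sketched together as below; `step` is any single-epoch trainer (for example, the LM sketch in Sect. 4), and the patience value is an assumption.

```python
# Early stopping on validation MSE (Eq. 6) around a generic one-epoch trainer.
import numpy as np

def mse(pred, target):
    return float(np.mean((np.ravel(target) - np.ravel(pred)) ** 2))   # Eq. (6)

def train_early_stopping(w, step, model, Xtr, dtr, Xva, dva, max_epochs=200, patience=3):
    best_w, best_err, bad = w, np.inf, 0
    for _ in range(max_epochs):
        w = step(w, model, Xtr, dtr)
        err = mse(model(w, Xva), dva)
        if err < best_err:
            best_w, best_err, bad = w.copy(), err, 0
        else:
            bad += 1
            if bad >= patience:          # validation error keeps increasing -> stop
                break
    return best_w, best_err
```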
Fig. 8 Training, testing, and validation
5.5 Selection of Training Algorithms and Optimal Amount of Neurons in Hidden Layer

The best results, obtained with the LM training algorithm, are presented in Table 3. Performance evaluation examines the confusion matrix between target classes (true) and output classes (predicted). The confusion matrix shows the percentages of correct and incorrect classifications. Classification accuracy is the percentage of correctly classified samples over the total number of samples in each group or class (Table 4). Figure 9 shows an example of using the LM algorithm for ANN training and validation. Overall, 98.7% (75/76 biopsies) of the predictions are correctly classified, while 1.3% (1/76 biopsies) are misclassified.

Table 3 Network training backpropagation algorithms

Algorithm | Number of inputs | Transfer functions | Number of hidden neurons | Network performance MSE | Best validation performance | R²
LM | 138 | Tansig-pureline | 2  | 0.01   | 3.3×10−3  | 0.95
LM | 138 | Tansig-pureline | 5  | 0.0088 | 1.5×10−4  | 0.96
LM | 55  | Tansig-pureline | 2  | 0.014  | 3.7×10−3  | 0.92
LM | 55  | Tansig-pureline | 5  | 0.009  | 4.87×10−4 | 0.96
LM | 55  | Tansig-pureline | 8  | 0.0088 | 2.4×10−4  | 0.96
LM | 55  | Tansig-pureline | 11 | 0.009  | 8×10−4    | 0.95

Table 4 Classification accuracy for different numbers of hidden neurons

Number of hidden neurons | Figure | Normal classification, % | Polyp classification, % | Cancer classification, %
2 | 9a | 72 biopsies are correctly classified as "Normal"; 0% misclassified | 2 cases (Polyp) are incorrectly classified as "Normal"; 100% misclassified | 2 cases (Cancer) are incorrectly classified as "Normal"; 100% misclassified
5 | 9b | 72 biopsies are correctly classified as "Normal"; 0% misclassified | 2 cases (Polyp) are incorrectly classified as "Normal"; 100% misclassified | 2 cases (Cancer) are correctly classified as "Cancer"; 0% misclassified
8 | 9c | 72 biopsies are correctly classified as "Normal"; 0% misclassified | 1 case is correctly classified as "Polyp"; 50% misclassified | 2 cases are correctly classified as "Cancer"; 0% misclassified
Fig. 9 Output class (predicted) and target class (desired): 1—Normal; 2—Polyp; 3—Cancer. Green—correctly classified; Red—misclassified; Blue—total percentage of correctly classified and misclassified samples
6 Conclusions

This report aims to evaluate ANN in predicting cancer and other pathologies based on measurements by the FTIR-ATR device. A feedforward backpropagation neural network with supervised learning is proposed to classify the disease: cancer/non-cancer or cancer–polyp–normal. The reliability of the proposed neural network method is examined on the data collected with the medical IROS (FTIR-ATR) device and obtained by biopsy.
Choosing the optimal ANN architecture is followed by the selection of the training algorithm and related parameters. The selected ANN structure is a three-layer feedforward, fully connected hierarchical network consisting of one input layer, one hidden layer, and one output layer. Six iterative backpropagation algorithms have been implemented to determine errors for the hidden layer neurons and subsequent weight modification. The determination of the number of layers and neurons in the hidden layers is done by the trial-and-error method. In order to determine the optimal ANN model, the number of hidden neurons (2–11) in a single hidden layer was varied. The transfer functions tansig in the hidden layer and linear in the output layer were found to be optimal. After training, each ANN model is tested with the testing data, and the optimal ANN architecture was found by minimizing the test error on the testing data and the Mean Square Error (MSE) for the training data. The final network structure in the first strategy has 55 inputs, 8 neurons in the hidden layer, and 3 neurons in the output layer. The best performance was obtained with the LM training algorithm. Overall, 98.7% (75/76 biopsies) of the predictions are correctly classified and 1.3% (1/76 biopsies) are misclassified. Using ATR-FTIR with ANN software and a large database may play an important role in the development of next-generation real-time techniques for ex vivo identification tests of tumors.
References 1. Bray, F., Ferlay, J., Soerjomataram, I., Siegel, R.L., Torre, L.A., Jemal, A.: Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J. Clinic. 68, 394–424 (2018) 2. Dekel, B., Zilberman, A., Blaunstein, N., Cohen, Y., Sergeev, M.B., Varlamova, L.L., Polishchuk, G.S.: Method of infrared thermography for earlier diagnostics of gastric colorectal and cervical cancer. In: Chen, Y.W., Tanaka, S., Howlett, R., Jain, L. (eds.) Innovation in Medicine and Healthcare—InMed 2016, SIST, vol. 60, pp. 83–92. Springer, Cham (2016) 3. Zlotogorski-Hurvitz, A., Dekel, B.Z., Malonek, D., Yahalom, R., Vered, M.z: FTIR-based spectrum of salivary exosomes coupled with computational-aided discriminating analysis in the diagnosis of oral cancer. J. Cancer Res. Clin Oncol. 145, 685–694 (2019) 4. Simonova, D., Karamancheva, I.: Application of Fourier transform infrared spectroscopy for tumor diagnosis. Biotechnol. Biotechnol. Equip. 27(6), 4200–4207 (2013) 5. Theophilou, G., Lima, K.M., Martin-Hirsch, P.L., Stringfellow, H.F., Martin, F.L.: ATR-FTIR spectroscopy coupled with chemometric analysis discriminates normal and malignant ovarian tissue of human cancer. R. Soc. Chem. 141, 585–594 (2016) 6. Paraskevaidi M., Martin-Hirsch P.L., Martin F.L.: ATR-FTIR spectroscopy tools for medical diagnosis and disease investigation. In: Kumar, C.S.S.R. (ed.) Nanotechnology Characterization Tools for Biosensing and Medical Diagnosis, pp. 163–211. Springer, Cham (2019) 7. Lei, L., Bi, X., Sun, H., Liu, S., Yu, M., Zhang, Y., Weng, S., Yang, L., Bao, Y., Wu L., Xu, Y., Shen K.: Characterization of ovarian cancer cells and tissues by Fourier transform infrared spectroscopy. J. Ovarian Res. 11, 64.1–64.10 (2018) 8. Dong, L., Sun, X., Chao, Z., Zhang, S., Zheng, J., Gurung, R., Du, J., Shi, J., Xu, Y., Zhang, Y., Wu, J.: Evaluation of FTIR spectroscopy as diagnostic tool for colorectal cancer using spectral analysis. Spectrochim Acta Part A Mol. Biomol. Spectrosc. 122, 288–294 (2014)
9. Rehman, S., Movasaghi, Z., Darr, J.A., Rehman, I.U.: Fourier transform infrared spectroscopic analysis of breast cancer tissues; identifying differences between normal breast, invasive ductal carcinoma, and ductal carcinoma in situ of the breast. Appl. Spectrosc. Rev. 45(5), 355–368 (2010) 10. Yang, H., Griffiths, P.R., Tate, J.D.: Comparison of partial least squares regression and multilayer neural networks for quantification of non-linear systems and application to gas phase Fourier transform infrared spectra. Anal. Chim. Acta 489, 125–136 (2003) 11. Lasch, P., Stämmler, M., Zhang, M., Baranska, M., Bosch, A., Majzner, K.: FT-IR hyperspectral imaging and artificial neural network analysis for identification of pathogenic bacteria. Anal. Chem. 90(15), 8896–8904 (2018) 12. Lasch, P., Diem, M., Hänsch, W., Naumann, D.: Artificial neural networks as supervised techniques for FT-IR microspectroscopic imaging. J. Chemom. 20(5), 209–220 (2006)
Evaluation of Shoulder Joint Data Obtained from CON-TREX Medical System Aleksandr Zotin , Konstantin Simonov , Evgeny Kabaev , Mikhail Kurako , and Alexander Matsulev
Abstract The study of kinematic patterns, as well as their variability in normal and pathological conditions, is an urgent task that is actively pursued in biomechanics, rehabilitation, and sports medicine. One of its aspects is the evaluation of data obtained through robotic mechanotherapy on the CON-TREX medical system during rehabilitation treatment of patients after arthroscopic reconstructive surgery on the shoulder joint. The work shows the steps of processing and analyzing CON-TREX medical system data. The analysis of statistics and the modeling of data obtained during patient exercises on the CON-TREX system are performed. For this, a correlation analysis of the data is applied and the Pearson correlation coefficients are calculated. The relationship between the variables represented by the data series is revealed. Improving the visual presentation of the entire set of CON-TREX clinical data with the help of approximations, as well as within the framework of the histogram approach, allows the accuracy of diagnostic evaluations to be increased. The data analysis as part of a study of the dynamics shows a number of
A. Zotin Reshetnev Siberian State University of Science and Technology, Krasnoyarsky Rabochy pr 31, 660037 Krasnoyarsk, Russian Federation e-mail: [email protected] K. Simonov · A. Matsulev Institute of Computational Modeling SB RAS, 50/44 Akademgorodok, 660036 Krasnoyarsk, Russian Federation e-mail: [email protected] A. Matsulev e-mail: [email protected] E. Kabaev Center for Restorative Medicine of FSRCC FMBA Russian Federation, 25b Biathlonnaya st, 660041 Krasnoyarsk, Russian Federation e-mail: [email protected] M. Kurako (B) Siberian Federal University, 26 Kirensky st, 660074 Krasnoyarsk, Russian Federation e-mail: [email protected] © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2020 I. Czarnowski et al. (eds.), Intelligent Decision Technologies, Smart Innovation, Systems and Technologies 193, https://doi.org/10.1007/978-981-15-5925-9_13
dependencies in the indicators that allow the exercise cycle for the patient to be planned more accurately.
1 Introduction

An integral part of the modern rehabilitation process is the use of robotic mechanotherapy, whose role usually comes down to the routine "joint development" and restoration of its passive mobility. However, a number of robotic training complexes (CON-TREX, Primus RS, and Biodex) are equipped with additional capabilities for recording biomechanical parameters and self-adaptation to individual characteristics of patients due to Biological Feedback (BFB). In this regard, the patient is actively involved in the rehabilitation process, becoming one of the links in the synchronized system "patient—robotic complex—doctor." Studies of the kinematic patterns of the shoulder joint (scapular–shoulder) and the upper limb as a whole, as well as their variability in norm and pathology, are actively carried out in biomechanics, rehabilitation, and sports medicine [1–3]. Injuries to the structures of the shoulder joint are quite common and account for 16–55% of all injuries of large joints [4]. The variability of changes in arthrokinematics in injuries of the shoulder joint depends on the nature and severity of violations of the integrity of its structures. The premorbid states that precede their damage are actively modeled and investigated [5]. The CON-TREX training and diagnostic complex provides a large amount of data that allows the patient's functional capabilities to be studied more thoroughly according to biomechanical indicators in dynamics reports. A comprehensive assessment of the data during the workflow is time-consuming and requires additional specialization in biomechanics, which is often problematic and leads to simplified, routine work on the equipment, taking into account only a small proportion of indicators (torque, power, and range of motion). As a result, a specialist does not get a complete picture of the processes that accompany the restoration or limitation of the functional activity of the patient, which makes predicting the rehabilitation or training process less accurate. There is a need to provide these data to a specialist in a simpler and more convenient form, which in turn will increase the accuracy and timeliness of step-by-step diagnostics and correction of impaired functions based on statistical processing and visualization of experimental clinical data from CON-TREX. The paper is organized as follows. Section 2 describes the reconstructive medicine development and its usage. Section 3 presents the description of the data obtained from the CON-TREX medical system. The proposed approach to data evaluation is described in Sect. 4. Concluding remarks are outlined in Sect. 5.
2 Development of Reconstructive Medicine

Minimally invasive reconstructive surgery of the shoulder joint is actively developing as a scientific and practical direction in modern traumatology and orthopedics [6, 7]. About 400 thousand arthroscopic operations on the shoulder joint are performed in the world per year (data of the AAOS, American Academy of Orthopedic Surgeons). The influence of arthroscopic reconstructive interventions on the restoration of the kinematic patterns of the shoulder joint is also actively studied in modern science, especially with the use of 3D technologies. The main role in these studies is given to the visual diagnosis of biomechanical disorders [8–10]. Robotic mechanotherapy on training complexes with biological feedback and an isokinetic dynamometer is used in rehabilitation and sports medicine as an effective means of developing joints after surgery, which can restore and improve specific central mechanisms of movement coordination [11–13]. There is no single algorithm for postoperative comprehensive rehabilitation using isokinetic simulators. Reconstructive surgery of the shoulder joint is one of the actively developing scientific and practical areas of modern traumatology and orthopedics. Despite the widespread adoption of microsurgical techniques, the resulting pathology quite often leads to a persistent dysfunction of the shoulder joint, which occurs in 23–29% of victims. Restorative medicine includes a number of sections with non-drug rehabilitation methods such as mechanotherapy. The first data on the successful clinical use of robotic systems for mechanotherapy appeared in 1997. Further development of robotics and computer technology with the introduction of virtual and game strategies opened up new prospects for the restoration of upper limb functions by activating biological feedback and engaging the patient's personality. Since the beginning of the 2000s, apparatuses of the so-called Continuous Passive Motion (CPM) therapy have been widely used in rehabilitation, such as Artromot, Kinetec Centura, and Flex-mate. One of the modern robotic means for carrying out the rehabilitation process is the CON-TREX treatment and diagnostic complex with biological feedback, which includes two modules—a multijoint (MJ) module and a module for work simulation (WS). Biofeedback isokinetic simulators are diverse (Biodex, Cybex, and CON-TREX) and are regularly compared in scientific research. All of them make it possible to carry out the therapeutic and diagnostic process during sessions in passive (CPM) and active modes (isotonic, isometric, and isokinetic), in the absence of axial load on the segment [14–16]. Studies of dynamic profiles have been carried out mainly by foreign sports physicians [17, 18].
3 CON-TREX Medical System Data Description The study was conducted on the basis of the Center for Restorative Medicine of FSRCC FMBA of Russia (Krasnoyarsk, Russia). The objects of observation were men and women 18–55 years old in the early and late recovery periods after surgical
Fig. 1 Modules of CON-TREX medical system: a MJ module, b WS module
treatment of injuries of the shoulder joint (rupture of the rotator cuff, traumatic dislocation of the shoulder, Bankart and Hill-Sachs lesions). In total, 100 patients were included in the study. Patients of the comparison group (50 people) received standard treatment consisting of a set of physical therapy exercises, physiotherapeutic procedures, and massage. Another 50 patients additionally received courses of robotic mechanotherapy on the CON-TREX medical diagnostic complex with biological feedback. The CON-TREX equipment (Fig. 1) allows the mobilization of joints in the direction of flexion/extension, abduction/adduction, and rotation with programming, tracking, and control of biomechanical parameters (force, torque, amplitude of movement in the joint, power, etc.), forming an electronic protocol of dynamic observation. Robotic mechanotherapy with CON-TREX simulators is possible in continuous passive mobilization modes (CPM therapy), as well as in isokinetic and isotonic modes with the concentric and eccentric type of resistance at different speeds. Work in the isotonic and isokinetic modes actively involves a patient in movement with the given parameters (speed and torque) [19, 20]. The CON-TREX system captures the parameters that determine the conditions for the exercise and calculates many statistical indicators that characterize the physical parameters of the exercise performed by the patient. The software of the medical system visualizes the basic data (Fig. 2). The graph of the torque (N m) versus time (s) is shown in the upper left area, the graph below shows the position (deg) versus time (s), and the figure on the right shows graphs of all 50 cycles of the dependence of the torque (N m) on the position (deg). As part of the assessment, 14 parameters are calculated for 50 complete cycles: torque peaks, speed peaks, work, average power, power peaks, torque peak positions, and speed peak positions in the clockwise and counterclockwise directions. These parameters (average and maximum values) are automatically determined by the CON-TREX system for each cycle of a given movement and are accessible in a final report.
Fig. 2 Data visualization plane of software tool
4 CON-TREX Data Evaluation

The use of computational technology for the analysis of multidimensional data is presented for solving a specific problem related to the processing, visualization, and statistical analysis of experimental clinical data from the CON-TREX system. The assessment of the data is conducted in three steps. First, the preliminary data processing and the formation of models are carried out; this allows the dimensionality to be lowered for analyzing the structure of the experimental data using nonlinear regression. The next step is data visualization to assess the condition and arthrokinematics of the shoulder joint using the histogram method. Third, correlation analysis and color visualization of the dynamics of the obtained data are carried out. The steps are described in Sects. 4.1–4.3.
4.1 Data Pre-processing and Model Generation

Approximation models for CON-TREX data are constructed by the nonlinear multiparametric regression method. The main stages of the regression computation are pre-processing, regression calculation, smoothness assessment, optimization, and cross-validation check. At the pre-processing stage, the value x_i is centered using the weights ε_i, which are inversely proportional to the confidence interval:

x_i = x_i - \frac{\sum_i \varepsilon_i x_i}{\sum_i \varepsilon_i}.   (1)
Fig. 3 Data approximation for “maximum torque” indicator: a clockwise, b counterclockwise
In addition to centering, normalization of the indicators is performed using the standard deviation (σ_x):

x_i = \frac{x_i}{\sigma_x}.   (2)
The basic formula for calculating the linear part of the regression with adjustable parameters b, c, w, φ (b and c are determined during pre-processing) is

a_i^t = b_i + c_i \sum_j \sin\Big(\varphi_{ij} + \sum_k w_{jk} X_k^t\Big),   (3)

where X denotes the input values and a_i^t is the ith output for problem t. Figure 3 depicts an example of CON-TREX data approximation for the "maximum torque" indicator for clockwise and counterclockwise movements. During comparative analysis, the approximation models allow studying the features of the integrated use of mechanotherapy and its impact on the regression rate and on the dynamics of the severity and frequency of clinical symptoms of the pathological process in different periods. These models also make it possible to quantify the effectiveness of the mechanotherapy impact on the functional activity of the shoulder joint. Lowering the dimension of the data and building approximation models allow us to evaluate the course of rehabilitation by its trend component and, in more detail, by its oscillation component.
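A minimal sketch of the pre-processing in Eqs. (1)–(2) is given below; it assumes one-dimensional indicator series and is not the authors' full regression pipeline.

```python
# Weighted centering (Eq. 1) and normalization by the standard deviation (Eq. 2).
import numpy as np

def preprocess(x, eps):
    """x: (n,) indicator values; eps: (n,) weights inversely proportional
    to the confidence interval of each value."""
    centered = x - np.sum(eps * x) / np.sum(eps)    # Eq. (1)
    return centered / centered.std()                # Eq. (2)
```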
4.2 Data Visualization and Interpretation Based on a Histogram Approach

Multidimensional CON-TREX data visualization is implemented using the histogram method. This method is based on the construction of an experimental distribution of the observed values of the studied quality indicator—distribution histograms. The form, position, and magnitude of scattering of the histogram allow one to
Fig. 4 A histogram-shaped visualization with a division into two components of the Gaussian line with parameters (shift, half-width)
evaluate the effectiveness of treatment. For convenience of analysis, the "forward" and "reverse" movements of the patient's arm are considered separately. Figure 4 shows the frequency histogram of all values of the recorded efforts with a division into exercise zones. The horizontal axis characterizes the magnitude of the effort, and the vertical axis shows the number of points in each of the 200 partition intervals for the entire duration of the exercise. Analysis of the histogram indicates that in the "reverse" movement of the hand (the sharper and more intense peak on the left, blue circles) small efforts develop and they are quite uniform. The direct movement (the wider peak on the right, green circles) has a greater variety, and the most typical force is larger in magnitude than the largest force developed in the reverse movement. The visualization of CON-TREX data based on the construction of two-dimensional histograms is shown in Fig. 5. The abscissa shows time in fractions of each half-cycle from 0 to 1, and the ordinate shows the "force" ("torque") in fractions from the minimum (0) to the maximum value (1). Visualization of the results of statistical analysis (variance estimation over time) of CON-TREX data is presented in Fig. 6. The measurement was performed as a percentage of the range of efforts (max–min), taking into account the difference between the min and max values for the clockwise and counterclockwise movements. Improving the visual presentation in the form of the variability of the standard deviation for each indicator during rehabilitation exercises makes it possible to increase the accuracy of diagnostic estimates of the current arthrokinematic condition of the patient's shoulder joint.
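As an illustration, the sketch below builds the 200-bin torque histograms for the two movement directions, assuming the direction is identified by the sign of the angular velocity (the paper does not specify how the half-cycles are separated).

```python
# Torque histograms for "forward" and "reverse" half-cycles (200 bins).
import numpy as np

def torque_histograms(torque, velocity, bins=200):
    rng = (float(torque.min()), float(torque.max()))
    h_fwd, edges = np.histogram(torque[velocity > 0], bins=bins, range=rng)
    h_rev, _ = np.histogram(torque[velocity < 0], bins=bins, range=rng)
    return h_fwd, h_rev, edges
```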
Fig. 5 Histograms of “effort” versus time for half-cycles: a clockwise, b counterclockwise
Fig. 6 RMS deviations of “torque” for movements: a clockwise, b counterclockwise
4.3 Evaluation of Dynamics The study solves the problem of processing and analyzing patient test data using CON-TREX to assess the dynamic component of the condition of the shoulder joint. As an example, we consider an exercise that is performed with the left hand according to CPM program 5 (continuous passive movement clockwise and counterclockwise). For estimation, correlation between the time series is used (Pearson correlation coefficient). The correlation coefficients between the time series represented by the values X t and Y t are calculated as
Corr(X, Y) = \frac{\sum_t (X_t - \bar{X})(Y_t - \bar{Y})}{\sqrt{\sum_t (X_t - \bar{X})^2 \sum_t (Y_t - \bar{Y})^2}}, \quad \text{where } \bar{X} = \frac{1}{n}\sum_t X_t, \; \bar{Y} = \frac{1}{n}\sum_t Y_t.   (4)
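In practice Eq. (4) can be evaluated directly with numpy, for example for the 14 × 50 matrix of cycle-wise indicators; this is a generic illustration rather than the authors' processing code.

```python
# Pearson correlation (Eq. 4) between the cycle-wise indicator series.
import numpy as np

def indicator_correlations(params):
    """params: (14, 50) array, one row per indicator over 50 exercise cycles."""
    return np.corrcoef(params)   # np.corrcoef computes Eq. (4) for every pair of rows
```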
According to the obtained data, the correlation between the clockwise and counterclockwise movements is very weak, ~0.2, because the forward and backward movements are performed by different muscles. The exception is the parameters related to the peak power, for which the correlation is 0.633—a strong correlation. Figure 7 shows the correlation matrix for the 14 parameters calculated from the data of 50 complete exercise cycles and its color representation in a rainbow scheme. Visualization of the quantitative assessment of the histograms is performed using the rainbow scheme (Fig. 8) as well as color-coded correlation matrices (Fig. 9). The correlation matrices depicted in Fig. 9 show that the correlations between subsequent cycles decay toward the end of the exercise. An interesting feature is that for the first half of the cycle at the end of the exercise the correlation between repetitions increases again, which looks like a kind of square in which red points prevail in Fig. 9. This means that from cycles 20 to 45 of the clockwise movement the force profile is approximately the same and is repeated. Currently, as part of testing, the dynamics assessment is performed by constructing a map based on the correlation matrices, which are formed for each of the 14 parameters. The final map takes into account the importance of the indicators (weights). The weight
Fig. 7 An example of constructing a correlation matrix
Fig. 8 3D graphs representation of time series of torque
Fig. 9 Correlation matrices between the first and second halves of 50 exercise cycles
is determined according to its share, which for the "power peak" is 50%, for the "torque peak" and "speed peak" is 20% and 15%, respectively, while the other indicators have an equal share. The created set of color-coded correlation matrices based on indicators from the dynamic reports allows a specific functional model of a joint to be characterized at a
certain stage of the recovery process. The obtained estimates are important for the timely correction of treatment tactics and for expert evaluation. It becomes possible to determine the phase of patient rehabilitation more precisely. A detailed study of these reports provides an in-depth understanding of the biomechanics and physiology of the musculoskeletal system as well as improves the accuracy of diagnosis.
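A hedged sketch of how such a weighted map could be assembled is shown below, with the 50/20/15% shares mentioned above and the remainder spread equally over the other indicators; the exact bookkeeping used by the authors is not specified.

```python
# Weighted dynamics map: weighted sum of per-indicator cycle-to-cycle correlation matrices.
import numpy as np

def dynamics_map(series):
    """series: dict name -> (n_cycles, n_samples) array of per-cycle profiles."""
    main = {"power_peak": 0.50, "torque_peak": 0.20, "speed_peak": 0.15}
    rest = (1.0 - sum(main.values())) / max(1, len(series) - len(main))
    n_cycles = next(iter(series.values())).shape[0]
    total = np.zeros((n_cycles, n_cycles))
    for name, profile in series.items():
        total += main.get(name, rest) * np.corrcoef(profile)   # correlations between cycles
    return total
```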
5 Conclusions

As part of the research, we performed a numerical simulation of experimental clinical data obtained by the CON-TREX medical system. Approximation models are constructed and the data are visualized by the histogram method. Models and indicators in the normal and pathological states at different stages of rehabilitation or testing make it possible to assess the direction of functional remodeling of the joint and the degree of stabilization of its arthrokinematics. Improving the visual presentation of the entire set of initial experimental clinical data of CON-TREX using neural network approximations and the histogram approach makes it possible to increase the accuracy of diagnostic estimates by interpreting the statistical indicators, both in the kinematics and in the dynamics of the rehabilitation process. The dependencies in the indicators allow the exercise cycle for a patient to be planned more accurately.
Framework for Intelligent Wildlife Monitoring Valery Nicheporchuk, Igor Gryazin, and Margarita N. Favorskaya
Abstract In this research, we suggest a framework for detailed wildlife monitoring based on video surveillance. Camera traps are located on the remote territories of natural parks in the habitats of wild animals and birds. Although the dominant connectivity and sensing technologies for wildlife monitoring are based on wireless sensor networks, such technology cannot be applied in some cases due to vast impassable territories, especially in the Siberian part of Russia. Based on our previous investigations of this research topic, we propose the main approaches and methods for big data collection, processing, and analysis useful for the management of natural parks and any wildlife habitat.
1 Introduction In environmental monitoring, special attention is paid to natural parks, aiming to observe the ecological system in the wild, study the behavior and population changes of animals and birds, including endangered species, and evaluate the impact of natural and human activities on the ecological system. V. Nicheporchuk (B) · M. N. Favorskaya Institute of Computational Modeling, Siberian Branch of the RAS, 50, Academgorodok, Krasnoyarsk 660036, Russian Federation e-mail: [email protected] M. N. Favorskaya e-mail: [email protected] I. Gryazin Ergaki Natural Park, 42, Rossiiskauya, Ermakovskoye, Krasnoyarsky kray 662821, Russian Federation e-mail: [email protected] V. Nicheporchuk · I. Gryazin · M. N. Favorskaya Reshetnev Siberian State University of Science and Technology, 31 Krasnoyarsky Rabochy ave, Krasnoyarsk 660037, Russian Federation © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2020 I. Czarnowski et al. (eds.), Intelligent Decision Technologies, Smart Innovation, Systems and Technologies 193, https://doi.org/10.1007/978-981-15-5925-9_14
In order to preserve and study the biodiversity and regulate the impact of a human on the ecological system, more than ten thousand federal and regional specially protected natural areas have been created in Russia. The national and natural parks are especially important for ecological monitoring. The use of traditional methods for studying the behavior of wild animals and birds in Siberia is difficult due to vast impassable territories, severity of the climate, and difficult terrain. The lack of objective indicators hinders the implementation of effective environmental actions and reduces the quality of long-time forecasts for ecological systems. To increase the level of information support, we suggest technologies for the automated collection and analytical processing of ecological data as an alternative to the traditional human-based methods of environmental monitoring. In this paper, we offer the main approaches and methods for information support of ecological systems on the example of Ergaki Natural Park, Krasnoyarsky kray, Russia. The paper is organized as follows: Section 2 reviews related work. Section 3 presents a description of the current state and ecological monitoring in Ergaki Natural Park. Methods and materials are discussed in Sect. 4. Finally, we conclude the paper in Sect. 5.
2 Related Work Conventional wildlife monitoring methods include five approaches with different degrees of automated collection and processing: wildlife surveys by the staff of national parks, invasive monitoring based on Global Positioning System (GPS) collars without receiving image information about animals [1], employment of optical/infrared cameras (camera traps) that store visual information on a local large-capacity Secure Digital (SD) memory card [2], application of expensive wireless cameras that transmit images over broadband microwave links [3], and integrated monitoring based on satellite remote sensing [4]. The choice of monitoring type depends on the goal of the investigation, the species and behavior of the animals, the natural properties of the territories (vegetation, seasons, remoteness, and accessibility), and the technical equipment available for surveillance. In recent years, many investigations have focused on animal monitoring and tracking in large wild areas, since these provide the initial data for ecological monitoring. Zviedris et al. [5] offered an improved animal monitoring sensor system (for lynx and wolf tracking) and low-level software for sensor node control and communication. The LynxNet system was based on tracking collars, built around TMote Mini sensor nodes, sensors, GPS and 433 MHz radio, and stationary base stations placed at locations frequented by the animals. Xu et al. [6] proposed animal monitoring utilizing wireless sensor networks and an Unmanned Aerial Vehicle (UAV). They suggested collecting different types of information, such as pictures, sound, and odor, from the sensor nodes using a UAV under the assumption that wild animals have their own habitats and stay at a certain location for rest or activity in a small area. These authors proposed a
path planning approach for the UAV based on the Markov decision process model and believed that their technology was applicable to detecting the locations of endangered species and tracking them in large-scale wildlife areas. Zhang et al. [7] promoted the idea of combining macro- and micro-monitoring in ecological monitoring, applying Geographic Information System (GIS) and remote sensing technology for spatial observation, positioning, and analysis as the advanced macro-ecological technical means. Ground monitoring, as the micro-monitoring part of the generalized ecological system, is also of great importance due to the detailed big data it yields, especially if multimedia information is collected on a 24/7/365 scale. Sometimes, the equipment cannot transmit the received information in online mode. In that case, there remains an off-line possibility to analyze the big data stored in local large-capacity memory devices. In spite of many drawbacks, such as lagged data collection, a long monitoring cycle, and high risk, the staff of natural parks obtain information that cannot be collected by other macro-monitoring means. Our goal is to decrease the high labor costs of data processing and to help in decision-making. In recent years, the big data collected in wildlife monitoring have required the development of machine learning methods for automated animal recognition, deep learning in particular. These methods are applied not only for noninvasive surveillance, but also for the detection of conflicts between animals and humans [8]. Different deep learning architectures for animal detection and recognition in the wild were proposed in recent research [9–12]. This brief review demonstrates the high scientific interest in this scope.
3 Current State and Ecological Monitoring in Ergaki Natural Park Ergaki Natural Park is located in the mountains of the Western Sayan at the junction of Siberian taiga territories with the dry continental steppes of Central Asia, which determines the richness of its flora and fauna. There are 10 species of mammals, 52 species of birds, and 164 species of plants and mushrooms listed in the Red Books of Russia and Krasnoyarsky kray. The uniqueness and relative accessibility of the mountain landscapes attract a large number of tourists and athletes, over 50 thousand people in the summer and up to 45 thousand in the winter when the ski slopes are operating. The intensive tourist flows are a source of disturbance for animals and birds, leading to damage to vegetation and pollution of soils by household garbage. In addition, to provide for the safety of tourists, it is necessary to control forest fires and to exclude human contact with large predators. The factors mentioned above distinguish Ergaki Natural Park from other protected areas of Siberia. The map of Ergaki Natural Park with the current locations of camera traps is depicted in Fig. 1.
Fig. 1 Map of Ergaki Natural Park with the locations of camera traps
The plan of territory observation until 2030 was developed by the Directorate of Ergaki Natural Park. Around 50 camera traps will be distributed through the territory, especially along the highway with GPS coverage (see Fig. 1). This will provide the possibility of remote transmission of images from camera traps to the intelligent ecological system in online mode, which makes it possible to explain and model the dynamics of ecological systems under human impact. The map shows the locations of the camera traps in different landscape areas. The choice of places is determined by the behavior of the controlled species of animals and birds. For ungulates, special feeding places are arranged, which also attract predators. Note that recognition accuracy is increased when several devices shoot one place from different positions. The use of camera traps that record events in the habitats of animals and birds allows obtaining unique information that is not available from other monitoring devices. A typical low-cost camera trap takes a series of consecutive images at intervals of 3–5 s, or a video of short duration, if any movement in the scene is detected by a motion sensor (passive infrared). When an object or objects appear, a series may include a dozen
snapshots. The object can be an animal, a bird, or a person. However, some meteorological factors may cause additional movement in a scene, for example, branches swinging in a strong wind, or the luminance conditions can be far from ideal. The urgent task is to analyze the accumulated big data obtained from camera traps and create a framework for intelligent monitoring of ecological systems in Krasnoyarsky kray. The variety of landscapes (taiga, steppe, and highlands) requires a selection of special devices. At the stage of image analysis, it is necessary to solve such problems as removing non-informative frames, identifying species and individuals (where possible), and extracting events from a series of images. The application of modern technologies and methods for big data processing allows obtaining new knowledge about the state of ecological systems.
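As a rough illustration of the first of these tasks, non-informative frames of a motion-triggered series can be pre-filtered by simple frame differencing. The sketch below is only an idea sketch: the use of OpenCV, the thresholds, and the file-based interface are assumptions and do not describe the processing pipeline of the proposed framework.

```python
import cv2
import numpy as np

def informative_frames(paths, changed_fraction=0.02, pixel_threshold=25):
    """Keep frames of one motion-triggered series that show enough change.

    paths: image files of one series, ordered in time (shot 3-5 s apart).
    A frame is kept when the fraction of pixels that differ noticeably from
    the previously kept frame exceeds changed_fraction. This is only a crude
    first filter: swinging branches or changing illumination can still
    produce false positives and call for the more advanced methods of [13-15].
    """
    kept, reference = [], None
    for path in paths:
        image = cv2.imread(path)
        if image is None:
            continue
        gray = cv2.GaussianBlur(cv2.cvtColor(image, cv2.COLOR_BGR2GRAY), (5, 5), 0)
        if reference is None or np.mean(cv2.absdiff(gray, reference) > pixel_threshold) > changed_fraction:
            kept.append(path)
            reference = gray
    return kept
```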
4 Methods and Materials In this section, we consider the main methods and materials for wildlife ecological monitoring. Section 4.1 discusses the applied methods for data obtaining and processing. The architecture of the information system is considered in Sect. 4.2. The results of ecological monitoring of Ergaki Natural Park are offered in Sect. 4.3.
4.1 Methods for Data Obtaining and Processing The process of monitoring activities to support the ecological systems of the protected areas is presented in Fig. 2. The flow-chart describes 12 processes.
• A model for monitoring the ecological system is based on the analysis of observational data, usually collected since the founding of the protected area. The main monitoring results are a description of the dynamics of the numbers of animal, bird, and plant species, as well as cases of the occurrence of rare and endangered species. In addition, research on threats to ecological systems is conducted, such as biologically unjustified and illegal logging, forest and steppe wildfires, and irrational human economic activities.
• The planning of ecological observations includes the winter routes of mammals and birds and the organization of visitor monitoring. The camera traps are located at salt licks, watering places, crossings over watercourses, and migration routes. The obtained results can be a cause of changes in the applied methods.
• Locations of camera traps and animal and bird routes are visualized in the form of thematic layers of digital maps at various scales (from M1:1,000,000 to M1:10,000). GIS technologies make it possible to simulate some processes (for example, forest fires) and conduct cartographic analysis.
• Based on the spatial model of the territory and the monitoring results of previous years, the locations of camera traps can be changed based on expert estimates.
Fig. 2 Flow-chart of ecological monitoring methods design and testing
To obtain statistically reliable data, an observation series with a duration of 12 or more years is required. Thus, new observation places are added to the existing ones.
• Image processing and data representation in a parametric form are the most complex issues. The recognition of animal species in the wild using camera traps has many challenges caused by the shooting conditions (illumination, weather impact, seasons, and cluttered background) and by animal or bird behavior (unpredicted movement, multiple shapes and poses, and occlusions by natural objects) [13–15].
• Methodologies and regulatory documents with general requirements for data collection, route density, and the number of camera trap locations have not been elaborated in Russia. The sufficiency of the observations is only estimated by experts.
• 7 and 8. Data integration is implemented using a data warehouse with a multilevel structure [16]. The flexibility of the storage facilities allows the dynamic visualization of results. Since full-scale observations in Ergaki Natural Park are just beginning, "lightweight" tools for intelligent analysis are used.
• Quantitative criteria determine the completeness of the ecological system's description. The objective monitoring makes it possible to clarify the numbers of large predators and ungulates. For example, thanks to 30 camera traps located in the Stolby Wildlife Sanctuary, the number of bears "increased" from 30 to 72 individuals.
• Recommendations are formed using the production knowledge base. Firstly, the protection measures that are common to the large regions of the country are formalized. Then, the knowledge base oriented to the local ecological system is implemented. The crucial task is to design an editor software tool adapted for use by the staff.
• Decisions are made to change some aspects of the monitoring and environmental measures, contributing to more efficient observation and control in the protected areas.
4.2 Architecture of Information System The generalized system architecture is depicted in Fig. 3. It includes the following components: data sources, consolidation of information resources, software tools and services for data processing, and human–machine interfaces [16]. The integrated processing of images obtained from camera traps together with other data makes it possible to solve the following analytical tasks of environmental monitoring:
• Inventory of species means the identification of species based on their attendance at feeding and watering places.
• Study of habitat is based on the migration analysis of large animals and its comparison to long-term data on habitat, weather conditions, and feed base.
• Abundance and density are calculated based on the number and survival of ungulates and other large mammals during pregnancy and after calving.
• Monitoring of the feeding regimes of rare bird chicks is assessed by indirect factors (for example, the time and duration of feed base visits).
Fig. 3 System architecture
• Interspecies interactions are estimated in terms of the ecological system balance, the capacity of the feed base, and migration patterns, based on the numbers of predators and their potential prey.
• Assessment of threats and of the influence of the human factor on the development of the biodiversity of ecological systems is conducted according to long-term observations of the human load on the protected areas.
Human–machine interfaces provide easy data access for the staff making the executive decisions. Specialized workstations are focused on different user groups. The advantage of the proposed information system is in the scalability of the system architecture, including information resources, technologies, and software. This makes it possible to adapt the analytical system to various management tasks. The software modules implement the primary image processing and analytical modeling functions, as well as visualization for decision-making.
Table 1 The number of animals by the end of the winter season 2017–2018
 |  | Parameters |  | Data sources | 
Species | Density, individuals/1000 ha | Number of individuals | Dynamics^a | Winter routes are considered | Camera traps
Ungulates
Elk | 0.21 | 47 | → | + | +
Maral | 1.44 | 316 | ↑ | + | +
Roe deer | 1.13 | 247 | → | + | +
Musk deer | 0.5 | 113 | → | + | +
Predators
Bear | 1.22 | 390 | → | – | +
Wolf |  | 5…8 | → | + | +
Lynx | 0.018 | 4 | ↑ |  | 
Wolverine | 0.037 | 8 | → | + | 
Fox | 0.45 | 98 | ↓ | + | +
Sable | 3.8 | 835 | → | + | +
Boar | 0.39 | 85 | ↓ |  | +
Otter |  | 22 |  | – | 
Mink |  | 396 |  | – | 
Beaver |  | 56 |  | + | +
Rodents
Squirrel | 11.38 | 2493 |  | – | –
Hare | 6.6 | 1453 | ↓ | + | +
Birds
Capercaillie | 4.12 | 903 | ↑ | + | 
Grouse | 100 | 21900 | ↓ | + | From additional sources
^a ↑—a number is increased, →—a number is stable, ↓—a number is decreased
4.3 The Results of Ecological Monitoring of Ergaki Natural Park Monitoring of mammals and birds by various methods on a territory of 342,873 ha allowed detecting 10 species of mammals and 52 species of birds listed in the Red Book of Krasnoyarsky kray. Fauna inventory data are shown in Table 1. Inventory activities included 20 routes over 120 km and visual observations over an area of 15,000 ha. To feed animals, mixtures of grasses, oats, and sunflowers were sown. Several salt licks and feeding places were equipped. The positive impact of the environmental regime on the conservation of species diversity is confirmed by a number of interesting faunistic findings. For example, in the steppe zone, the increased number of birds is reliably confirmed using camera traps.
5 Conclusions Based on the formalization of environmental protection activities and the application of new technologies for intelligent data processing, a concept for monitoring environmental systems has been developed. Camera traps provide unique information about the state and development of ecological systems. The analysis of accumulated images and the creation of common information resources are the first-priority tasks in the ecological monitoring system. In this research, a systematization of monitoring data has been carried out, and the structure of information resources with respect to the protected Siberian areas has been proposed in order to improve environmental protection. Acknowledgements The reported study was funded by the Russian Foundation for Basic Research, the Government of Krasnoyarsk Territory, and the Krasnoyarsk Regional Fund of Science within the research project No. 18-47-240001.
References 1. Frair, J., Nielsen, S., Merrill, E., Lele, S., Boyce, M., Munro, R., Stenhouse, G., Beyer, H.: Removing GPS collar bias in habitat selection studies. J. Appl. Ecol. 41(2), 201–212 (2004) 2. Burton, C.A., Neilson, E., Moreira, D., Ladle, A., Steenweg, R., Fisher, J.T., Bayne, E., Boutin, S.: Wildlife camera trapping: a review and recommendations for linking surveys to ecological processes. J. Appl. Ecol. 52, 675–685 (2015) 3. Zhang, J., Luo, X., Chen, C., Liu, Z., Cao, S.: A wildlife monitoring system based on wireless image sensor networks. Sens. Trans. 180(10), 104–109 (2014) 4. Tibbetts, J.H.: Remote sensors bring wildlife tracking to new level: trove of data yields fresh insights–and challenges. Bioscience 67(5), 411–417 (2017) 5. Zviedris, R., Elsts, A., Strazdins, G., Mednis, A., Selavo, L.: LynxNet: wild animal monitoring using sensor networks. In: Marron, P.J., Voigt, T., Corke, P., Mottola, L. (eds.) Real-World Wireless Sensor Networks: 4th International Workshop, LNCS, vol. 6511, pp. 170–173. Colombo, Sri Lanka (2010)
6. Xu, J., Solmaz, G., Rahmatizadeh, R., Turgut, D., Boloni, L.: Internet of things applications: Animal monitoring with unmanned aerial vehicle. In: 40th Annual IEEE Conference on Local Computer Networks, pp. 125–132 Florida, USA (2015) 7. Zhang, J., Zhang, J., Du, X., Hou, K., Qiao, M.: An overview of ecological monitoring based on geographic information system (GIS) and remote sensing (RS) technology in China. In: IOP Conf. Series: Earth and Environmental Science 94, 012056.1–012056.4 (2017) 8. Madheswaran, K.M.S., Veerappan, K., Kumar, S.V.: Region based convolutional neural network for human-elephant conflict management system. In: International Conference on Computational Intelligence in Data Science, pp. 1–5. Chennai, India (2019) 9. Nguyen, H., Maclagan, S.J., Nguyen, T.D., Nguyen, T., Flemons, P., Andrews, K., Ritchie, E.G., Phung, D.: Animal recognition and identification with deep convolutional neural networks for automated wildlife monitoring. In: IEEE International Conference on Data Science and Advanced Analytics, pp. 40–49. Tokyo, Japan (2017) 10. Norouzzadeh, M.S., Nguyen, A., Kosmala, M., Swanson, A., Palmer, M.S., Packer, C., Clune, J.: Automatically identifying, counting, and describing wild animals in camera-trap images with deep learning. Proc. Natl. Acad. Sci. U.S.A. 115(25), E5716–E5725 (2018) 11. Chen, R., Little, R., Mihaylova, L., Delahay, R., Cox, R.: Wildlife surveillance using deep learning methods. Ecol. Evol. 9(17), 9453–9466 (2019) 12. Favorskaya, M., Pakhirka, A.: Animal species recognition in the wildlife based on muzzle and shape features using joint CNN. Procedia Comput. Sci. 159, 933–942 (2019) 13. Favorskaya, M., Buryachenko, V.: (2019) Selecting informative samples for animal recognition in the wildlife. In: Czarnowski, I., Howlett, R., Jain, L. (eds.) Intelligent Decision Technologies SIST, vol. 143, pp. 65–75. Springer, Singapore (2019) 14. Favorskaya, M.N., Buryachenko, V.V.: Background extraction method for analysis of natural images captured by camera traps. Inf. Control Syst. 6, 35–45 (2018) 15. Zotin, A.G., Proskurin, A.V.: Animal detection using a series of images under complex shooting conditions. In: ISPRS—International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences. XLII-2/W12, pp. 249–257 (2019) 16. Nicheporchuk, V.V., Penkova, T.G., Gryazin, I.V.: Structuring the information resources for intelligent ecosystem monitoring system based on camera traps. In: 42nd International Convention on Information and Communication Technology, Electronics and Microelectronics, pp. 1449–1454. Opatija, Croatia (2019)
Computation the Bridges Earthquake Resistance by the Grid-Characteristic Method Alena Favorskaya
Abstract The use of supercomputer technologies to determine the earthquake resistance of structures is relevant in connection with construction in earthquake-prone regions. In this paper, to determine the seismic resistance of bridges, it is proposed to use a novel grid-characteristic method on systems of combined separate conformal structured regular and curvilinear computational grids in order to reduce the cost of computing resources. This numerical method makes it possible to take into account the features of the propagation and re-reflection of seismic waves within the structure; however, it requires substantially refined computational grids and fine time discretization. Therefore, the challenge of the cost of computing resources is acute even for two-dimensional calculations. The challenge of constructing the required computational grids also arises. The paper describes in detail the approach to constructing these computational grids. In particular, analytical expressions are proposed that reduce the computational resources needed for constructing curvilinear structured computational grids and ensure their conformity. As test examples, the earthquake stability of bridges over a river and over a highway was calculated. The design parameters of the bridges were varied, and the impact of the water level and river width on the nature of the damage was investigated.
A. Favorskaya (B) Moscow Institute of Physics and Technology, 9 Institutsky Lane, Dolgoprudny, Moscow Region 141700, Russian Federation e-mail: [email protected] National Research Centre "Kurchatov Institute", 1 Akademika Kurchatova Pl., Moscow 123182, Russian Federation Scientific Research Institute for System Analysis of the Russian Academy of Sciences, 36(1) Nahimovskij Av., Moscow 117218, Russian Federation © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2020 I. Czarnowski et al. (eds.), Intelligent Decision Technologies, Smart Innovation, Systems and Technologies 193, https://doi.org/10.1007/978-981-15-5925-9_15
1 Introduction Numerical methods for calculating the earthquake stability of various structures are being actively developed today. Usually, the software packages ABAQUS and ANSYS are used
to determine the earthquake resistance of structures [1, 2], based on the finite element method. Finite element and finite difference methods are also used in [3–6]. In [7], a novel numerical methodology based on finite element or finite difference methods adopted to evaluate the seismic stability of an extended levee network was proposed. Anomalies in the seismic stability analysis of slopes in soil and rock were investigated in [8] using the simplest sliding stability analysis, the single plane failure model, also known as the Culmann model. The seismic stability of columns was studied in [9] using the equation of motion. The results obtained in [9] also have implications for the design of tall bridge piers, in which the concept of rocking isolation is used [10, 11]. The seismic stability of tunnels is discussed in [12]. In this work, to assess the seismic resistance of bridges, the grid-characteristic numerical method on structured curvilinear grids was used [13, 14]. The grid-characteristic method was successfully applied to solve computational problems of the seismic stability of different structures in [15–18]. This paper is organized as follows: In Sect. 2, the types of computational grids are discussed. Section 3 presents the used mathematical model and the solved boundary-value problem. The results of numerical modeling are introduced in Sects. 4 and 5. Section 6 concludes the paper.
2 Computational Grids To describe the complex geometric shape of the bridges and minimize the cost of computing resources, the coverage of the integration domain with conformal combined computational grids was used. Two types of structured meshes were used, i.e., regular and curved ones. The strengths and weaknesses of using these two types of grids are summarized in Table 1. The benefit of using computing resources as a result of using regular grids is obtained through a simpler computational algorithm for calculating the solution on a regular grid. Examples of covering computational domains with systems of separate conformal structured computational grids are shown in Figs. 1 and 2. Each figure shows the fragment of the integration domain, illustrating the principle of covering it with Table 1 Strengths and weaknesses of using structured curved and regular computational grids Type of structured computational grids
Strengths
Weaknesses
Regular
Reduced computing costs, simpler and more accurate numerical methods
Do not give the opportunity to cover objects of complex geometric shape
Curvilinear
The ability to cover objects of complex geometric shape
Complicate the algorithm of the numerical method and increase the computation time, the accuracy of the numerical method decreases
Fig. 1 Bridge over a river: separate computational grids
Fig. 2 Bridge over a highway: separate computational grids
separate computational grids. The use of separate structured regular computational grids allows locally reducing the spatial step without using curved computational grids. Regular grids are marked in gray in Figs. 1 and 2, and curved grids are colored. To reduce the cost of computational resources for constructing curvilinear structured computational grids, as well as to ensure their conformity with neighboring computational grids, the analytical expressions given in Sects. 2.1 and 2.2 were proposed. The described approach can be generalized to the three-dimensional case.
2.1 Trapezoid with Straight Boundaries In this subsection, the curved structured computational grids shown in Fig. 3a and marked by yellow and turquoise colors in Fig. 1 are discussed. Points A, B, C, D have coordinates $(X_l, Y_l)$, $l \in \{A, B, C, D\}$. The coordinates $x_{i,j}$, $y_{i,j}$ of the computational grid point with indices $(i, j)$ are found by the following formulae, $i \in [0, N_X]$, $j \in [0, N_Y]$:
$$X_{AB} = \frac{X_B - X_A}{N_X - 1}, \quad Y_{AB} = \frac{Y_B - Y_A}{N_X - 1}, \quad X_{DC} = \frac{X_C - X_D}{N_X - 1}, \quad Y_{DC} = \frac{Y_C - Y_D}{N_X - 1} \qquad (1)$$
Fig. 3 Curvilinear structured computational grids: a trapezoid with straight boundaries, b trapezoid with analytical parameterized curved boundary and horizontal bottom line, c trapezoid with analytical parameterized curved boundary and positive slope bottom line, d trapezoid with analytical parameterized curved boundary and negative slope bottom line
$$X_{A,i} = X_A + i \cdot X_{AB}, \quad Y_{A,i} = Y_A + i \cdot Y_{AB}, \quad X_{D,i} = X_D + i \cdot X_{DC}, \quad Y_{D,i} = Y_D + i \cdot Y_{DC} \qquad (2)$$
$$X_{AD,i} = \frac{X_{D,i} - X_{A,i}}{N_Y - 1}, \quad Y_{AD,i} = \frac{Y_{D,i} - Y_{A,i}}{N_Y - 1} \qquad (3)$$
$$x_{i,j} = X_{A,i} + j \cdot X_{AD,i}, \quad y_{i,j} = Y_{A,i} + j \cdot Y_{AD,i} \qquad (4)$$
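A short sketch of this construction is given below; it only illustrates Eqs. (1)-(4), with N_X and N_Y read as the numbers of grid nodes in the two directions and the corner coordinates of the example chosen arbitrarily.

```python
import numpy as np

def trapezoid_grid(A, B, C, D, NX, NY):
    """Structured grid for a trapezoid with straight boundaries (Eqs. 1-4).

    A, B, C, D are the corner points (x, y); the i index runs along the
    sides AB and DC, and the j index interpolates between the two sides.
    """
    A, B, C, D = map(np.asarray, (A, B, C, D))
    grid = np.empty((NX, NY, 2))
    step_AB = (B - A) / (NX - 1)            # Eq. (1)
    step_DC = (C - D) / (NX - 1)
    for i in range(NX):
        P_A = A + i * step_AB               # Eq. (2): point on side AB
        P_D = D + i * step_DC               #          point on side DC
        step_AD = (P_D - P_A) / (NY - 1)    # Eq. (3)
        for j in range(NY):
            grid[i, j] = P_A + j * step_AD  # Eq. (4)
    return grid

# Example with arbitrary corner points.
nodes = trapezoid_grid(A=(0.0, 0.0), B=(10.0, 0.0), C=(8.0, 5.0), D=(2.0, 5.0), NX=8, NY=5)
```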
2.2 Trapezoid with Analytical Parameterized Curved Boundary In this subsection, the curved structured computational grids shown in Fig. 3b–d and marked by violet color in Fig. 2 are discussed. Strictly speaking, the figure in question is not a trapezoid. Points A, B, C, D have coordinates $(X_A, Y_A)$, $(X_B, Y_B)$, $(X_B, Y_C)$, and $(X_A, Y_D)$, respectively. So, the following parameters are given to construct the computational grid:
• Positive real number $\alpha \in [0, 1]$.
• Integer numbers $N_X \ge 2$, $N_Y \ge 2$.
• Real numbers $X_A$, $Y_A$, $Y_D$, $X_B$, $Y_B$, $Y_C$.
In the examples shown in Fig. 3b–d, $Y_A = Y_B$, $Y_A < Y_B$, and $Y_A > Y_B$, respectively. For all examples shown in Fig. 3b–d, $\alpha = 1/7$, $N_X = 8$, $N_Y = 5$. Note that $\Delta X$ in the range $\left[0, \frac{X_B - X_A}{2}\right]$ might be given instead of $\alpha$. The coordinates $x_{i,j}$, $y_{i,j}$ of the computational grid point with indices $(i, j)$ are found by the following formulae, $i \in [0, N_X]$, $j \in [0, N_Y]$:
$$\Delta X = \frac{(1 - \alpha)(X_B - X_A)}{2}, \quad \Delta X_j = \frac{j \cdot \Delta X}{N_Y - 1} \qquad (5)$$
$$Y_{A,j} = \frac{j \cdot (Y_D - Y_A)}{N_Y - 1} + Y_A, \quad Y_{B,j} = \frac{j \cdot (Y_C - Y_B)}{N_Y - 1} + Y_B \qquad (6)$$
$$k_j = \frac{Y_{B,j} - Y_{A,j}}{X_B - X_A - \Delta X_j}, \quad w_j = \frac{k_j}{2 \Delta X_j}, \; \Delta X_j > 0, \quad x_{i,j} = \frac{i \cdot (X_B - X_A)}{N_X - 1} + X_A \qquad (7)$$
$$y_{i,j} = \begin{cases} Y_{A,j} + w_j \left(x_{i,j} - X_A\right)^2, & x_{i,j} \in \left[X_A, X_A + \Delta X_j\right], \; \Delta X_j > 0 \\ Y_{A,j} + k_j \left(x_{i,j} - X_A - \dfrac{\Delta X_j}{2}\right), & x_{i,j} \in \left[X_A + \Delta X_j, X_B - \Delta X_j\right] \\ Y_{B,j} - w_j \left(x_{i,j} - X_B\right)^2, & x_{i,j} \in \left[X_B - \Delta X_j, X_B\right], \; \Delta X_j > 0 \end{cases} \qquad (8)$$
Similarly, one can consider other functions for the upper and lower (or left and right) boundaries of the grids that smoothly transition into each other when a certain parameter changes. In the considered example, this parameter is $\Delta X_j$.
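The curved-boundary construction of Eqs. (5)-(8) can be sketched in the same way. Again, this is only an illustration under the same reading of N_X and N_Y as node counts; the coordinate values in the example are arbitrary, while alpha = 1/7, N_X = 8, and N_Y = 5 repeat the values quoted for Fig. 3b-d.

```python
import numpy as np

def curved_trapezoid_grid(XA, YA, YD, XB, YB, YC, alpha, NX, NY):
    """Grid with an analytically parameterized curved boundary (Eqs. 5-8).

    Each grid line j runs from x = XA to x = XB: two parabolic caps of
    horizontal extent dX_j near the ends joined by a straight segment (Eq. 8).
    dX_j grows linearly with j (Eq. 5), so the j = 0 line is straight and the
    j = NY - 1 line is the most strongly curved one.
    """
    dX_max = (1.0 - alpha) * (XB - XA) / 2.0                  # Eq. (5)
    grid = np.empty((NX, NY, 2))
    for j in range(NY):
        dX = j * dX_max / (NY - 1)                            # Eq. (5)
        YAj = YA + j * (YD - YA) / (NY - 1)                   # Eq. (6)
        YBj = YB + j * (YC - YB) / (NY - 1)
        k = (YBj - YAj) / (XB - XA - dX)                      # Eq. (7)
        w = k / (2.0 * dX) if dX > 0 else 0.0
        for i in range(NX):
            x = XA + i * (XB - XA) / (NX - 1)                 # Eq. (7)
            if dX > 0 and x <= XA + dX:                       # Eq. (8), left cap
                y = YAj + w * (x - XA) ** 2
            elif dX > 0 and x >= XB - dX:                     # Eq. (8), right cap
                y = YBj - w * (x - XB) ** 2
            else:                                             # Eq. (8), middle part
                y = YAj + k * (x - XA - dX / 2.0)
            grid[i, j] = (x, y)
    return grid

nodes = curved_trapezoid_grid(XA=0.0, YA=0.0, YD=4.0, XB=14.0, YB=1.0, YC=4.0,
                              alpha=1.0 / 7.0, NX=8, NY=5)
```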
3 Mathematical Model Accordingly, in each of the separate computational grids, depending on the medium under consideration, either the acoustic wave equation (in subdomains with water) [18]
$$\rho \frac{\partial \mathbf{v}(\mathbf{r}, t)}{\partial t} = -\nabla p(\mathbf{r}, t), \qquad (9)$$
$$\frac{\partial p(\mathbf{r}, t)}{\partial t} = -\rho c^2 \left(\nabla \cdot \mathbf{v}(\mathbf{r}, t)\right), \qquad (10)$$
or the elastic wave equation (in the other subdomains)
$$\rho \frac{\partial \mathbf{v}(\mathbf{r}, t)}{\partial t} = \left(\nabla \cdot \boldsymbol{\sigma}(\mathbf{r}, t)\right)^{\mathrm{T}}, \qquad (11)$$
$$\frac{\partial \boldsymbol{\sigma}(\mathbf{r}, t)}{\partial t} = \left(\rho c_{\mathrm{P}}^2 - 2\rho c_{\mathrm{S}}^2\right)\left(\nabla \cdot \mathbf{v}(\mathbf{r}, t)\right)\mathbf{I} + \rho c_{\mathrm{S}}^2 \left(\nabla \otimes \mathbf{v}(\mathbf{r}, t) + \left(\nabla \otimes \mathbf{v}(\mathbf{r}, t)\right)^{\mathrm{T}}\right), \qquad (12)$$
was solved. Hereinafter, $\mathbf{v}(\mathbf{r}, t)$ is the velocity vector field (the time derivative of the displacement), $p(\mathbf{r}, t)$ is the scalar pressure field, $\boldsymbol{\sigma}(\mathbf{r}, t)$ is the tensor field of the symmetric second-rank Cauchy stress tensor, $c$, $c_{\mathrm{P}}$, $c_{\mathrm{S}}$ are the speed of sound in the acoustic medium and the speeds of the longitudinal (P-) and transverse (S-) waves in the elastic medium, respectively, $\rho$ is the density, $\mathbf{I}$ is the unit tensor of the second rank, and $\otimes$ means the
Table 2 Elastic and acoustic parameters of the considered materials
Medium | P-wave speed, m/s | S-wave speed, m/s | Density, kg/m3
Concrete | 4,250 | 2,125 | 2,300
Rock | 2,300 | 1,400 | 2,500
Water | 1,500 | − | 1,000
tensor product of vectors, $(\mathbf{a} \otimes \mathbf{b})_{ij} = a_i b_j$. The used parameters $c$, $c_{\mathrm{P}}$, $c_{\mathrm{S}}$, and $\rho$ are presented in Table 2. The computational grids used had a spatial step 5 times smaller than those presented in Figs. 1 and 2. In accordance with the stability conditions, the time step was taken to be 9.4 µs (9 µs), and the total calculation time was 0.47 s (0.5 s), which corresponds to 50,001 (55,556) time steps for the bridge over a river (over a highway). Plane P- and S-waves with a wavelength of 50 m (sin wavelet), a velocity magnitude of 0.25 m/s, and an angle of 15 degrees between the wave front and the surface were considered as the source and were set as initial conditions. The principal stress criterion and the fracture model [17, 18] were used to model the dynamical destruction of the bridges. The critical principal stress was taken equal to 2.5 MPa in the bridges' beams and 2 MPa in the bridges' piers. At the boundaries of the computational grids with air, the free boundary condition (if the system of Eqs. 11–12 was solved inside the grid) or the zero pressure condition (if the system of Eqs. 9–10 was solved inside the grid) was set, respectively:
$$\boldsymbol{\sigma} \cdot \mathbf{m} = 0, \quad p = 0 \qquad (13)$$
At the remaining boundaries of the computational grids, nonreflecting boundary conditions were set [13]. Between two computational grids in which the system of Eqs. 11–12 was solved, the contact condition of complete adhesion was set:
$$\boldsymbol{\sigma}_1 \cdot \mathbf{m} = \boldsymbol{\sigma}_2 \cdot \mathbf{m}, \quad \mathbf{v}_1 = \mathbf{v}_2 \qquad (14)$$
On the contacts between the computational grids in which the systems of Eqs. 9–10 and of Eqs. 11–12 were solved, the following contact condition was set:
$$\boldsymbol{\sigma} \cdot \mathbf{m} + p\,\mathbf{m} = 0, \quad \mathbf{v}_{\mathrm{A}} \cdot \mathbf{m} = \mathbf{v}_{\mathrm{E}} \cdot \mathbf{m} \qquad (15)$$
In Eqs. 14 and 15, the indices 1, 2 and A, E correspond to different computational grids. In Eq. 13, $\mathbf{m}$ is the outward normal to the boundary of the grid; in Eq. 14, $\mathbf{m}$ is the outward normal to the boundary of grid No. 1; in Eq. 15, $\mathbf{m}$ is the outward normal to the boundary of the grid in which the elastic wave Eqs. 11–12 are solved.
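Equation (12) uses the combinations ρc_P² − 2ρc_S² and ρc_S², i.e., the Lamé parameters λ and μ. The short sketch below recovers them (and the acoustic bulk modulus of water) from Table 2; it is only a consistency check, and the printed values are not data reported in the paper.

```python
# Lame parameters from the wave speeds and densities of Table 2:
# mu = rho * cS^2, lambda = rho * cP^2 - 2 * mu;
# for water, the acoustic bulk modulus K = rho * c^2 is computed instead.
materials = {
    #           cP or c (m/s), cS (m/s) or None, rho (kg/m^3)
    "concrete": (4250.0, 2125.0, 2300.0),
    "rock":     (2300.0, 1400.0, 2500.0),
    "water":    (1500.0, None,   1000.0),
}

for name, (cP, cS, rho) in materials.items():
    if cS is None:
        print(f"{name}: K = {rho * cP ** 2:.3e} Pa")
    else:
        mu = rho * cS ** 2
        lam = rho * cP ** 2 - 2.0 * mu
        print(f"{name}: lambda = {lam:.3e} Pa, mu = {mu:.3e} Pa")
```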
Fig. 4 Bridge over a highway, initial conditions of P-wave, final destructions: a geometry variant No 1, b geometry variant No 2
Fig. 5 Bridge over a highway, initial conditions of S-wave, final destructions: a geometry variant No 1, b geometry variant No 2
4 Bridge Over a Highway: Simulation Results Final destructions are shown in Fig. 4 for the initial conditions of the P-wave and in Fig. 5 for the initial conditions of the S-wave. Figures 4a, b and 5a, b correspond pairwise to the same computational geometry.
5 Bridge Over a River: Influence of Water Level and Design Parameters Final destructions are shown in Figs. 6, 7, 8, and 9a for the initial conditions of the P-wave and in Figs. 9b, 10, 11, and 12 for the initial conditions of the S-wave. Figures 6 and 10, 7 and 11, 8 and 12, and 9a, b correspond to the same geometry, respectively.
Fig. 6 Bridge over a river, initial conditions of P-wave, geometry variant No 1
Fig. 7 Bridge over a river, initial conditions of P-wave, geometry variant No 2
Fig. 8 Bridge over a river, initial conditions of P-wave, geometry variant No 3
Fig. 9 Bridge over a river, geometry variant No 4: a initial conditions of P-wave, b initial conditions of S-wave
Fig. 10 Bridge over a river, initial conditions of S-wave, geometry variant No 1
Fig. 11 Bridge over a river, initial conditions of S-wave, geometry variant No 2
Fig. 12 Bridge over a river, initial conditions of S-wave, geometry variant No 3
To determine the impact of the water level, the wave fields and the dynamics of destruction were compared for the models presented in Figs. 6, 10 and in 7 and 11 (water levels 12 m and 4 m, respectively). Wave fields and fracture dynamics were compared for the models shown in Figs. 6, 10 and in Figs. 8 and 12 (the numbers of supports 9 and 11, respectively) to determine the influence of design parameters. Wave fields and fracture dynamics were compared for the models shown in Figs. 6, 10, and in Fig. 9 (the width of the river 445 m and 166 m, respectively) to determine the influence of river width.
6 Conclusions The obtained results demonstrate the effectiveness of the approach using combined structured regular and curvilinear conformal separate computational grids to reduce the cost of computing resources. Also, the obtained results show that the grid-characteristic method is applicable for solving problems of determining the earthquake resistance of bridges and allows a comparative analysis of the design features at the development stage. Acknowledgements This work has been performed at Moscow Institute of Physics and Technology with the financial support of the Russian Science Foundation, grant no. 17-71-20088. This work has been carried out using computing resources of the federal collective usage center Complex for Simulation and Data Processing for Mega-science Facilities at NRC "Kurchatov Institute", http://ckp.nrcki.ru/.
References 1. Yaghin, M.L., Hesari, M.A.: Dynamic analysis of the arch concrete dam under earthquake force with ABAQUS. J. Appl. Sci. 8(15), 2648–2658 (2008) 2. Xunqiang, Y., Jianbo, L., Chenglin, W., Gao, L.: ANSYS implementation of damping solvent stepwise extraction method for nonlinear seismic analysis of large 3-D structures. Soil Dyn. Earthquake Eng. 44, 139–152 (2013) 3. Ozutsumi, O., Sawada, S., Iai, S., Takeshima, Y., Sugiyama, W., Shimazu, T.: Effective stress analyses of liquefaction-induced deformation in river dikes. Soil Dyn. Earthquake Eng. 22, 1075–1082 (2002) 4. Oka, F., Tsai, P., Kimoto, S., Kato, R.: Damage patterns of river embankments due to the 2011 off the Pacific Coast of Tohoku earthquake and a numerical modeling of the deformation of river embankments with a clayey subsoil layer. Soils Found. 52(5), 890–909 (2012) 5. Boulanger, R.W., Montgomery, J., Ziotopoulou, K.: Nonlinear deformation analyses of liquefaction effects on embankment dams. In: Ansal, A., Sakr, M. (eds.) Perspectives on Earthquake Geotechnical Engineering, pp. 247–283. Springer, Switzerland (2015) 6. Boulanger, R.W., Montgomery, J.: Nonlinear deformation analyses of an embankment dam on a spatially variable liquefiable deposit. Soil Dyn. Earthquake Eng. 91, 222–233 (2016) 7. Cravero, J., Elkady, A., Lignos, D.G.: Experimental evaluation and numerical modeling of wide-flange steel columns subjected to constant and variable axial load coupled with lateral drift demands. J. Struct. Eng. 146(3), 04019222.1–04019222.19 (2019) 8. Christian, J.T., Urzúa, A.: Anomalies in pseudostatic seismic stability analysis. J. Geotechn. Geoenviron. Eng. 143(5), 06017001.1–06017001.3 (2017) 9. Makris, N., Kampas, G.: Size versus slenderness: two competing parameters in the seismic stability of free-standing rocking columns. Bull. Seismol. Soc. Am. 106(1), 104–122 (2016) 10. Makris, N.: The role of the rotational inertia on the seismic resistance of free-standing rocking columns and articulated frames. Bull. Seismol. Soc. Am. 104, 2226–2239 (2014) 11. Makris, N., Vassiliou, M. F.: Are some top-heavy structures more stable? J. Struct. Eng. 140(5), 06014001.1–06014001.5 (2014) 12. Pan, Q., Dias, D.: Three-dimensional static and seismic stability analysis of a tunnel face driven in weak rock masses. Int. J. Geomech. 18(6), 04018055.1–04018055.10 (2018) 13. Favorskaya, A.V., Zhdanov, M.S., Khokhlov, N.I., Petrov, I.B.: Modeling the wave phenomena in acoustic and elastic media with sharp variations of physical properties using the gridcharacteristic method. Geophys. Prospect. 66(8), 1485–1502 (2018) 14. Golubev, V.I., Petrov, I.B., Khokhlov, N.I.: Numerical simulation of seismic activity by the grid-characteristic method. Comput. Math. Math. Phys. 53(10), 1523–1533 (2013) 15. Favoskaya, A.V., Petrov, I.B.: Calculation the earthquake stability of various structures using the grid-characteristic method. Radioelektronika, Nanosistemy, Informacionnye Tehnologii 11(2), 345–350 (2019) 16. Favorskaya, A., Golubev, V., Khokhlov, N.: Two approaches to the calculation of air subdomains: theoretical estimation and practical results. Procedia Comput. Sci. 126, 1082–1090 (2018) 17. Breus, A., Favorskaya, A., Golubev, V., Kozhemyachenko, A., Petrov, I.: Investigation of seismic stability of high-rising buildings using grid-characteristic method. Procedia Comput. Sci. 154, 305–310 (2019) 18. Favorskaya, A.V., Breus, A.V., Galitskii, B.V.: Application of the grid-characteristic method to the seismic isolation model. 
In: Petrov, I., Favorskaya, A., Favorskaya, M., Simakov, S., Jain, L. (eds.) Smart Modeling for Engineering Systems. GCM50 2018, SIST, vol. 133, pp. 167–181. Springer, Cham (2019)
Study the Elastic Waves Propagation in Multistory Buildings, Taking into Account Dynamic Destruction Alena Favorskaya
and Vasily Golubev
Abstract The paper is devoted to the study of the influence of the presence of multiple voids in an object (rooms in a high-rising building) on the passage of longitudinal and transverse elastic waves. A feature of the research is to take into account the influence of the dynamical destruction of an object on the propagation of elastic waves in it. Changes in wavelength, wave type, amplitude, and dispersion are analyzed. The studies were performed using the method of wave phenomena investigation called Wave Logica. We solved the boundary-value problem of the elastic wave equation using the grid-characteristic numerical method on regular structured computational grids to compute the wave fields. We used the failure criterion for the principal stress and the destruction model of fractures to calculate the dynamical destruction. The results obtained can be of practical importance in the development of methods for increasing the earthquake stability of the multistory buildings.
A. Favorskaya (B) · V. Golubev Moscow Institute of Physics and Technology, 9 Institutsky lane, Dolgoprudny, Moscow 141700, Russian Federation e-mail: [email protected] V. Golubev e-mail: [email protected] A. Favorskaya National Research Centre "Kurchatov Institute", 1 Akademika Kurchatova pl., Moscow 123182, Russian Federation A. Favorskaya · V. Golubev Scientific Research Institute for System Analysis of the Russian Academy of Sciences, 36(1) Nahimovskij av., Moscow 117218, Russian Federation © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2020 I. Czarnowski et al. (eds.), Intelligent Decision Technologies, Smart Innovation, Systems and Technologies 193, https://doi.org/10.1007/978-981-15-5925-9_16
1 Introduction Nowadays, the height of multistory buildings is increasing more and more [1]. Moreover, their construction is carried out in seismically active regions as well. Therefore, there is a need to develop and apply numerical methods for calculating their
earthquake resistance. Finite element or finite difference methods are often used [2]. The paper [3] discusses the calculation of the seismic resistance of multistory buildings using the sound-vibration method. The pseudostatic method was employed to model seismic loadings on tunnel faces in [4]. The work [5] is devoted to the study of the seismic resistance of multistory buildings using the finite element method. Records of real earthquakes are used as an earthquake model. The seismic stability of columns is discussed in [6]. In this paper, we applied the grid-characteristic numerical method [7, 8]. This numerical method was parallelized [9] and successfully used to solve computational problems in different scientific areas, e.g., seismic prospecting [10, 11], delamination of composite materials [12], and seismic stability investigation [13–15]. This paper is organized as follows: Section 2 presents the used mathematical model. The calculated wave patterns compared with each other are introduced in Sects. 3 and 4 in the cases of P- and S-waves, respectively. The obtained results are summarized in Sect. 5. Section 6 concludes the paper.
2 Mathematical Models We solved the boundary-value problem of the elastic wave equation [7, 8, 16] with the boundary condition of a given density of external force [7, 8] using the grid-characteristic method. Everywhere except the lower boundary, the force was zero, and at the lower boundary it specified a longitudinal (P-) or transverse (S-) wave with an amplitude of 0.25 m/s and varying length. This approach simulates the study of the earthquake resistance of a structure on a vibration platform [17]. Four variants of geometry were considered. Model 0 was a parallelogram of 73 m × 124 m. Models 1–3 correspond to buildings with the parameters given in Table 1. The wall and roof thicknesses were 1 m, and the foundation thickness was 4 m for all Models 1–3. These geometry parameters were chosen in such a way that the total height and width are equal to 124 m and 73 m, respectively, for all Models 0–3. The space step was 0.1 m for Model 0 and 0.025 m for Models 1–3. The elastic parameters were cP = 4,250 m/s, cS = 2,125 m/s, and ρ = 2,300 kg/m3. In accordance with the stability conditions, the time step was 23.5 µs for Model 0 and 5.875 µs for Models 1–3. The total integration time was 0.47 s, which corresponds to 20,001 time steps for Model 0 and 80,001 time steps for Models 1–3. The principal stress criterion with a critical stress of 2 MPa and the fracture model [13–15] were used to model the dynamical destruction.
Table 1 Buildings parameters for Models 1–3
Model | Number of floors | Number of rooms per floor | Ceiling height, m | Room width, m
Model 1 | 30 | 12 | 3 | 5
Model 2 | 30 | 8 | 3 | 8
Model 3 | 20 | 8 | 5 | 8
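The stated totals of 124 m and 73 m can be checked directly from Table 1 under one bookkeeping assumption that is not spelled out in the text: every floor is closed by a 1 m slab (the uppermost being the roof) and the 1 m walls also separate and bound the rooms. The sketch below performs this check.

```python
WALL = 1.0        # wall and roof (slab) thickness, m
FOUNDATION = 4.0  # foundation thickness, m

# (floors, rooms per floor, ceiling height, room width) from Table 1
models = {
    "Model 1": (30, 12, 3.0, 5.0),
    "Model 2": (30, 8, 3.0, 8.0),
    "Model 3": (20, 8, 5.0, 8.0),
}

for name, (floors, rooms, ceiling, width) in models.items():
    # every floor is topped by a 1 m slab, the last one being the roof
    height = FOUNDATION + floors * (ceiling + WALL)
    # rooms are separated and bounded by 1 m walls
    total_width = rooms * width + (rooms + 1) * WALL
    print(f"{name}: height = {height} m, width = {total_width} m")  # 124 m, 73 m
```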
3 Wave Patterns: Influence of P-Wave Figures 1, 2, 3, 4, 5, 6, 7, and 8 show the wave patterns of velocity module in the case of P-wave with varying wavelength and with or without destruction. In all Figs. 1, 2, 3, 4, 5, 6, 7, and 8, (a) is Model 0, (b) is Model 1, (c) is Model 2, and (d) is Model 3. Notice that the full integration domain is shown in Fig. 2. Figures 1, 3, 4, 5, 6, 7, and 8 show only a bottom part of the integration domain.
Fig. 1 P-wave, length 100 m, time moment 11.75 ms, without destruction: a Model 0, b Model 1, c Model 2, d Model 3
Fig. 2 P-wave, length 100 m, time moment 23.5 ms, without destruction: a Model 0, b Model 1, c Model 2, d Model 3
Fig. 3 P-wave, length 100 m, time moment 11.75 ms, with destruction: a Model 0, b Model 1, c Model 2, d Model 3
Fig. 4 P-wave, length 75 m, time moment 9.4 ms, without destruction: a Model 0, b Model 1, c Model 2, d Model 3
Fig. 5 P-wave, length 75 m, time moment 18.8 ms, without destruction: a Model 0, b Model 1, c Model 2, d Model 3
Fig. 6 P-wave, length 100 m, time moment 23.5 ms, with destruction: a Model 0, b Model 1, c Model 2, d Model 3
Fig. 7 P-wave, length 75 m, time moment 9.4 ms, with destruction: a Model 0, b Model 1, c Model 2, d Model 3
Fig. 8 P-wave, length 75 m, time moment 18.8 ms, with destruction: a Model 0, b Model 1, c Model 2, d Model 3
Fig. 9 S-wave, length 100 m, time moment 21.15 ms, without destruction: a Model 0, b Model 1, c Model 2, d Model 3
4 Wave Patterns: Influence of S-Wave Figures 9, 10, 11, 12, 13, 14, 15, and 16 show the wave patterns of the velocity module in the case of the S-wave with varying wavelength and with or without destruction. In all Figs. 9, 10, 11, 12, 13, 14, 15, and 16, (a) is Model 0, (b) is Model 1, (c) is Model 2, and (d) is Model 3. Notice that the full integration domain is shown in Figs. 10 and 12, while Figs. 9, 11, 13, 14, 15, and 16 show only a bottom part of the integration domain.
Fig. 10 S-wave, length 100 m, time moment 44.65 ms, without destruction: a Model 0, b Model 1, c Model 2, d Model 3
Fig. 11 S-wave, length 100 m, time moment 21.15 ms, with destruction: a Model 0, b Model 1, c Model 2, d Model 3
Fig. 12 S-wave, length 100 m, time moment 44.65 ms, with destruction: a Model 0, b Model 1, c Model 2, d Model 3
Fig. 13 S-wave, length 75 m, time moment 16.45 ms, without destruction: a Model 0, b Model 1, c Model 2, d Model 3
Fig. 14 S-wave, length 75 m, time moment 35.25 ms, without destruction: a Model 0, b Model 1, c Model 2, d Model 3
Fig. 15 S-wave, length 75 m, time moment 16.45 ms, with destruction: a Model 0, b Model 1, c Model 2, d Model 3
Fig. 16 S-wave, length 75 m, time moment 35.25 ms, with destruction: a Model 0, b Model 1, c Model 2, d Model 3
5 Analysis of Changes in Wavelength, Amplitude, and Dispersion Changes in wavelength, in % of the incident wavelength, are given in Tables 2, 3, 4, and 5. The rate of wave attenuation is shown in Tables 6, 7, 8, and 9: weak (W), middle (M), and significant (S) attenuation, respectively. In order to estimate the dispersion, the difference between the attenuation of the wavelengths for the cases of different incident wavelengths (75 and 100 m), normalized by the average value, was calculated in %; it is shown in Tables 6, 7, 8, and 9. A positive value means that the wavelength with a higher frequency decreases in the building less than the wavelength with a lower frequency. Note that in Tables 6, 7, 8, and 9, the numbers 0, 1, 2 correspond to the measurement approach: 0 according to the first half-wave period at the time points shown in Figs. 1, 3, 5, 7, 9, 11, 13, 15; 1 and 2 for the first and second half-periods of the wave at the time instants shown in Figs. 2, 4, 6, 8, 10, 12, 14, 16, respectively. Note that there is a significant distortion of the wave front; therefore, an accurate estimate of the change in wavelength, amplitude, and dispersion is difficult.
Table 2 P-wave (without destruction) changes in wavelength, %
Model | 100 m, 0 | 100 m, 1 | 100 m, 2 | 75 m, 0 | 75 m, 1 | 75 m, 2
Model 0 | 100 | 106 | 77 | 97 | 103 | 80
Model 1 | 57 | 54 | 61 | 57 | 57 | 59
Model 2 | 38 | 41 | 39 | 55 | 50 | 39
Model 3 | 59 | 50 | 47 | 71 | 54 | 39
Table 3 P-wave (with destruction) changes in wavelength, %
Model | 100 m, 0 | 100 m, 1 | 100 m, 2 | 75 m, 0 | 75 m, 1 | 75 m, 2
Model 0 | 97 | 105 | 97 | 97 | 103 | 71
Model 1 | 53 | 55 | 61 | 60 | 58 | 48
Model 2 | 59 | 47 | 44 | 56 | 48 | 56
Model 3 | 70 | 52 | 59 | 83 | 56 | 52
Table 4 S-wave (without destruction) changes in wavelength, %
Model | 100 m, 0 | 100 m, 1 | 100 m, 2 | 75 m, 0 | 75 m, 1 | 75 m, 2
Model 0 | 91 | 89 | 88 | 91 | 91 | 88
Model 1 | 39 | 27 | 37 | 44 | 30 | 38
Model 2 | 36 | 23 | 38 | 46 | 27 | 38
Model 3 | 46 | 22 | 45 | 46 | 26 | 42
Table 5 S-wave (with destruction) changes in wavelength, %
Model | 100 m, 0 | 100 m, 1 | 100 m, 2 | 75 m, 0 | 75 m, 1 | 75 m, 2
Model 0 | 95 | 85 | 85 | 98 | 82 | 85
Model 1 | 33 | 28 | 23 | 42 | 31 | 38
Model 2 | 24 | 22 | 26 | 46 | 26 | 34
Model 3 | 19 | 22 | 28 | 42 | 27 | 35
Table 6 P-wave (without destruction) changes in amplitude and dispersion
Model | Amplitude, 0, 100/75 m | Amplitude, 1, 100/75 m | Amplitude, 2, 100/75 m | Dispersion, 0, % | Dispersion, 1, % | Dispersion, 2, %
0 | W/W | W/W | W/W | −2 | −3 | 3
1 | W/W | W/W | W/W | 0 | 5 | −3
2 | W/W | S/M | W/W | 36 | 19 | 0
3 | W/W | M/M | W/W | 18 | 8 | −20
Table 7 P-wave (with destruction) changes in amplitude and dispersion
Model | Amplitude, 0, 100/75 m | Amplitude, 1, 100/75 m | Amplitude, 2, 100/75 m | Dispersion, 0, % | Dispersion, 1, % | Dispersion, 2, %
0 | W/W | W/W | S/S | 0 | −2 | −30
1 | W/W | W/M | S/S | 12 | 4 | −24
2 | M/M | M/M | M/M | −6 | 2 | 23
3 | M/M | M/M | M/M | 18 | 7 | −13
Table 8 S-wave (without destruction) changes in amplitude and dispersion
Model | Amplitude, 0, 100/75 m | Amplitude, 1, 100/75 m | Amplitude, 2, 100/75 m | Dispersion, 0, % | Dispersion, 1, % | Dispersion, 2, %
0 | W/W | W/W | W/W | 0 | 3 | 0
1 | W/W | W/W | W/W | 12 | 8 | 4
2 | W/W | W/W | W/W | 24 | 14 | 0
3 | M/M | M/S | S/S | −2 | 17 | −7
Table 9 S-wave (with destruction) changes in amplitude and dispersion
Model | Amplitude, 0, 100/75 m | Amplitude, 1, 100/75 m | Amplitude, 2, 100/75 m | Dispersion, 0, % | Dispersion, 1, % | Dispersion, 2, %
0 | W/W | M/M | S/S | 3 | −4 | 0
1 | S/S | S/S | S/S | 25 | 10 | 51
2 | S/S | S/S | S/S | 63 | 19 | 28
3 | S/S | S/S | S/S | 78 | 22 | 22
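The dispersion values can be reproduced from the wavelength tables with the stated definition; the helper below does this for two entries of Table 2 (measurement approach 0), which give approximately the 36% and 18% listed in Table 6 for Models 2 and 3 (up to rounding).

```python
def dispersion(residual_100m, residual_75m):
    """Difference of the residual wavelengths (in % of the incident wavelength)
    for the 75 m and 100 m excitations, normalized by their average, in %."""
    return 100.0 * (residual_75m - residual_100m) / ((residual_75m + residual_100m) / 2.0)

# Table 2 (P-wave, without destruction), measurement approach 0:
print(round(dispersion(38, 55)))   # Model 2 -> about 37 (Table 6 lists 36)
print(round(dispersion(59, 71)))   # Model 3 -> about 18 (Table 6 lists 18)
```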
6 Conclusions Based on the results of the research, the following conclusions can be drawn. The reduction in wavelength occurs mainly due to the following two factors:
• The occurrence of a scattered head wave (in the case of P-wave propagation) or a Rayleigh wave (in the case of S-wave propagation).
• Re-reflection of all the arising wave types inside the walls of the building and the occurrence of multiply reflected PP-, PS-, SP-, and SS-waves.
Accordingly, each new fracture that appears during the dynamical destruction behaves like a new boundary: head waves or Rayleigh waves arise as P- or S-waves move along it, respectively, and all the arising wave types are also reflected from it. Therefore, the presence of dynamic destruction causes a significant decrease in the wave amplitude and a further decrease in the wavelength. Acknowledgments The reported study was funded by RFBR according to the research project № 18-01-00526. This work has been carried out using computing resources of the federal collective usage center Complex for Simulation and Data Processing for Mega-science Facilities at NRC "Kurchatov Institute", http://ckp.nrcki.ru/.
Icebergs Explosions for Prevention of Offshore Collision: Computer Simulation and Analysis

Alena Favorskaya and Nikolay Khokhlov
Abstract The paper is devoted to the numerical simulation of the mechanical destruction of icebergs under man-made intense explosive influences. Icebergs pose a significant threat to offshore facilities, offshore oil platforms, ships, and bottom pipelines. One way to reduce the probability of collision is to detonate the iceberg, and this work is dedicated to the calculation of the iceberg destruction as a result of such a detonation. However, as a result of the detonation, the iceberg can split into large fragments, which will also pose a threat to offshore facilities. Therefore, it is important to conduct experiments to determine the parameters of the impact on the iceberg that affect the size and number of fragments. Since it is difficult to conduct high-precision full-scale experiments, it is necessary to develop and test numerical methods to solve problems of this class. In this paper, the grid-characteristic method was used for this purpose. Test calculations were carried out, and patterns relating the parameters of the explosive man-made impact and the nature of the destruction of the iceberg were revealed.
A. Favorskaya (B) · N. Khokhlov Moscow Institute of Physics and Technology, 9 Institutsky Lane, Dolgoprudny, Moscow Region 141700, Russian Federation e-mail: [email protected] N. Khokhlov e-mail: [email protected] A. Favorskaya National Research Centre “Kurchatov Institute”, 1 Akademika Kurchatova Pl., Moscow 123182, Russian Federation A. Favorskaya · N. Khokhlov Scientific Research Institute for System Analysis, Russian Academy of Sciences, 36(1) Nahimovskij Av, Moscow 117218, Russian Federation © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2020 I. Czarnowski et al. (eds.), Intelligent Decision Technologies, Smart Innovation, Systems and Technologies 193, https://doi.org/10.1007/978-981-15-5925-9_17
1 Introduction

In connection with the active development of natural resources in the shelf zones of the Arctic and Antarctic, the threat of collision of offshore facilities with ice formations is an urgent problem. The main scientific work in this area in recent years has been focused on the study of melting and mechanical destruction of icebergs and their subsequent drift [1, 2]. Both full-scale experiments and the development of numerical methods and software systems are underway [3–8]. The mechanical properties of ice under shear [9–11] and compression [12–14] loads are also being studied. The most popular way to reduce the probability of an iceberg colliding with a stationary offshore object is to tow the icebergs [15]. In this area, both experimental and numerical studies are underway [15–17]. An alternative to towing an iceberg is its man-made destruction. It is important to correctly calculate the parameters of the dynamic impact, since splitting the iceberg into large fragments will only increase the probability of an unwanted collision [18]. Attention to the man-made destruction of icebergs began with the creation of methods for producing freshwater [19]. Subsequently, studies were conducted on the destruction of ice objects in river channels to ensure the passage of ships [20]. This work is devoted to the application of the grid-characteristic method [21–23] for modeling the destruction of icebergs under intense man-made influences in order to identify patterns that relate the impact parameters and the nature of the iceberg split. The grid-characteristic method has been successfully used to solve direct [24, 25] and inverse [26] problems of seismic exploration, including in the presence of a water layer and ice formations [27, 28], as well as to calculate the destruction of various structures as a result of mechanical stress [29–32]. This paper is organized as follows: Section 2 presents the used mathematical model and the solved boundary-value problem. The influence of the surrounding water on the iceberg dynamical destruction is discussed in Sect. 3. The impact of explosion parameters is discussed in Sect. 4. Section 5 is about the influence of the geometry of the iceberg. Section 6 concludes the paper.
2 Mathematical Models

The joint boundary-value problem of the elastic [22, 23] and acoustic [22, 23, 29] wave equations is solved:

ρ ∂v/∂t = (∇ · σ)ᵀ,  ∂σ/∂t = (ρc_P² − 2ρc_S²)(∇ · v)I + ρc_S²(∇ ⊗ v + (∇ ⊗ v)ᵀ)   (1)

ρ ∂v/∂t = −∇p,  ∂p/∂t = −ρc²(∇ · v)   (2)
Fig. 1 Mathematical models, boundary and contact conditions: a point source, surrounding water; b point source, without surrounding water; c plane wave, surrounding water; d plane wave, without surrounding water
Equation 1 is the elastic wave equation. Equation 2 is the acoustic wave equation. Two types of problem statements are considered: taking into account the enclosing water layer (Fig. 1a, c) and without taking into account the water layer (Fig. 1b, d). The following boundary and contact conditions were set: non-reflecting boundary conditions (dashed line in Fig. 1), based on setting the outgoing characteristics to zero [22, 23]; the contact condition between ice and water (black line in Fig. 1) [27, 28]

σ · m + pm = 0,  v_A · m = v_E · m   (3)

the free boundary condition (red line in Fig. 1) [22, 23] σ · m = 0, the condition of zero pressure (green line in Fig. 1) [22, 23] p = 0, and the condition of a given density of external force (blue line in Fig. 1) [22, 23]

σ · m = (0, f(t))ᵀ   (4)

with the following time dependence

f(t) = f₀ sin(2πηt) for t ∈ [0, N/η],  f(t) = 0 for t ∉ [0, N/η]   (5)
In Eqs. 1–5, v(r, t) is the velocity, p(r, t) is the pressure, σ(r, t) is the Cauchy stress tensor, c, c_P, c_S are the speed of sound in an acoustic medium and the speeds of the pressure (P-) and shear (S-) waves in an elastic medium, respectively, ρ is the density, m is the outgoing normal to the boundary of the elastic subdomain, I is the unit tensor of the second rank, η is the frequency of the source of intense impact, N is the number of periods, f₀ is the amplitude, and ⊗ denotes the tensor product of vectors, (a ⊗ b)_ij = a_i b_j. Note that the external force condition is specified either at several points on the upper boundary of the iceberg (Fig. 1a, b) or on the entire upper surface of the iceberg (Fig. 1c, d).
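As a minimal illustration of the source time dependence in Eq. 5, the sketch below evaluates f(t) for a given amplitude, frequency, and number of periods; it is only an illustration of the formula, and the parameter values in the usage example are taken from the settings reported below (0.9 MPa plane-wave amplitude, 100 Hz, 5 periods, 63 µs time step).

```python
import numpy as np

def source_time_function(t, f0, eta, n_periods):
    """Density of external force from Eq. 5: a truncated sine burst.

    f(t) = f0 * sin(2*pi*eta*t) for t in [0, N/eta], and 0 otherwise.
    """
    t = np.asarray(t, dtype=float)
    active = (t >= 0.0) & (t <= n_periods / eta)
    return np.where(active, f0 * np.sin(2.0 * np.pi * eta * t), 0.0)

# Example: a 100 Hz source with 5 periods on the time grid used in the paper
dt = 63e-6                       # 63 us time step
t = np.arange(3000) * dt         # 3,000 time steps
f = source_time_function(t, f0=0.9e6, eta=100.0, n_periods=5)
```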
Table 1 Elastic and acoustic parameters of the considered media

Medium   P-wave speed, m/s   S-wave speed, m/s   Density, kg/m³
Ice      3,940               2,493               917
Water    1,500               –                   1,025
Elastic and acoustic parameters are shown in Table 1. To solve this boundary-value problem, the grid-characteristic numerical method [22, 23] and the Rusanov scheme [33] were used. We used the criterion for the principal stress and the model of cracks [29–32] to calculate the damaged areas. The critical principal stress was taken equal to 1 MPa. The Wave Logica method [34] was used to analyze the calculated wave fields. The time step was taken equal to 63 µs in accordance with the stability conditions; 3,000 time steps were performed in each calculation. The coordinate step was taken equal to 0.25 m in the iceberg and from 0.5 to 0.25 m in the water. A water area with a width of 380 m and a depth of 186 m was considered. The amplitude of the density of external forces was taken equal to 0.9 MPa for a plane wave and 900 MPa for a point source.
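A minimal sketch of the destruction criterion mentioned above: in 2D the principal stresses are the eigenvalues of the symmetric stress tensor, and a cell can be marked as damaged once the maximum principal stress exceeds the 1 MPa threshold. This is only an illustration of the criterion under that assumption; the actual crack model of [29–32] is not reproduced here.

```python
import numpy as np

def max_principal_stress(sxx, syy, sxy):
    """Largest principal stress of a 2D stress state, i.e. the larger
    eigenvalue of the symmetric tensor [[sxx, sxy], [sxy, syy]]."""
    center = 0.5 * (sxx + syy)
    radius = np.sqrt((0.5 * (sxx - syy)) ** 2 + sxy ** 2)
    return center + radius

def damaged_cells(sxx, syy, sxy, critical_stress=1.0e6):
    """Boolean mask of cells exceeding the critical principal stress (1 MPa)."""
    return max_principal_stress(sxx, syy, sxy) > critical_stress

# Illustrative usage on small arbitrary stress fields (values in Pa)
sxx = np.array([[0.5e6, 1.2e6], [0.2e6, 0.9e6]])
syy = np.array([[0.1e6, 0.8e6], [0.3e6, 1.5e6]])
sxy = np.array([[0.0,   0.4e6], [0.1e6, 0.2e6]])
print(damaged_cells(sxx, syy, sxy))
```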
3 Influence of the Surrounding Water

This section explores the effect of the surrounding water on the nature of iceberg damage. This study was carried out because explosions of ice blocks extracted from the water are often used in tests. Numerical experiments can show whether such practical studies are advisable, or whether practical experiments in an ice basin are necessary. Also, the calculation of wave phenomena in the surrounding water requires computational resources, which could be avoided if studies show that the presence of water does not significantly affect the nature of the damage. Figures 2, 3, 4, and 5 show wave patterns of the velocity modulus at a time moment of 18.9 ms and the final destruction in the presence and absence of surrounding water for a different number of periods N. It can be seen that the water has a significant effect on the nature of the destruction, due to the fact that, if it exists, the waves propagate into the water rather than being
Fig. 2 Obtained results: a wave pattern, plane wave, 100 Hz, 5 periods; b final destruction, plane wave, 100 Hz, 5 periods; c wave pattern, plane wave, without surrounding water, 100 Hz, 5 periods; d final destruction, plane wave, without surrounding water, 100 Hz, 5 periods
Fig. 3 Obtained results: a wave pattern, point source, 100 Hz, 5 periods; b final destruction, point source, 100 Hz, 5 periods; c wave pattern, without surrounding water, point source, 100 Hz, 5 periods; d final destruction, without surrounding water, point source, 100 Hz, 5 periods
Fig. 4 Obtained results: a wave pattern, plane wave, 100 Hz, 2 periods; b final destruction, plane wave, 100 Hz, 2 periods; c wave pattern, without surrounding water, plane wave, 100 Hz, 1 period; d final destruction, without surrounding water, plane wave, 100 Hz, 1 period
Fig. 5 Obtained results: a wave pattern, point source, 100 Hz, 1 period; b final destruction, point source, 100 Hz, 1 period; c wave pattern, without surrounding water, point source, 100 Hz, 1 period; d final destruction, without surrounding water, point source, 100 Hz, 1 period
completely reflected from the free boundary of the iceberg. The most significant difference in the nature of the destruction is observed in the case of a plane wave. For example, with 1 period at 100 Hz, in the case of a plane wave in the presence of water, no destruction of the iceberg occurs.
4 Varying Explosion Parameters

Since the considered mechanical and mathematical models are scalable, the following parameters were varied: the frequency of the source and the number of periods. Examples of wave patterns of the velocity modulus at a time moment of 18.9 ms and the final damage are presented in Figs. 6, 7, 8, 9, and 10.
Fig. 6 Obtained results: a wave pattern, plane wave, 40 Hz, 2 periods; b final destruction, plane wave, 40 Hz, 2 periods; c wave pattern, plane wave, 30 Hz, 2 periods; d final destruction, plane wave, 30 Hz, 2 periods
Fig. 7 Obtained results: a wave pattern, point source, 40 Hz, 2 periods; b final destruction, point source, 40 Hz, 2 periods; c wave pattern, point source, 30 Hz, 1 period; d final destruction, point source, 30 Hz, 1 period
Fig. 8 Obtained results: a wave pattern, plane wave, 197 Hz, 6 periods; b final destruction, plane wave, 197 Hz, 6 periods; c wave pattern, plane wave, 197 Hz, 12 periods; d final destruction, plane wave, 197 Hz, 12 periods
Fig. 9 Obtained results: a wave pattern, point source, 197 Hz, 6 periods; b final destruction, point source, 197 Hz, 6 periods; c wave pattern, point source, 197 Hz, 12 periods; d final destruction, point source, 197 Hz, 12 periods
The analysis of the calculation results is given in Sect. 6; all the calculations presented in the paper were analyzed, not only those from this section. Note that in the case of a plane wave at a frequency of 30 Hz and 1 period, no destruction of the iceberg occurs.
Fig. 10 Obtained results: a wave pattern, plane wave, 98.5 Hz, 6 periods; b final destruction, plane wave, 98.5 Hz, 6 periods; c wave pattern, point source, 98.5 Hz, 6 periods; d final destruction, point source, 98.5 Hz, 6 periods
5 Influence of the Geometry of the Iceberg

Since the problem is scalable, the width of the iceberg was varied, while the height of the icebergs for all the considered models was taken to be 40 m. The calculation results for the iceberg with a width of 80 m are shown in Figs. 2 and 3. The calculation results for icebergs with widths of 70 and 15 m are given below in Figs. 11 and 12. The wave patterns of the velocity modulus are presented at a time moment of 18.9 ms. It can be seen that in the case of a sufficiently large transverse size of the iceberg, a variation in this size does not have a significant effect. When the transverse dimension is small enough compared to the wavelength, the waves reflected from the vertical walls of the iceberg play a significant role in the destruction, and the nature of the destruction is fundamentally different.
Fig. 11 Obtained results: a width 70 m, wave pattern, plane wave, 100 Hz, 5 periods; b width 70 m, final destruction, plane wave, 100 Hz, 5 periods; c width 15 m, wave pattern, plane wave, 100 Hz, 5 periods; d width 15 m, final destruction, plane wave, 100 Hz, 5 periods
Fig. 12 Obtained results: a width 70 m, wave pattern, point source, 100 Hz, 5 periods; b width 70 m, final destruction, point source, 100 Hz, 5 periods; c 15 m wide, wave pattern, point source, 100 Hz, 5 periods; d 15 m wide, final destruction, point source, 100 Hz, 5 periods
6 Conclusions

Using the grid-characteristic method, the destruction of icebergs under intense impact was studied. The aim of the work was to determine the optimal parameters of impacts, under which the iceberg splits not into large fragments but into small fragments that are not dangerous for offshore facilities. Based on the results of the research, the following conclusions can be drawn. Orientation to the wavelength and geometric dimensions of the iceberg does not give the expected result due to the complex system of re-reflections of wave phenomena inside the iceberg. Determining the impact parameters on an iceberg requires high-precision numerical simulation. Accounting for the surrounding water is required, both in numerical and in physical experiments. A role is played not only by the frequency of the impact, but also by its duration (number of periods). There is an upper critical explosion duration, depending on the frequency, beyond which it does not make sense to increase it. There is also a lower critical duration of the explosion below which failure does not occur. For an iceberg with sufficiently large transverse dimensions, the optimal result was obtained when exposed to a wavelength close to the thickness of the iceberg and 5 periods. At the same time, increasing the frequency by exactly 2 times gives the opposite result. A reduction in the duration of the explosion to a smaller number of periods also gives large fragments. The calculations also showed that for the majority of the studied cases, the most optimal is the use of an intense point impact in comparison with the generation of a plane wave by a series of explosions. When using a series of explosions and a sufficient iceberg width, horizontal ice plates corresponding to the wavelength arise. With a vertically directed point action, a transverse split and a series of radial cracks occur. The results obtained demonstrate the possibility and expediency of applying the grid-characteristic method for calculating the destruction of icebergs under intense dynamical impacts.

Acknowledgements This work has been performed at Moscow Institute of Physics and Technology with the financial support of the Russian Science Foundation, grant no. 19-11-00023. This work has been carried out using computing resources of the federal collective usage center Complex for Simulation and Data Processing for Mega-science Facilities at NRC "Kurchatov Institute", http://ckp.nrcki.ru/.
References 1. Andrews, J.T.: Icebergs and iceberg rafted detritus (IRD) in the North Atlantic: facts and assumptions. Oceanogr. Washington DC Oceanogr. Soc. 13(3), 100–108 (2000) 2. Kucheiko, A.A., Ivanov, A.Y., Davydov, A.A., Antonyuk, A.Y.: Iceberg drifting and distribution in the Vilkitsky Strait studied by detailed satellite radar and optical images. Izvestiya, Atmos. Ocean. Phys. 52(9), 1031–1040 (2016)
3. Yu, H., Rignot, E., Morlighem, M., Seroussi, H.: Iceberg calving of Thwaites Glacier, West Antarctica: full-Stokes modeling combined with linear elastic fracture mechanics. The Cryosphere 11(3), 1283–1296 (2017) 4. Shi, C., Hu, Z., Ringsberg, J., Luo, Y.: A nonlinear viscoelastic iceberg material model and its numerical validation. In: Proceedings of the Institution of Mechanical Engineers, Part M. J. Eng. Marit. Environ. 231(2), 675–689 (2017) 5. Han, D., Lee, H., Choung, J., Kim, H., Daley, C.: Cone ice crushing tests and simulations associated with various yield and fracture criteria. Ships Offshore Struct. 12(sup1), S88–S99 (2017) 6. Uenishi, K., Hasegawa, T., Yoshida, T., Sakauguchi, S., Suzuki, K.: Dynamic fracture and fragmentation of ice materials. In: Gdoutos, E.E. (ed.) International Conference on Theoretical, Applied and Experimental Mechanics, pp. 242–243. Springer, Cham (2018) 7. Miryaha, V.A., Sannikov, A.V., Biryukov, V.A., Petrov, I.B.: Discontinuous Galerkin method for ice strength investigation. Math. Models Comput. Simul. 10(5), 609–615 (2018) 8. Krug, J.W.O.G.J., Weiss, J., Gagliardini, O., Durand, G: Combining damage and fracture mechanics to model calving. The Cryosphere 8(6), 2101–2117 (2014) 9. Karulina, M., Marchenko, A., Karulin, E., Sodhi, D., Sakharov, A., Chistyakov, P.: Full-scale flexural strength of sea ice and freshwater ice in Spitsbergen Fjords and North-West Barents Sea. Appl. Ocean Res. 90, 101853.1–101853.12 (2019) 10. Schwarz, J., Frederking, R., Gavrilo, V., Petrov, I.G., Hirayama, K.I., Mellor, M., Tryde, P., Vaudry, K.D.: Standardized testing methods for measuring mechanical properties of sea ice. Cold Reg. Sci. Technol. 4(3), 245–253 (1981) 11. Marchenko, A., Karulin, E., Chistyakov, P., Sodhi, S., Karulina, M., Sakharov, A.: Three dimensional fracture effects in tests with cantilever and fixed ends beams. In: 22th IAHR Symposium on Ice, pp. 1178.1–1178.8 (2014) 12. Jones, S.J., Gagnon, R.E., Derradji, A., Bugden, A.: Compressive strength of iceberg ice. Canadian J. Physics 81(1–2), 191–200 (2003) 13. Jones, S.J.: A review of the strength of iceberg and other freshwater ice and the effect of temperature. Cold Reg. Sci. Technol. 47(3), 256–262 (2007) 14. Barrette, P.D., Jordaan, I.J.: Pressure–temperature effects on the compressive behavior of laboratory-grown and iceberg ice. Cold Reg. Sci. Technol. 36(1–3), 25–36 (2003) 15. Marchenko, A., Eik, K.: Iceberg towing in open water: mathematical modeling and analysis of model tests. Cold Reg. Sci. Technol. 73, 12–31 (2012) 16. Vetter, C., Ulrich, C., Rung, T.: Analysis of towing-gear concepts using iceberg towing simulations. In: 30th International Conference on Ocean, Offshore and Arctic Engineering, Rotterdam, pp. 49355.1–49355.10 (2011) 17. Hamilton, J.M.: Vibration-based technique for measuring the elastic properties of ropes and the added masses of submerged objects. J. Atmos. Ocean. Technol. 17, 688–697 (2000) 18. Savage, S.B., Crocker, G.B., Sayed, M., Carrieres, T.: Size distributions of small ice pieces calved from icebergs. Cold Reg. Sci. Technol. 31(2), 163–172 (2000) 19. Tate, G.L.: The role of liquid oxygen explosive in iceberg utilization and development. Desalination 29(1–2), 167–172 (1979) 20. Lichorobiec, S., Barcova, K., Dorazil, T., Skacel, R., Riha, L., Cervenka, M.: The development of special sequentially-timed charges for breaking frozen waterways. Trans. VSB Tech. Univ Ostrava Saf. Eng. Ser. 11(1), 32–41 (2016) 21. 
Magomedov, K.M., Kholodov, A.S.: Setochno-harakteristicheskie chislennye metody. Nauka Publ, Moscow (1988). (in Russian) 22. Favorskaya, A.V., Zhdanov, M.S., Khokhlov, N.I., Petrov, I.B.: Modelling the wave phenomena in acoustic and elastic media with sharp variations of physical properties using the gridcharacteristic method. Geophys. Prospect. 66(8), 1485–1502 (2018) 23. Favorskaya, A.V., Petrov, I.B.: Grid-characteristic method. In: Favorskaya, A.V., Petrov, I.B. (eds.) Innovations in Wave Modeling and Decision Making, SIST, vol. 90, pp. 117–160. Springer, Cham (2018)
24. Stognii, P.V., Khokhlov, N.I.: 2D seismic prospecting of gas pockets. In: Petrov, I., Favorskaya, A., Favorskaya, M., Simakov, S., Jain, L. (eds.) Smart Modeling for Engineering Systems. GCM50 2018, SIST, vol. 133, pp. 156–166. Springer, Cham (2019) 25. Favorskaya, A.V., Petrov, I.B.: The use of full-wave numerical simulation for the investigation of fractured zones. Math. Models Comput. Simul. 11(4), 518–530 (2019) 26. Golubev, V.I.: The usage of grid-characteristic method in seismic migration problems. In: Petrov, I., Favorskaya, A., Favorskaya, M., Simakov, S., Jain, L. (eds.) Smart Modeling for Engineering Systems. GCM50 2018, SIST, vol. 133, pp. 143–155. Springer, Cham (2019) 27. Favorskaya, A., Petrov, I., Khokhlov, N.: Numerical modeling of wave processes during shelf seismic exploration. Proc. Comput. Sci. 96, 920–929 (2016) 28. Petrov, D.I.: Application of grid-characteristic method to some seismic exploration problems in the Arctic. J. Phys. Conf. Ser. 955(1), No 012029 (2018) 29. Favorskaya, A.V., Breus, A.V., Galitskii, B.V.: Application of the grid-characteristic method to the seismic isolation model. In: Petrov, I., Favorskaya, A., Favorskaya, M., Simakov, S., Jain, L. (eds.) Smart Modeling for Engineering Systems. GCM50 2018, SIST, vol. 133, pp. 167–181. Springer, Cham (2019) 30. Beklemysheva, K.A., Vasyukov, A.V., Golubev, V.I., Zhuravlev, Y.I.: On the estimation of seismic resistance of modern composite oil pipeline elements. Doklady Math. 97(2), 184–187 (2018) 31. Breus, A., Favorskaya, A., Golubev, V., Kozhemyachenko, A., Petrov, I.: Investigation of seismic stability of high-rising buildings using grid-characteristic method. Proc. Comput. Sci. 154, 305–310 (2019) 32. Favorskaya, A., Khokhlov, N.: Modeling the impact of wheelsets with flat spots on a railway track. Proc. Comput. Sci. 126, 1100–1109 (2018) 33. Magomedov, K.M., Kholodov, A.S.: The construction of difference schemes for hyperbolic equations based on characteristic relations. USSR Comput. Math. Math. Phys. 9(2), 158–176 (1969) 34. Favorskaya, A.: A novel method for wave phenomena investigation. Proc. Comput. Sci. 159, 1208–1215 (2019)
Genetic Operators Impact on Genetic Algorithms Based Variable Selection

Marco Vannucci, Valentina Colla, and Silvia Cateni
Abstract This paper faces the problem of variables selection through the use of a genetic algorithm based metaheuristic approach. The method is based on the evolution of a population of variables subsets, which is led by the genetic operators determining their selection and improvement through the algorithm generations. The impact of different genetic operators expressly designed for this purpose is assessed through a test campaign. The results show that the use of specific operators can lead to remarkable improvements in terms of selection quality.
M. Vannucci (B) · V. Colla · S. Cateni Scuola Superiore Sant'Anna, Istituto TeCIP, Pisa, Italy e-mail: [email protected] V. Colla e-mail: [email protected] S. Cateni e-mail: [email protected] © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2020 I. Czarnowski et al. (eds.), Intelligent Decision Technologies, Smart Innovation, Systems and Technologies 193, https://doi.org/10.1007/978-981-15-5925-9_18

1 Introduction

Many Machine Learning (ML) applications require model tuning in order to efficiently correlate a set of input variables to some target variables coming from a training dataset. Often the list of candidate input variables is wide and only a subset of them is actually related to the target. Several literature works highlight that the selection of both an excessive and an insufficient number of input variables has a detrimental effect on model performance due, in the first case, to the inclusion of noise sources into the model and, in the second case, to the exclusion of variables with non-null informative content [6, 10]. The process of selecting the most suitable subset of relevant variables among a wider set of candidate model inputs for improving model performance is known as variable selection (VS). VS is a fundamental preprocessing step for most machine learning tasks including function approximation, classification and clustering [13, 14]. The recent growth of data sources that
are nowadays exploited in IoT and Industry 4.0 applications makes VS the object of numerous works. In [3] the authors proposed a novel VS approach based on the use of Genetic Algorithms (GAs). The method, called GiveAGap, has been used in numerous applications, leading to very satisfactory results in comparison with other approaches [19, 20]. The present paper is focused on the improvement of the method through the analysis of the influence of different genetic operators on GiveAGap performance. These operators mostly define the behaviour of the GA engine that performs the search among the possible input variable sets during the GiveAGap procedure. The paper is organized as follows: Sect. 2 provides an overview of VS approaches; Sect. 3 presents GiveAGap and its standard genetic operators; Sect. 4 describes in detail the new genetic operators that were designed for the method enhancement, while their impact is assessed on a set of representative tasks in Sect. 5. Finally, in Sect. 6 results and future perspectives of this approach are discussed.
2 Variable Selection Approaches

In the context of a learning task to be pursued by an arbitrary self-learning model by exploiting a training dataset formed by a possibly large set of variables, the aim of VS is to highlight those variables within the dataset that are actually related to the handled phenomenon (i.e. the output variable). The objectives of VS are mostly data dimensionality reduction, increased model accuracy and knowledge on the phenomenon [1, 2, 16]. In the literature there is a multitude of methods for VS, which can be grouped into three main categories:

Filters perform VS based on a measure of the strength of the relationship among input variables and targets, which does not depend on the adopted type of model and learning algorithm. Often the mentioned index derives from statistical tests such as linear correlation [12], chi-square [21] or information gain [15]. The adopted index is used to rank the variables in order to select the top ranked ones, as the ones mostly related to the target. These approaches show low computational complexity, allowing their universal use. Their main drawback is that they are associated only with the adopted measures and not with the used ML model.

Wrappers try to overcome the drawbacks of filter approaches by assessing the suitability of a variables subset through the performance of a model. They use the model as a black box and evaluate the training variables based on the accuracy of the trained model. The main advantage of this approach is that it considers both the interaction among variables and the bias of the model itself to provide a reliable assessment of each tested variables combination. The main disadvantage consists in the computational cost of the method that requires, for each evaluated subset, the training of the model. Several commonly used VS methods belong to this class. The Exhaustive Search (ES) is the simplest wrapper that tests all the possible variables subsets. Although reliable, this method is often not usable since it requires the assessment of
an exponential number of combinations. The Greedy Search is based on the iterative formation of the variables set in order to improve the model accuracy step by step. In this context, the Sequential Forward Selection (SFS) and Sequential Backward Selection (SBS) operate, respectively, by adding or removing variables from a temporary set. Metaheuristic methods belong to this class too: the idea of these approaches is to guide the search of the optimal variables subset by using some performance index of the previously tested subsets, in order to reduce the number of tested combinations (and the computational burden) without compromising the efficiency of the search. The GiveAGap method described in Sect. 3 belongs to this category.

Embedded methods include models and learning algorithms in the selection process like wrappers but, rather than repeatedly using the model as a black box, they integrate the VS in the learning procedure. During the training, the input variables set is iteratively updated together with the model parameters until convergence is reached [11]. Although this approach is efficient, its main drawback concerns the limited number of model types for which it is applicable.
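As an illustration of the wrapper family described above, the following sketch implements a plain Sequential Forward Selection loop around an arbitrary scoring function; the function names and the stopping rule are assumptions made for the example and are not taken from any of the cited methods.

```python
def sequential_forward_selection(candidates, score_subset, max_vars=None):
    """Greedy SFS: repeatedly add the variable that most improves the score.

    `candidates` is a list of variable names or indices; `score_subset(subset)`
    is any user-supplied function returning a model accuracy estimate
    (higher is better), e.g. the cross-validated accuracy of a wrapped model.
    """
    selected, best_score = [], float("-inf")
    max_vars = max_vars or len(candidates)
    while len(selected) < max_vars:
        remaining = [v for v in candidates if v not in selected]
        if not remaining:
            break
        scores = {v: score_subset(selected + [v]) for v in remaining}
        best_var = max(scores, key=scores.get)
        if scores[best_var] <= best_score:   # no further improvement: stop
            break
        selected.append(best_var)
        best_score = scores[best_var]
    return selected, best_score
```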
3 Genetic Algorithms Based Variable Selection

The GiveAGap method [3], which is the object of this work, belongs to the wrappers family and, more in detail, to the metaheuristic approaches. As a metaheuristic, it aims at an efficient exploration of the domain of all the possible variables combinations by overcoming the computational problems encountered by blind search approaches. In fact, in an arbitrary context of a training dataset D formed by n potential input variables, the number of combinations to evaluate by exhaustive search is 2ⁿ, which, in most cases, is clearly unaffordable. GiveAGap is based on GAs guiding the algorithm search throughout the most promising regions (i.e. variables combinations) of the domain, drastically reducing the number of tested sets. GAs are a well known optimization method that combines, during the search of the optimal solution, the exploration of the search space and the exploitation of the knowledge gained during the search process. This synergy leads to a deeper analysis of the search regions that achieve a higher value for an arbitrary measure of the goodness of solutions, the so-called fitness function. GAs take inspiration from nature and evolve a population of individuals (the candidate solutions) by promoting the survival and mating of the fittest individuals, generating offspring that mix the positive characteristics of existing solutions, and mutating existing solutions. The main elements of GAs are the fitness function assessing the goodness of candidates and the genetic operators:
– the crossover operator, that determines how the offspring is generated from the parent individuals
– the mutation operator, that performs the mutation of existing individuals
– the selection function, that determines which individuals will "survive" and reproduce according to their fitness (the higher the fitness, the higher the selection probability).
Given a dataset D including n potential input variables used for training a model aiming at the prediction of a target T, a generic solution is coded within GiveAGap as an array of n bits, each one associated with an input variable: unitary values correspond to selected variables, null values correspond to excluded ones. The cardinality p of the initial population that GiveAGap evolves is a parameter of the method. The initialization generates a population of p individuals (called chromosomes), where the value of the single elements (also called genes) is 1 or 0 according to the probabilities α and 1 − α, where α is a parameter determining the average number of active variables at the first generation of the algorithm (usually α = 0.5). At each generation of the algorithm, all the individuals are evaluated through a fitness function embedding the model M: for each individual c the selected variables are extracted from the original dataset D and a reduced training dataset D̂ is used for model training. D̂ is then split into two parts: one for the model tuning (D̂_TR) and another one used for validation purposes (D̂_VD). Once the model is trained, the candidate fitness is measured as its accuracy (1/VD, where VD is the average discrepancy between actual and predicted target values) on the validation set. The selection operator in GiveAGap is the so-called Roulette Wheel Selection, which selects individuals for survival and reproduction with a probability of selection proportional to their fitness. The crossover operator creates a new individual from two parent solutions. Since selection grants the high fitness of the parents, which in GiveAGap means that they encode good sets of model input variables, the standard crossover operator forms the new individual by picking each of its elements at random from the corresponding positions of the two parents with the same probability. This method favours the selection of variables strongly linked to the target. The mutation operator aims at slightly modifying a solution in order to produce a similar (and possibly better) one. The standard mutation function changes the binary value of a limited number of elements within the handled chromosome. More in detail, the number of mutated elements (genes) is calculated as max(1, round(0.05 · p)), i.e. 5% of the genes are changed (or 1 gene at least). The numbers of survivors, offspring and mutated individuals are dynamically determined within the employed GA engine, the Fuzzy Adaptive Genetic Algorithm (FAGA) described in [18], which sets such rates according to a series of indicators assessing the status of the GA search process through a fuzzy inference system. In this configuration GiveAGap has been widely used for solving different types of VS tasks including regression [20], classification [4] and clustering [5]. The method proved particularly suitable for industrial applications, where a large number of potential input variables needs to be handled, variables which might be linked to each other, not related to the target variable and affected by noise. In these applications, numerous works show the reliability and robustness of the method [7–9, 17, 19]. However, so far the impact of different genetic operators (i.e. selection, crossover, mutation and initialization functions) on the method performance has not been investigated. In order to fill this gap, in this work a new set of operators expressly designed to be used within GiveAGap is proposed. The operators are presented in Sect. 4 and are subsequently tested in Sect.
5 in order to compare their actual effectiveness.
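A minimal sketch of the chromosome encoding and of the wrapper-style fitness evaluation described above; the training/validation routine is a placeholder to be supplied by the user and is not the authors' implementation.

```python
import numpy as np

def init_population(p, n, alpha=0.5, rng=np.random.default_rng(0)):
    """p binary chromosomes of length n; each gene is 1 with probability alpha."""
    return (rng.random((p, n)) < alpha).astype(int)

def fitness(chromosome, X, y, train_and_validate):
    """Wrapper fitness 1/VD, where VD is the average validation discrepancy of
    a model trained only on the columns of X selected by the chromosome.

    `train_and_validate(X_sel, y)` is a user-supplied function that trains the
    embedded model (e.g. a feed-forward neural network) on one part of the
    data and returns the average |error| on a held-out validation part.
    """
    selected = np.flatnonzero(chromosome)
    if selected.size == 0:
        return 0.0                    # an empty subset cannot predict anything
    vd = train_and_validate(X[:, selected], y)
    return 1.0 / vd if vd > 0 else np.inf
```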
4 Genetic Operators for GA-based Variables Selection

In this section, a novel set of genetic operators to be used within the GA engine that leads GiveAGap is described. Operators to manage the initialization, selection, crossover and mutation of the GA population through the generations have been ad-hoc designed, taking into account the specific task they are devoted to.
4.1 Initialization Operators

The aim of the initialization function is to create the first population of candidates that is fed to the GA and evolved through generations. Initialization is fundamental as it may affect the complete search process of the GA. The standard initialization function employed by GiveAGap is described in Sect. 3. Here two improvements are proposed, which both start by creating a population through the standard function (with α = 0.5), but differ in the way they modify such a population.

Prune Related (PR) operates on each candidate to reduce the number of initially selected variables by removing the ones that are highly correlated to each other (using the linear correlation coefficient). More in detail, when a group of selected variables is found to have a correlation index higher than a predetermined threshold τ_PR, only one of them is kept and the value of the genes associated with the others is set to 0.

Prune Uncorrelated (PU) tries as well to reduce the number of initially selected variables by pruning the ones that are less correlated to the target value, i.e. the ones for which the correlation index is lower than a correlation threshold τ_PU. This approach aims at reducing the GA optimization work by removing non-promising variables at the beginning of the optimization.
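A sketch of the two pruning rules, assuming the absolute linear correlation coefficient as the correlation index and the thresholds reported later in the paper (τ_PR = 0.9, τ_PU = 0.1); the way groups of mutually correlated variables are handled is a simplification of the description above.

```python
import numpy as np

def prune_related(chromosome, X, tau_pr=0.9):
    """PR: among selected variables that are mutually correlated above tau_pr,
    keep only one and switch the others off."""
    c = chromosome.copy()
    selected = list(np.flatnonzero(c))
    for i, vi in enumerate(selected):
        if c[vi] == 0:
            continue
        for vj in selected[i + 1:]:
            if c[vj] == 0:
                continue
            r = abs(np.corrcoef(X[:, vi], X[:, vj])[0, 1])
            if r > tau_pr:
                c[vj] = 0             # keep vi, drop the redundant vj
    return c

def prune_uncorrelated(chromosome, X, y, tau_pu=0.1):
    """PU: switch off selected variables whose correlation with the target is
    below tau_pu."""
    c = chromosome.copy()
    for v in np.flatnonzero(c):
        if abs(np.corrcoef(X[:, v], y)[0, 1]) < tau_pu:
            c[v] = 0
    return c
```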
4.2 Selection Operators

The purpose of the selection function is to determine, on the basis of their fitness, which candidate solutions survive and which will generate the offspring in the next generation. On one hand, in order to be efficient and evolve the population toward individuals with higher fitness, the selection function should be extremely demanding in terms of fitness but, on the other hand, it should also take into consideration a large number of individuals to allow the exploration of the search space and avoid suboptimal solutions. In this work two alternatives to the standard selection function are proposed.
Top Half (TH) selects with equal probability the top-half highest rated individuals. This method, widely used in several GA implementations, excludes the least fitted individuals and promotes diversity by assigning the same probability of selection to the other ones.

Ranking (RK) selection sets the individuals' selection probabilities according to their ranking within the population rather than to their fitness. This approach avoids the formation of clusters of individuals (often very similar) that benefit from a very high selection probability. The selection probability P_s(x) of an arbitrary individual x at the n-th position of the ranking is P_s(x) = γ/√n, where γ is a normalization factor ensuring that Σ_x P_s(x) = 1.
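A sketch of the two selection rules described above; the population is assumed to be already sorted by decreasing fitness, which is an implementation convenience rather than part of the original description.

```python
import numpy as np

def top_half_probabilities(pop_size):
    """TH: equal selection probability for the best half, zero for the rest
    (individuals assumed sorted by decreasing fitness)."""
    probs = np.zeros(pop_size)
    half = pop_size // 2
    probs[:half] = 1.0 / half
    return probs

def ranking_probabilities(pop_size):
    """RK: P_s proportional to 1/sqrt(rank), normalized so that it sums to 1."""
    ranks = np.arange(1, pop_size + 1)
    raw = 1.0 / np.sqrt(ranks)
    return raw / raw.sum()

def select_index(probs, rng=np.random.default_rng(0)):
    """Draw one individual index according to the given probabilities."""
    return rng.choice(len(probs), p=probs)
```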
4.3 Crossover Operators

The crossover function generates a new individual s from two parent solutions p1 and p2. Crossover should preserve or improve the good characteristics of the parents in the offspring. Three alternative approaches are proposed.

Weighted (WE) crossover differs from the standard approach as it picks each gene from p1 or p2 with a probability that is proportional to their fitness. This method promotes the selection of variables belonging to the fittest individual.

Conflict for Correlation (CC) forms s by taking the genes from the two parents. When picking a gene, in case of conflict between the parents, the gene is set to 1 if the correlation between the target and the associated variable is higher than the average correlation of the currently selected variables. This approach tries to limit the number of selected variables by selecting in the offspring only those that bring an actual advantage.

Conflict for Reduction (CR) works similarly to CC but differs in the way it manages the conflicts: in case of conflict, the gene value is set to 0, in order to limit the number of selected variables.
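A sketch of the three crossover rules; how the "average correlation of the currently selected variables" is computed in CC is an interpretation of the description above (here, the mean absolute correlation with the target over the genes already set to 1 in the child).

```python
import numpy as np

def weighted_crossover(p1, p2, f1, f2, rng=np.random.default_rng(0)):
    """WE: each gene is copied from p1 with probability f1/(f1+f2), else from p2."""
    take_p1 = rng.random(len(p1)) < f1 / (f1 + f2)
    return np.where(take_p1, p1, p2)

def conflict_crossover(p1, p2, corr_with_target, reduce_conflicts=False):
    """CC / CR: genes on which the parents agree are copied; conflicting genes
    are resolved by correlation (CC) or simply set to 0 (CR)."""
    child = np.where(p1 == p2, p1, -1)           # -1 marks a conflict
    if reduce_conflicts:                         # CR
        return np.where(child == -1, 0, child)
    selected = child == 1
    avg_corr = corr_with_target[selected].mean() if selected.any() else 0.0
    for g in np.flatnonzero(child == -1):        # CC: keep only useful genes
        child[g] = 1 if corr_with_target[g] > avg_corr else 0
    return child
```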
4.4 Mutation Operators

The role of the mutation operator is to allow the exploration of new solutions in order to avoid local minima. In the VS context, mutation should allow the inclusion of neglected combinations of variables in the GA candidate population. The standard mutation function of GiveAGap changes the value of 5% of the genes of the mutated solution. In this work the tested variants are the following.

Correlated gene mutation (CO) operates on the selection of the genes to be mutated according to the correlation c that relates the associated variable to the target. At the individual level, for each 0-valued gene the contribution in terms of
correlation is considered as the value c, while for each 1-valued gene the loss is considered as 1 − c. The genes are selected so as to maximize the sum of these figures. The effect of this operation is the focused elimination or inclusion of variables according to their relation with the target.

Proportional Mutation (PM) does not operate on the selection of the genes but on the number of genes to be mutated. More in detail, the lower the individual's ranking, the higher the number of mutated genes. The percentage of mutated genes varies in the range 5–15%. Once the number of genes to mutate is determined, they are randomly selected.
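A sketch of the two mutation variants; the gene scoring in CO follows the gain/loss figures defined above, while the rank-to-rate mapping in PM is a linear interpolation assumed for the example.

```python
import numpy as np

def correlated_mutation(chromosome, corr_with_target, n_mut=1):
    """CO: flip the n_mut genes with the largest gain/loss figure
    (c for a 0-valued gene, 1 - c for a 1-valued gene)."""
    c = chromosome.copy()
    score = np.where(c == 0, corr_with_target, 1.0 - corr_with_target)
    for g in np.argsort(score)[::-1][:n_mut]:
        c[g] = 1 - c[g]
    return c

def proportional_mutation(chromosome, rank, pop_size,
                          rng=np.random.default_rng(0)):
    """PM: the worse the rank (1 = best), the larger the mutated fraction,
    varying linearly between 5% and 15% of the genes."""
    c = chromosome.copy()
    frac = 0.05 + 0.10 * (rank - 1) / max(pop_size - 1, 1)
    n_mut = max(1, round(frac * len(c)))
    genes = rng.choice(len(c), size=n_mut, replace=False)
    c[genes] = 1 - c[genes]
    return c
```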
5 Assessment of Genetic Operators on Variable Selection

In this section the impact of the genetic operators presented in Sect. 4 within GiveAGap is assessed by measuring the performance of the method on a set of VS problems designed to clearly assess the effectiveness of the selection process. In line with the original tests presented by the authors in [3], three synthetic datasets have been created, D1, D2 and D3, which are formed by 10, 20 and 50 variables, respectively, and 10000 observations. The values of the variables are determined according to different distributions and orders of magnitude. These datasets constitute the input variables for three regression problems aiming at the estimation of three target variables calculated according to the following functions:

y = sin(2x2) + x4²x8   (1)

y = e^(x7+x10) · 3x6 + x16 + x15^(x7+x13)   (2)

y = √(x1 + x2) · e^(x8−x11) + 3x4²   (3)
where Eqs. 1, 2, 3 are used to calculate the target variable for the datasets D1, D2 and D3, respectively, and where xi represents the ith variable of the corresponding dataset. The datasets are designed in order to include a number of potential input variables and highly non-linear relations between inputs and targets. In addition, the target has been perturbed with random noise with Gaussian distribution in different proportions (2, 5 and 10%) with respect to the average original target. The GiveAGap method has been applied for VS on the described datasets, each one associated with the corresponding target variable for the three noise levels. All the 108 possible combinations of the genetic operators described in Sect. 4, including the standard operators (ST), were tested. For the PR and PU initialization operators, the threshold parameters have been set to τ_PR = 0.9 and τ_PU = 0.1, respectively. All tests have been run 100 times and average results are reported. The model employed in the fitness function exploits a two-layer Feed-Forward Neural Network (FFNN). The choice of the FFNN was guided by the necessity of using a model general and robust enough to be able to approximate the handled non-linear functions.
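A minimal sketch of how such a synthetic regression dataset can be generated, using the first target function and the 2% noise level; the reconstructed form of Eq. 1 and the uniform distribution of the input variables are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(42)

n_obs, n_vars = 10_000, 10                        # D1: 10 candidate variables
X = rng.uniform(-1.0, 1.0, size=(n_obs, n_vars))  # illustrative distribution

# Target of Eq. 1 (uses x2, x4 and x8; indices are 1-based as in the paper)
y = np.sin(2 * X[:, 1]) + X[:, 3] ** 2 * X[:, 7]

# Additive Gaussian noise at 2% of the average original target magnitude
noise_std = 0.02 * np.abs(y).mean()
y_noisy = y + rng.normal(0.0, noise_std, size=n_obs)
```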
The number of neurons in the network's hidden layer is automatically set according to the number of network free parameters. Tables 1, 2 and 3 report the results in terms of the percentage of selection of the correct variables. For each dataset, the results for the three adopted noise levels are presented. Each table cell reports a tuple formed by three values that represent the percentage of selection of the exact set (=), of an over-dimensioned set (+) and of an under-dimensioned set (−) of variables. For the sake of clarity and for space constraints, the tables report only the most representative results together with those achieved by the use of the standard operators (labelled as ST in the corresponding columns). From the results, it emerges that the use of alternative genetic operators improves the overall VS performance by increasing the rate of identification of the exact variable set; this happens for the three tested datasets independently of the phenomenon complexity and noise levels. This result is particularly valuable since the tests cover the characteristics of many real-world and industrial datasets. Moreover, globally the
Table 1 Results achieved by the assessed genetic operators sets on the D1 dataset

Init  Sel  Crs  Mut  D1 Ns: 2%    D1 Ns: 5%    D1 Ns: 10%
ST    ST   ST   ST   (3,97,0)     (2,95,3)     (0,94,6)
ST    TH   WE   ST   (12,88,0)    (10,90,0)    (2,96,2)
ST    TH   WE   CO   (12,88,0)    (11,88,1)    (1,95,4)
ST    RK   CR   PM   (9,90,1)     (8,90,2)     (1,91,8)
PR    TH   CR   CO   (16,84,0)    (14,86,0)    (11,84,5)
PR    RK   WE   ST   (11,89,0)    (12,87,1)    (12,75,13)
PR    RK   CR   ST   (14,85,1)    (12,86,2)    (11,83,6)
PU    ST   CR   ST   (34,66,0)    (26,73,1)    (7,90,3)
PU    TH   ST   ST   (29,71,0)    (19,77,4)    (9,90,1)
PU    TH   WE   ST   (25,75,0)    (21,79,0)    (8,90,2)
Table 2 Results achieved by the assessed genetic operators sets on the D2 dataset

Init  Sel  Crs  Mut  D2 Ns: 2%    D2 Ns: 5%    D2 Ns: 10%
ST    ST   ST   ST   (6,94,0)     (8,90,2)     (7,89,4)
ST    TH   WE   ST   (16,84,0)    (20,80,0)    (12,85,3)
ST    TH   WE   CO   (11,89,0)    (9,91,0)     (6,92,2)
ST    RK   CR   PM   (13,87,0)    (9,90,1)     (10,84,6)
PR    TH   CR   CO   (14,86,0)    (12,86,2)    (11,87,2)
PR    RK   WE   ST   (14,85,1)    (17,82,1)    (16,74,10)
PR    RK   CR   ST   (15,85,0)    (16,84,0)    (10,86,4)
PU    ST   CR   ST   (43,57,0)    (37,63,0)    (7,91,2)
PU    TH   ST   ST   (31,69,0)    (34,65,1)    (11,88,1)
PU    TH   WE   ST   (32,68,0)    (32,68,0)    (11,87,2)
Table 3 Results achieved by the assessed genetic operators sets on the D3 dataset

Init  Sel  Crs  Mut  D3 Ns: 2%    D3 Ns: 5%    D3 Ns: 10%
ST    ST   ST   ST   (4,96,0)     (2,93,5)     (1,93,6)
ST    TH   WE   ST   (10,90,0)    (8,92,0)     (1,97,2)
ST    TH   WE   CO   (11,89,0)    (7,92,1)     (2,96,2)
ST    RK   CR   PM   (9,91,0)     (8,88,4)     (1,88,11)
PR    TH   CR   CO   (11,88,1)    (12,85,3)    (14,82,4)
PR    RK   WE   ST   (10,89,1)    (10,86,4)    (10,80,10)
PR    RK   CR   ST   (12,87,1)    (11,87,2)    (12,82,6)
PU    ST   CR   ST   (30,69,1)    (28,70,2)    (11,87,2)
PU    TH   ST   ST   (27,72,1)    (18,80,2)    (8,86,6)
PU    TH   WE   ST   (23,77,0)    (19,81,0)    (9,90,1)
number of under-dimensioned sets, which correspond to situations where some of the key variables are not included in the set, is sensibly reduced. The initialization function appears to significantly affect the results: the PU method, which avoids the initial creation of sets containing uncorrelated variables, achieves very good results, with a high rate of exact sets and a low rate of under-dimensioned sets. Analogous results are obtained by the TH selection, which allows evolution only for the best half of the population. The best performing crossover functions are WE and CR. The good performance of WE is in line with the global results, which tend to reward the operators that exploit the knowledge gained through the GA generations, whilst the CR operator obtains good results by keeping the evolved subsets small through the exclusion of poorly correlated variables. The alternative mutation operator does not outperform the standard one. This behaviour is likely due to the conservative nature of the standard method which, compared to the others, only slightly modifies the individuals. More marked mutations imply a loss of knowledge which seems, in this context, disadvantageous.
6 Conclusions and Future Works In this paper, an analysis of the impact of different genetic operators on the results achieved by a GA-based VS approach is presented. The new operators were expressly designed to be used in a VS context aiming at both the identification of promising variables combination and the avoidance of the selection of non-influential variables, in order to improve the predictive performance of the models that will benefit from this selection process. The proposed initialization, selection, crossover and mutation operators were tested on a set of synthetic datasets with the different number of variables, levels of additive noise and complexity as far as the relation among input and output variables concerns. The results, measured in terms of the effectiveness
220
M. Vannucci et al.
of the selection, shown that generally GA-based VS benefits from the use of these advanced operators and, particularly, by those that tend to prune uncorrelated variables from the evolved GA population and perform an exploitation of the knowledge acquired through the GA optimization. In the light of this outcome, in the future new operators aiming to favour the evolution of the individuals containing most related variables will be designed and will include the use of more and different correlation measures. In addition the method will be tested on a number of real-world datasets with different characteristics in order to confirm the validity of the approach.
References 1. Cateni, S., Colla, V.: The importance of variable selection for neural networks-based classification in an industrial context. Smart Innov. Syst. Technol. 54, 363–370 (2016) 2. Cateni, S., Colla, V.: Variable selection for efficient design of machine learning-based models: efficient approaches for industrial applications. Commun. Comput. Inf. Sci. 629, 352–366 (2016) 3. Cateni, S., Colla, V., Vannucci, M.: General purpose input variables extraction: a genetic algorithm based procedure give a gap. In: 2009 9th International Conference on Intelligent Systems Design and Applications, pp. 1278–1283. IEEE (2009) 4. Cateni, S., Colla, V., Vannucci, M.: Variable selection through genetic algorithms for classification purpose. In: Proceedings of the 10th IASTED International Conference on Artificial Intelligence and Applications, AIA 2010, pp. 6–11 (2010) 5. Cateni, S., Colla, V., Vannucci, M.: A genetic algorithm-based approach for selecting input variables and setting relevant network parameters of a som-based classifier. Int. J. Simul. Syst. Sci. Technol. 12(2), 30–37 (2011) 6. Cateni, S., Colla, V., Vannucci, M.: A hybrid feature selection method for classification purposes. In: Proceedings—UKSim-AMSS 8th European Modelling Symposium on Computer Modelling and Simulation, EMS 2014, pp. 39–44 (2014) 7. Colla, V., Matino, I., Dettori, S., Cateni, S., Matino, R.: Reservoir computing approaches applied to energy management in industry, Communications in Computer and Information Science, pp. 66–69, vol. 1000. Springer (2019) 8. Colla, V., Vannucci, M., Bacchi, L., Valentini, R.: Neural networks-based prediction of hardenability of high performance carburizing steels for automotive applications. La Metallurgia Italiana 112(1), 47–53 (2020) 9. Dimatteo, A., Vannucci, M., Colla, V.: Prediction of hot deformation resistance during processing of microalloyed steels in plate rolling process. Int. J. Adv. Manuf. Technol. 66(9–12), 1511–1521 (2013) 10. Guyon, I., Elisseeff, A.: An introduction to variable and feature selection. J. Mach. Learn. Res. 3, 1157–1182 (2003) 11. He, X., Cai, D., Niyogi, P.: Laplacian score for feature selection. In: Advances in Neural Information Processing Systems, pp. 507–514 (2006) 12. Latorre Carmona, P., Sotoca, J.M., Pla, F.: Filter-type variable selection based on information measures for regression tasks. Entropy 14(2), 323–343 (2012) 13. Mitchell, T.J., Beauchamp, J.J.: Bayesian variable selection in linear regression. J. Am. Stat. Assoc. 83(404), 1023–1032 (1988) 14. Raftery, A.E., Dean, N.: Variable selection for model-based clustering. J. Am. Stat. Assoc. 101(473), 168–178 (2006) 15. Roobaert, D., Karakoulas, G., Chawla, N.V.: Information gain, correlation and support vector machines. In: Feature extraction, pp. 463–470. Springer (2006)
16. Sebban, M., Nock, R.: A hybrid filter/wrapper approach of feature selection using information theory. Pattern Recog. 35(4), 835–846 (2002) 17. Sgarbi, M., Colla, V., Cateni, S., Higson, S.: Pre-processing of data coming from a laser-emat system for non-destructive testing of steel slabs. ISA Trans. 51(1), 181–188 (2012) 18. Vannucci, M., Colla, V.: Fuzzy adaptation of crossover and mutation rates in genetic algorithms based on population performance. J. Intell. Fuzzy Syst. 28(4), 1805–1818 (2015) 19. Vannucci, M., Colla, V.: Quality improvement through the preventive detection of potentially defective products in the automotive industry by means of advanced artificial intelligence techniques. In: Intelligent Decision Technologies 2019, pp. 3–12. Smart Innovation, Systems and Technologies, Springer (2019) 20. Vannucci, M., Colla, V., Dettori, S.: Fuzzy adaptive genetic algorithm for improving the solution of industrial optimization problems. IFAC-PapersOnLine 49(12), 1128–1133 (2016) 21. Wu, S., Flach, P.A.: Feature selection with labelled and unlabelled data. ECML/PKDD 2, 156–167 (2002)
Symmetry Indices as a Key to Finding Matrices of Cyclic Structure for Noise-Immune Coding

Alexander Sergeev, Mikhail Sergeev, Nikolaj Balonin, and Anton Vostrikov
Abstract The paper discusses methods for assessing the symmetries of Hadamard matrices and of special quasi-orthogonal matrices of circulant and two-circulant structures used as the basis for searching for noise-resistant codes. Such codes, obtained from matrix rows and intended for use in open communications, expand the basic and general theory of signal coding and ensure that the requirements for contemporary telecommunication systems are met. Definitions of the indices of symmetry, antisymmetry, and symmetry defect of special matrices are given. The connection of symmetric and antisymmetric circulant matrices with primes, composite numbers, and powers of a prime number is shown. Examples of two-circulant matrices that are optimal in terms of their determinant, as well as special circulant matrices, are given. The maximum orders of the considered matrices of symmetric structures are determined.
1 Introduction

In the problem of correlated reception of signals, which is common in modern wireless systems of digital communication and radiolocation, it is very important to choose a code for the modulation of signals in the channel. This choice should provide noise-immune data exchange, so that the desired signal can always be picked out against the background of natural or artificial noise [1–6]. Phase (amplitude) modulation uses sequences of 1 and −1 with the required characteristics of the autocorrelation function and of the ratio between its peak and the maximum "side" lobe. Such codes are, for example, the Barker code of length 11, which looks like 1 1 1 −1 −1 −1 1 −1 −1 1 −1, or the code used in the IEEE 802.11 standard, among other communication standards [7]. According to Barker's Conjecture, Barker codes longer than length 13 do not exist. The existence of numerical sequences of different lengths that
have identical properties and implementation aims could make it possible to achieve better noise-immune characteristics for transmission. In [8], Borwein and Mossinghoff show that if a Barker sequence of length n > 13 exists, then either n = 3 979 201 339 721 749 133 016 171 583 224 100, or n > 4 × 10³³. This improves the lower bound on the length of a long Barker sequence by a factor of nearly 2000. They also obtain eighteen additional integers n < 10⁵⁰ that cannot be ruled out as the length of a Barker sequence, and find more than 237,000 additional candidates n < 10¹⁰⁰ [8]. Anyway, the gap from n = 13 to the nearest longer candidate is still huge. Analysis shows that the codes mentioned above are rows of orthogonal Hadamard matrices and of maximum determinant matrices with elements 1 and −1, or their fragments. Hence, when looking for new noise-immune codes, we should focus on circulant orthogonal Hadamard, Mersenne, and Raghavarao matrices [9, 10] specified by the first row of the matrix elements. All the other rows are formed by a successive shift of the previous row to the right, placing the pushed-out element on the left. If the sign of the pushed-out element is inverted, we get a negacirculant matrix. However, there are some restrictions on the existence of circulant orthogonal matrices in terms of their possible orders, structural symmetry types, and ways of searching for them. For example, according to Ryser's conjecture (the so-called "circulant Hadamard matrix conjecture") [11], there exists only one circulant orthogonal symmetric Hadamard matrix, of order 4; other symmetric matrices exist in two-circulant or tetra-circulant form. Numerous attempts to prove Ryser's conjecture have been made, and some of them were only partially successful (for example, Schmidt in [12] and Euler et al. in [13]). The absence of both a proof and a disproof of Ryser's conjecture has led mathematicians to the idea that its content reflects fundamental properties. In this paper, we make an effort to overcome the restrictions mentioned above with the help of the results achieved by our scientific group during many years of research in the field of new circulant quasi-orthogonal matrices. These matrices can be implemented as a source of noise-immune codes with different properties that depend on their order and other parameters. The search for such matrices can be optimized with the help of the symmetry characteristics discussed in this paper for circulant and multi-block matrices.
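As an illustration of the autocorrelation property mentioned above, the sketch below computes the aperiodic autocorrelation of the length-11 Barker code and its peak-to-maximum-sidelobe ratio; this is a standard check and not part of the matrix search discussed in this paper.

```python
import numpy as np

barker11 = np.array([1, 1, 1, -1, -1, -1, 1, -1, -1, 1, -1])

acf = np.correlate(barker11, barker11, mode="full")   # all lags
peak = acf[len(barker11) - 1]                         # main lobe: 11
sidelobes = np.delete(acf, len(barker11) - 1)
max_sidelobe = np.abs(sidelobes).max()                # equals 1 for a Barker code

print(peak, max_sidelobe, peak / max_sidelobe)        # 11 1 11.0
```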
2 Quasi-orthogonal Matrices with Two Values of Elements Among such (−1, 1) matrices of orders n = 4t, where t is a natural number, the most well-known are Hadamard matrices $H_n$, which satisfy the condition $H_n^T H_n = nI$, where $I = \mathrm{diag}\{1, 1, \ldots, 1\}$. Decrementing the order n of a normal Hadamard matrix by removing its edge (a row and a column) gives the so-called core, which can be reduced to orthogonality by changing the value of its negative elements to −b (|b| < 1). Matrices of orders n = 4t − 1 obtained this way are called Mersenne matrices $M_n$ [14]. They are also quasi-orthogonal, but their elements take the values 1 or −b, and they satisfy the condition
$M_n^T M_n = \omega(n) I$, where ω(n) is the matrix weight. The value of the element $b = \frac{t}{t + \sqrt{t}}$ is a parameter that can be used to systematize all these matrices. It is also true that on the basis of a core (a Mersenne matrix) we can always build a Hadamard matrix in the reverse way, replacing −b with −1 and adding the edge. In [10] you can find an algorithm for building circulant Mersenne matrices of orders equal to Mersenne numbers n = 2^k − 1, where k is a natural number. In [16] it is proved that the set of Mersenne matrices extends to all orders n = 4t − 1. Experience has shown that changing the value of the second element to −b and providing amplitude-phase modulation brings extra advantages in comparison with (−1, 1) codes. For example, in [9, 10] you can find noise-immune codes obtained from the rows of Hadamard, Mersenne, and Raghavarao circulant matrices, along with characteristics demonstrating their advantages.
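The quasi-orthogonality condition can be checked numerically. The following sketch is an illustration, not the authors' software: it builds a circulant Mersenne-type matrix of order 7 from quadratic residues mod 7 (the choice of which positions of the first row carry +1 is an assumption of the example) and solves for the value of b that zeroes the off-diagonal entries of $M^T M$; under this convention the numerical root coincides with the closed form b = t/(t + √t) for t = 2.

```python
import numpy as np

n, t = 7, 2
residues = {(x * x) % n for x in range(1, n)}                      # quadratic residues mod 7: {1, 2, 4}
pattern = np.array([0 if j in residues else 1 for j in range(n)])  # 1 -> entry +1, 0 -> entry -b (assumed convention)

def mersenne(b):
    first_row = np.where(pattern == 1, 1.0, -b)
    return np.array([np.roll(first_row, k) for k in range(n)])     # circulant matrix

def off_diagonal(b):                                               # any off-diagonal entry of M^T M
    g = mersenne(b).T @ mersenne(b)
    return g[0, 1]

# bisection on (0, 1): the off-diagonal entry changes sign in this interval
lo, hi = 0.0, 1.0
for _ in range(60):
    mid = (lo + hi) / 2.0
    if off_diagonal(lo) * off_diagonal(mid) > 0:
        lo = mid
    else:
        hi = mid

b = (lo + hi) / 2.0
print(round(b, 6), round(t / (t + np.sqrt(t)), 6))                 # both ~ 0.585786
G = mersenne(b).T @ mersenne(b)
print(np.allclose(G, G[0, 0] * np.eye(n)))                         # True: M^T M = w(n) I
```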
3 Symmetry and Antisymmetry Indices of a Circulant Matrix Symmetric matrices are among the simplest ones and have a number of useful features. They are called "symmetric" because identical elements of such matrices are placed symmetrically with respect to the diagonal. The same is true for antisymmetric matrices, in which elements placed symmetrically with respect to the diagonal have opposite signs. In order to estimate the symmetry/antisymmetry violation in a circulant matrix (or its initial block), let us introduce indices that describe this feature. Definition 1 The symmetry index of a circulant matrix is the number (position) of the first element in its 1st row that is followed by an element whose sign differs from the sign of the 1st-column element with the same number. The authors of [16] use another definition of the symmetry index. It deals only with the first row of a matrix, taking into account the fact that in a symmetric matrix the 1st-row elements can be split into the initial element and two mirror-symmetric vectors. Therefore, the index in [16] takes a value half as large. This difference is not significant; it only reveals an interesting feature of this index: it either equals the order of a circulant matrix, when the matrix is fully symmetric, or takes values smaller than half of the matrix order. Definition 2 The symmetry defect of a circulant matrix is the number of elements in its 1st row whose signs differ from the signs of the respective elements in its 1st column. Unlike the symmetry defect, the symmetry index does not take into account the fact that a violation of circulant matrix symmetry can be only slight, caused by just one or a few elements. Note that circulant matrices built on the rows of some initial
circulant matrix are considered equivalent. Therefore, when measuring the symmetry indices, we often choose the equivalent matrix with the maximum symmetry index. The same is true for the antisymmetry indices. When analyzing numbers and amounts of elements in a row, we will consider mismatched signs. Thus, an antisymmetry index and an antisymmetry defect both describe a violation in an antisymmetric matrix structure. For block matrices, the indices and defects of symmetry/antisymmetry are usually associated with the characteristics of the first block, as the other blocks are arranged in symmetric/antisymmetric structures. The same is true for matrices with ordinary or double edges.
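The indices and defects introduced above can be computed from the first row alone, since for a circulant matrix the first column consists of the elements a[(n − i) mod n]. The following sketch is one possible reading of Definitions 1 and 2, not the authors' implementation; in particular, the symmetry index is taken here as the length of the initial run of matching signs.

```python
import numpy as np

def _signs_row_col(first_row):
    a = np.sign(np.asarray(first_row, dtype=float))
    n = len(a)
    col = a[(-np.arange(n)) % n]          # signs of the first column of the circulant
    return a, col, n

def symmetry_defect(first_row):           # Definition 2: number of mismatching signs
    a, col, _ = _signs_row_col(first_row)
    return int(np.sum(a != col))

def symmetry_index(first_row):            # one reading of Definition 1: length of the
    a, col, n = _signs_row_col(first_row) # initial run of matching signs (n if fully symmetric)
    mismatches = np.nonzero(a != col)[0]
    return n if mismatches.size == 0 else int(mismatches[0])

def antisymmetry_defect(first_row):       # positions (apart from the first) where the
    a, col, _ = _signs_row_col(first_row) # row and column signs coincide instead of being opposite
    return int(np.sum(a[1:] == col[1:]))

# a fully symmetric example: the first row is a palindrome apart from its first element
print(symmetry_defect([1, -1, 1, 1, -1]), symmetry_index([1, -1, 1, 1, -1]))   # 0 5
```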
4 Examples of Symmetric Structure Finiteness Figure 1 shows two determinant-optimal two-circulant matrices (D-matrices). The matrix of order 32 is, among other things, a Hadamard matrix because its order is divisible by 4. The symmetry of such matrices is evident. What is less evident is that there are no symmetric two-circulant Hadamard matrices of orders higher than 32. The limitation on symmetry, as pointed out above, is known as Ryser's conjecture. It is interesting that for the commonly used two-circulant structures nobody has studied limitations similar to Ryser's; however, such limitations are very important when searching for these structures. Our assumption that there are no symmetric two-circulant Hadamard matrices (D-matrices of order divisible by 4) of order higher than 32 was seriously checked for the first time in [16]. This extends Ryser's conjecture to two-circulant structures. For matrices of orders n ≡ 2 (mod 4), a similar limitation applies to symmetric matrices with a double
Fig. 1 D-optimal matrices of critical orders 22 and 32
edge, and the doubly symmetric two-circulant matrix of order 22 shown in Fig. 1 is probably the last one. The next D-matrix of this kind has order 34, and it is not symmetric. The same is true for all subsequent matrices of this kind of order higher than 22 (this is the boundary). Thus, symmetric structures of determinant-extremal matrices are limited, and non-symmetric ones should be differentiated [16, 17] by choosing the most cost-efficient ones in terms of the composition of their independent elements. This selection can be made easier with the symmetry indices and defects that we propose.
5 Symmetries of Circulant Mersenne Matrices The general definition of Mersenne matrices does not take into account the fact that their order can be
– a prime Mersenne number;
– a composite Mersenne number;
– a prime number of the form n = 4t − 1;
– a prime number power;
– a composite number, including a product of two close integers, e.g., 15 = 3 × 5.
The algorithms for searching for Mersenne matrices, and for the Hadamard matrices connected with them as mentioned above, lead to a paradoxical result. Antisymmetry indices can help us understand it better. In [17] a fairly general statement is formulated according to which an antisymmetric circulant Mersenne matrix corresponds to a prime number equal to its order n = 4t − 1. For example, Fig. 2 shows an antisymmetric circulant Mersenne matrix of order 7. If the matrix order 4t − 1 is a prime number power, then the Mersenne matrix consists of circulant blocks whose sizes correspond to a prime number. This rule has an exception discovered by Hall in [18]. Hadamard in [19] did not yet differentiate the block features of matrices. Figure 3 shows a circulant Mersenne matrix of order 15 constructed by M. Hall. Its order is equal to the product of the close prime numbers 3 and Fig. 2 Circulant matrix M7
Fig. 3 Circulant matrix M15 constructed by M. Hall
5. Viewed closely, this matrix differs from the one shown in Fig. 1, as it is circulant but not strictly antisymmetric. In other similar cases, the Mersenne matrix will also be circulant but not antisymmetric. The circulant structure comes at the price of an antisymmetry defect, which can be measured for the matrices of orders 15 (3 × 5) and 35 (5 × 7) found by M. Hall. In [18], Hall was interested in the block structure but not in the quality of the block symmetry: he did not mention any antisymmetry defect, but found one more circulant matrix of order 63 = 7 × 9. Nine is not a prime number, and we suppose that a Mersenne matrix of order 63 does not belong to the family of the two previous matrices. It is single-blocked because 63 is a Mersenne number n = 2^k − 1 nested into the sequence n = 4t − 1. There is no circulant Mersenne matrix of order 99 = 9 × 11, as 9 is a composite number and 99 is not a Mersenne number. The matrices from the main family, of orders n = 2^k − 1, are always circulant. These orders give a name to the entire set of matrices of orders n = 4t − 1, which can also be block matrices for composite n. It is not easy to notice the contradiction when a circulant Mersenne matrix, expected for prime orders, corresponds to a composite order that is not even a product of two close prime integers. The antisymmetry index can help to notice it. The special importance of the main family can be commented on as follows. The prime number 2 stands apart from all the other prime numbers because it is even. Hence, the powers of this prime number, and the Mersenne numbers at minimum distance from them, differ from the powers of other prime numbers by their prominence. The price for the existence of single-block Mersenne matrices of composite
orders n = 2^k − 1 is always the same: matrices with antisymmetry defects. The non-primeness of an order manifests itself either in a degraded structure (multiple blocks) or in a degraded antisymmetry (a lower index).
6 Symmetry Indices for Mersenne Matrices of Composite Orders During the study of symmetry indices for circulant matrices of orders equal to Mersenne numbers n = 2^k − 1, the following rules were revealed and formulated.
Rule 1. The symmetry index for a matrix of composite order of the form 2^k − 1 is precisely equal to the exponent k. Practical research has shown that, once a circulant matrix is found, all the matrices equivalent to it by a shift of the top row always have a maximum symmetry index equal to k.
Rule 2. The antisymmetry defect is proportional to the symmetry defect, n/2 − ak, with the parameter a being a constant depending on the order (except for some special orders which only loosely follow this rule). The Mersenne matrices of composite orders found so far always have a defect which can be predicted once the dependence a = a(n) is known.
Rule 3. The antisymmetry index for matrices corresponding to prime Mersenne numbers is equal to the order n (the antisymmetry defect is equal to zero).
Similar indices can be applied to weighted matrices (in [20] such a matrix of order 22 was obtained by an optimization procedure) or to other extremal matrices with a global [20] or local determinant maximum [14].
7 Conclusions Mersenne matrices, as well as the Hadamard matrices built on their base (by calculating the core and adding an edge), are matrices rigidly connected with the number system. For Mersenne numbers n = 2^k − 1, one can observe an anomalous existence of circulant structures for orders equal to composite numbers, regardless of whether the composite number is a product of two close prime integers. Computational algorithms for building such anomalous Mersenne matrices reveal their stable characteristics, namely the symmetry index of a circulant matrix, equal to the exponent k. The matrix antisymmetry index can be approximated by a linear dependence on the matrix order n and the symmetry index k. This dependence is stable for all the circulant matrices found. The approximation suggests that circulant matrices of composite orders, regardless of the exact order value, will always have a nonzero antisymmetry index.
Acknowledgement The reported study was funded by RFBR, project number 19-29-06029.
References
1. Wang, R.: Introduction to Orthogonal Transforms with Applications in Data Processing and Analysis, p. 504. Cambridge University Press (2010)
2. Ahmed, N., Rao, K.R.: Orthogonal Transforms for Digital Signal Processing, p. 263. Springer, Berlin (1975)
3. Klemm, R. (ed.): Novel Radar Techniques and Applications, vol. 1, Real Aperture Array Radar, Imaging Radar, and Passive and Multistatic Radar, p. 951. Scitech Publishing, London (2017)
4. NagaJyothi, A., Rajeswari, K.R.: Generation and implementation of Barker and nested binary codes. J. Electr. Electron. Eng. 8(2), 33–41 (2013)
5. Proakis, J., Salehi, M.: Digital Communications, p. 1170. McGraw-Hill, Singapore (2008)
6. Levanon, N., Mozeson, E.: Radar Signals, p. 411. Wiley (2004)
7. IEEE 802.11 official site. URL: www.ieee802.org
8. Borwein, P., Mossinghoff, M.: Wieferich pairs and Barker sequences, II. LMS J. Comput. Math. 17(1), 24–32 (2014)
9. Sergeev, A., Nenashev, V., Vostrikov, A., Shepeta, A., Kurtyanik, D.: Discovering and analyzing binary codes based on monocyclic quasi-orthogonal matrices. Smart Innov. Syst. Technol. 143, 113–123 (2019)
10. Sergeev, M.B., Nenashev, V.A., Sergeev, A.M.: Nested code sequences of Barker–Mersenne–Raghavarao. Informatsionno-upravliaiushchie sistemy (Information and Control Systems) 3, 71–81 (2019) (In Russian)
11. Ryser, H.J.: Combinatorial Mathematics. The Carus Mathematical Monographs, no. 14. Published by The Mathematical Association of America; distributed by Wiley, New York (1963)
12. Schmidt, B.: Towards Ryser's conjecture. In: European Congress of Mathematics. Progress in Mathematics, vol. 201, pp. 533–541. Birkhäuser, Basel (2001)
13. Euler, R., Gallardo, L.H., Rahavandrainy, O.: Eigenvalues of circulant matrices and a conjecture of Ryser. Kragujevac J. Math. 45(5), 751–759 (2021)
14. Balonin, N.A., Sergeev, M.B.: Quasi-orthogonal local maximum determinant matrices. Appl. Math. Sci. 9(6), 285–293 (2015)
15. Sergeev, A.M.: Generalized Mersenne matrices and Balonin's conjecture. Autom. Control Comput. Sci. 48(4), 214–220 (2014)
16. Balonin, N.A., Djokovic, D.Z.: Symmetry of two-circulant Hadamard matrices and periodic Golay pairs. Informatsionno-upravliaiushchie sistemy (Information and Control Systems) 3, 2–16 (2015)
17. Sergeev, A., Blaunstein, N.: Orthogonal matrices with symmetrical structures for image processing. Informatsionno-upravliaiushchie sistemy (Information and Control Systems) 6(91), 2–8 (2017)
18. Hall, M.: A survey of difference sets. Proc. Amer. Math. Soc. 7, 975–986 (1956)
19. Hadamard, J.: Résolution d'une question relative aux déterminants. Bull. des Sci. Math. 17, 240–246 (1893)
20. Balonin, N.A., Sergeev, M.B.: Weighted conference matrix generalizing Belevich matrix at the 22nd order. Informatsionno-upravliaiushchie sistemy (Information and Control Systems) 5, 97–98 (2013)
Search and Modification of Code Sequences Based on Circulant Quasi-orthogonal Matrices Alexander Sergeev, Mikhail Sergeev, Vadim Nenashev, and Anton Vostrikov
Abstract In order to improve noise-immune encoding in telecommunication channels and recognition of a useful signal in significant interference conditions, there is a need for new, advanced coding sequences. This paper is devoted to the problems of searching and examining new error-correcting codes constructed on the basis of circulant quasi-orthogonal matrices and used for phase modulation of signals in the radio channel. The paper presents requirements for well-known coding sequences for lengths greater than 13 bits. The estimates of characteristics of the new coding sequences allow us to compare them with other well-known coding sequences that are widely used in practice. The advantages of the codes, obtained in this work, are discussed in the aspects of improving the correlation characteristics, their detection, and noise immunity in the radio channels of contemporary systems.
1 Introduction Finding and studying extremal quasi-orthogonal matrices is important in many information processing fields, including the formation of new competitive codes and nested code constructions used in wireless communications to modulate signals [1–3]. The quasi-orthogonal matrix theory, as it develops, and practical results in this field are focused on stimulating scientific interest in new bases built on such matrices as a way to revise the methods of noise-immune code synthesis, along with signal or image processing algorithms. Besides, the features of quasi-orthogonal matrices as a base of codes for phase (amplitude) signal modulation can lead us to revise the solutions of, firstly, location problems in terms of improving the noise immunity in multi-position systems [4–6] which use generation and processing of code-modulated A. Sergeev · M. Sergeev · V. Nenashev · A. Vostrikov (B) Saint-Petersburg State University of Aerospace Instrumentation, 67 Bolshaya Morskaya Street, Saint-Petersburg 190000, Russian Federation e-mail: [email protected] © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2020 I. Czarnowski et al. (eds.), Intelligent Decision Technologies, Smart Innovation, Systems and Technologies 193, https://doi.org/10.1007/978-981-15-5925-9_20
signals [2, 3, 7] and, secondly, problems of reliable data transfer over a radio channel in a complex electromagnetic environment. In this paper, we consider a novel approach to search for code sequences as an alternative to Barker codes, m-sequences, and other well-known codes. The approach is based on building codes from rows of circulant quasi-orthogonal matrices like Mersenne or Raghavarao matrices [2, 3, 8, 9].
2 Background Codes of maximum length (m-sequences, codes constructed on the basis of Legendre symbols or quadratic residues, codes formed on the basis of Jacobi symbols, and others) are widely used both in radar systems for remote sensing and in noise-resistant, interference-free high-speed communications. While having maximum length, these codes form, in fact, a "circulant" matrix (as will be shown below), each row of which is a code word. In this regard, the selection of specific types of complex sounding signals is an urgent task related to the development of distributed radars that implement multiple-access technologies. Currently, a large number of studies related to the synthesis and processing of systems of complex signals for communication technology have been carried out [2, 3, 6, 9]. Similar studies are also carried out for radar systems [4–6]. When choosing signals formed on the basis of maximum-length codes, in both communication and radar systems the main attention should be paid to the study of their correlation properties.
3 Ways to Form Circulant Quasi-orthogonal Matrices In [10], a quasi-orthogonal matrix is defined as a square matrix A of order n with elements limited in modulus, $|a_{ij}| \le 1$, satisfying the condition $A^T A = \omega(n) I$, where I is an identity matrix and ω(n) is a weight function. Quasi-orthogonal matrices are thus an extension of orthogonal matrices such as Hadamard, Belevitch, or Haar matrices, discrete cosine transform matrices, Jacket matrices [11], etc. Let us introduce some restrictions in order to pick out only those quasi-orthogonal matrices which can be suitable for noise-immune code synthesis.
Restriction 1. The number of possible values for a quasi-orthogonal matrix element should not exceed two. Such matrices can be called two-level matrices. The most well-known and studied two-level quasi-orthogonal matrices are Hadamard matrices H with elements 1 or −1. Restriction 2. We will consider only structured two-level quasi-orthogonal matrices, namely symmetric (skew-symmetric) circulant ones [12]. These restrictions considerably narrow down the class of quasi-orthogonal matrices, but such matrices do exist, and more to the point, they can be found algorithmically. It follows from Ryser's theorem [13] that when n > 1, the only circulant Hadamard matrix is a matrix of order 4, which looks as follows:

$$H = \begin{bmatrix} -1 & +1 & +1 & +1 \\ +1 & -1 & +1 & +1 \\ +1 & +1 & -1 & +1 \\ +1 & +1 & +1 & -1 \end{bmatrix}$$
However, it is shown in [14] that there exists a set of Hadamard matrices consisting of an (n − 1) × (n − 1) circulant core supplemented by an edge (the upper row and the left column), both filled with +1s. Examples of such "core-and-edge" matrices for orders 4, 8, and 12 are given below:

$$H_4 = \begin{bmatrix} +1 & +1 & +1 & +1 \\ +1 & +1 & -1 & -1 \\ +1 & -1 & +1 & -1 \\ +1 & -1 & -1 & +1 \end{bmatrix}$$

H8 =
+ + + + + + + +
+ − − − + − + +
+ + − − − + − +
+ + + − − − + −
+ − + + − − − +
+ + − + + − − −
+ − + − + + − −
+ − − + − + + −

H12 =
+ + + + + + + + + + + +
+ − + − + + + − − − + −
+ − − + − + + + − − − +
+ + − − + − + + + − − −
+ − + − − + − + + + − −
+ − − + − − + − + + + −
+ − − − + − − + − + + +
+ + − − − + − − + − + +
+ + + − − − + − − + − +
+ + + + − − − + − − + −
+ − + + + − − − + − − +
+ + − + + + − − − + − −
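The core-and-edge construction can also be reproduced and verified numerically. The sketch below uses the Paley variant (a Legendre-symbol core of order p = 7 with a −1 diagonal and an edge of +1s); the sign convention may differ from the matrices printed above, but the defining property $H H^T = (p+1)I$ is checked explicitly.

```python
import numpy as np

p = 7
chi = np.zeros(p, dtype=int)                        # Legendre symbol values chi(0..p-1)
chi[[(x * x) % p for x in range(1, p)]] = 1         # +1 at quadratic residues
chi[chi == 0] = -1                                  # -1 at non-residues (and temporarily at 0)
chi[0] = 0                                          # chi(0) = 0

Q = np.array([[chi[(j - i) % p] for j in range(p)] for i in range(p)])
core = Q - np.eye(p, dtype=int)                     # circulant core with -1 on the diagonal
H = np.ones((p + 1, p + 1), dtype=int)
H[1:, 1:] = core                                    # edge of +1s on the first row and column

print(np.array_equal(H @ H.T, (p + 1) * np.eye(p + 1, dtype=int)))   # True
```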
These matrices are in one-to-one mapping with Paley–Hadamard difference sets. For all known circulant Paley–Hadamard matrices of order n = 4t, where t is a natural number, the value (n − 1) belongs to one of the sequence types associated with the following codes:
1. 4t − 1 = 2^k − 1, where k > 1 (m-sequences);
2. 4t − 1 = p, where p is a prime number (Mersenne codes);
3. 4t − 1 = p(p + 2), where p and p + 2 form a pair of twin primes (codes formed on the base of Jacobi symbols).
Circulant matrices of the first type can be obtained for all k. This is true if the upper row (the "circulant") specifying the circulant structure of the matrix is represented by an m-sequence, i.e., a linear shift register sequence of maximum length with period 2^k − 1, in which all 0s are replaced by +1s and all 1s by −1s. Portraits of such matrices are given in Fig. 1. Hereinafter, a black field in a figure corresponds to an element with value −1, and a white field corresponds to an element with value 1. Circulant matrices of the second type can be obtained for all prime numbers p = 4t − 1 by calculating quadratic residues if the upper row of the "circulant" is formed as the sequence

$$\left(\frac{1}{p}\right),\ \left(\frac{2}{p}\right),\ \ldots,\ \left(\frac{p-1}{p}\right),\ -1,$$

where $\left(\frac{a}{p}\right)$ is the Legendre symbol modulo p, calculated as

$$\left(\frac{a}{p}\right) = \begin{cases} 1, & \text{if } x^2 \equiv a \pmod{p} \text{ is solvable},\\ -1, & \text{otherwise.} \end{cases}$$
Portraits of circulant matrices of Mersenne type formed by Legendre symbol calculation are given in Fig. 2.
Fig. 1 Portraits of circulant matrices for orders 15, 31 and 127
Circulant matrices of the third type are formed by calculating Jacobi
symbols. If p and q are distinct odd prime numbers, then the Jacobi symbol $\left(\frac{a}{pq}\right)$ is defined as the product of Legendre symbols $\left(\frac{a}{p}\right)\left(\frac{a}{q}\right)$. To build circulant matrices of the third type when n − 1 = p(p + 2) = pq, we use, as a modification of the construction, a row which specifies the circulant as the Jacobi symbol sequence

$$\left(\frac{0}{pq}\right),\ \left(\frac{1}{pq}\right),\ \left(\frac{2}{pq}\right),\ \ldots,\ \left(\frac{pq-1}{pq}\right),$$

where the Jacobi symbol value is used every time a nonzero value is obtained. To obtain a matrix row, the remaining (zero) values are replaced by +1, except for a ∈ {p, 2p, 3p, …, (q − 1)p}, which are replaced by −1. As an example, Fig. 3 shows "portraits" of some of the obtained circulant matrices of the third type.
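The replacement rule for the third-type sequence can be illustrated for the smallest case pq = 3 × 5 = 15. The sketch below is an illustration, not the authors' code, and follows the sign convention described above; the resulting circulant matrix has a two-valued Gram matrix with all off-diagonal entries equal to −1.

```python
import numpy as np
from math import gcd

p, q = 3, 5

def legendre(a, r):                        # Legendre symbol via Euler's criterion
    a %= r
    if a == 0:
        return 0
    return 1 if pow(a, (r - 1) // 2, r) == 1 else -1

def jacobi_row(p, q):
    n = p * q
    row = []
    for a in range(n):
        if gcd(a, n) == 1:
            row.append(legendre(a, p) * legendre(a, q))   # nonzero Jacobi symbol
        elif a != 0 and a % p == 0:                       # a in {p, 2p, ..., (q-1)p}
            row.append(-1)
        else:                                             # a = 0 or a multiple of q
            row.append(1)
    return np.array(row)

row = jacobi_row(p, q)
C = np.array([np.roll(row, k) for k in range(p * q)])     # circulant matrix of order 15
print(row)
print(np.unique(C @ C.T, return_counts=True))             # two values: -1 (off-diagonal) and 15
```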
Fig. 2 Portraits of circulant matrices of Mersenne type for orders 15, 31 and 127
4 Search for Codes Based on Rows of Quasi-orthogonal Matrices The most important characteristic of a code sequence in correlation reception of coded signals is the autocorrelation function (ACF) [2, 9]. In order to obtain a code sequence with a "good" ACF, let us consider two ways of building it. The first way is based on using Mersenne matrices (M) of orders 2^k − 1, nested into the sequence n = 4t − 1 [15] and having a local determinant maximum [10]. A code sequence obtained from such a quasi-orthogonal matrix will have elements with value −1 placed in different positions: an element with value 1 corresponds to a, and, likewise, an element with value −1 corresponds to −b. Let us form a unique matrix M of order 3 via quadratic residues
Fig. 3 Portraits of circulant matrices formed by calculating Jacobi symbols for orders 15 (3 × 5), 35 (5 × 7) and 143 (11 × 13)
$$M = \begin{bmatrix} a & a & -b \\ -b & a & a \\ a & -b & a \end{bmatrix}$$

and find a product which would look like

$$M M^{T} = \begin{bmatrix} a^{2}+b^{2} & \cdots & a^{2}-b^{2} \\ \vdots & \ddots & \vdots \\ a^{2}-b^{2} & \cdots & a^{2}+b^{2} \end{bmatrix}.$$
Setting a non-diagonal element equal to zero and solving the resulting equation allows us to find the value of b. For each i-th row, i = 2, 3, …, we successively calculate the ACF in order to choose the most successful code. The second way is to use the Raghavarao matrix (R) as a base for building codes [8, 9]. With a maximum determinant, this matrix satisfies the equation $R^T R = (n-1)I + O$, where R is the desired integer matrix of 1 and −1, while I and O are, respectively, an identity matrix and a matrix consisting of all 1s. Building such matrices is based on the fact that if the product $R^T R$ has a quasi-diagonal form with elements d on the diagonal and elements s < d outside this diagonal, then the determinant will be $\det(R^T R) = \det(R)^2 = (d - s)^{n-1}\,(d + s(n-1))$. From the condition that the squared determinant is positive, it follows that s = −1 and d = n, or s > 0. The maximum determinant value is observed in the cases when s = 1 and d = n (for odd n).
5 Modified Code Sequences In order to improve the autocorrelation characteristics, let us present some examples of code modification. For a predefined m-sequence, we will use the Paley–Hadamard matrix construction of the first type which, as noted above, can be obtained for all such k. To complete the modification, we will need a series of steps.
Step 1. Take the upper row of the "circulant" as an m-sequence with a period N = 2^n − 1 and form a circulant matrix by shifting the m-sequence over the rows with a phase shift in each one. The obtained m-sequence-based circulant matrix for N = 15 is presented below:

1 0 0 0 1 1 1 1 0 1 0 1 1 0 0
0 1 0 0 0 1 1 1 1 0 1 0 1 1 0
0 0 1 0 0 0 1 1 1 1 0 1 0 1 1
1 0 0 1 0 0 0 1 1 1 1 0 1 0 1
1 1 0 0 1 0 0 0 1 1 1 1 0 1 0
0 1 1 0 0 1 0 0 0 1 1 1 1 0 1
1 0 1 1 0 0 1 0 0 0 1 1 1 1 0
0 1 0 1 1 0 0 1 0 0 0 1 1 1 1
1 0 1 0 1 1 0 0 1 0 0 0 1 1 1
1 1 0 1 0 1 1 0 0 1 0 0 0 1 1
1 1 1 0 1 0 1 1 0 0 1 0 0 0 1
1 1 1 1 0 1 0 1 1 0 0 1 0 0 0
0 1 1 1 1 0 1 0 1 1 0 0 1 0 0
0 0 1 1 1 1 0 1 0 1 1 0 0 1 0
0 0 0 1 1 1 1 0 1 0 1 1 0 0 1
Step 2. Replace all the 0 elements by −1 and all the 1 elements by +1. The resulting matrix will look like

+1 −1 −1 −1 +1 +1 +1 +1 −1 +1 −1 +1 +1 −1 −1
−1 +1 −1 −1 −1 +1 +1 +1 +1 −1 +1 −1 +1 +1 −1
−1 −1 +1 −1 −1 −1 +1 +1 +1 +1 −1 +1 −1 +1 +1
+1 −1 −1 +1 −1 −1 −1 +1 +1 +1 +1 −1 +1 −1 +1
+1 +1 −1 −1 +1 −1 −1 −1 +1 +1 +1 +1 −1 +1 −1
−1 +1 +1 −1 −1 +1 −1 −1 −1 +1 +1 +1 +1 −1 +1
+1 −1 +1 +1 −1 −1 +1 −1 −1 −1 +1 +1 +1 +1 −1
−1 +1 −1 +1 +1 −1 −1 +1 −1 −1 −1 +1 +1 +1 +1
+1 −1 +1 −1 +1 +1 −1 −1 +1 −1 −1 −1 +1 +1 +1
+1 +1 −1 +1 −1 +1 +1 −1 −1 +1 −1 −1 −1 +1 +1
+1 +1 +1 −1 +1 −1 +1 +1 −1 −1 +1 −1 −1 −1 +1
+1 +1 +1 +1 −1 +1 −1 +1 +1 −1 −1 +1 −1 −1 −1
−1 +1 +1 +1 +1 −1 +1 −1 +1 +1 −1 −1 +1 −1 −1
−1 −1 +1 +1 +1 +1 −1 +1 −1 +1 +1 −1 −1 +1 −1
−1 −1 −1 +1 +1 +1 +1 −1 +1 −1 +1 +1 −1 −1 +1

Step 3. Replace all the −1 elements by an indefinite −b and obtain the following matrix:

+1 −b −b −b +1 +1 +1 +1 −b +1 −b +1 +1 −b −b
−b +1 −b −b −b +1 +1 +1 +1 −b +1 −b +1 +1 −b
−b −b +1 −b −b −b +1 +1 +1 +1 −b +1 −b +1 +1
+1 −b −b +1 −b −b −b +1 +1 +1 +1 −b +1 −b +1
+1 +1 −b −b +1 −b −b −b +1 +1 +1 +1 −b +1 −b
−b +1 +1 −b −b +1 −b −b −b +1 +1 +1 +1 −b +1
+1 −b +1 +1 −b −b +1 −b −b −b +1 +1 +1 +1 −b
−b +1 −b +1 +1 −b −b +1 −b −b −b +1 +1 +1 +1
+1 −b +1 −b +1 +1 −b −b +1 −b −b −b +1 +1 +1
+1 +1 −b +1 −b +1 +1 −b −b +1 −b −b −b +1 +1
+1 +1 +1 −b +1 −b +1 +1 −b −b +1 −b −b −b +1
+1 +1 +1 +1 −b +1 −b +1 +1 −b −b +1 −b −b −b
−b +1 +1 +1 +1 −b +1 −b +1 +1 −b −b +1 −b −b
−b −b +1 +1 +1 +1 −b +1 −b +1 +1 −b −b +1 −b
−b −b −b +1 +1 +1 +1 −b +1 −b +1 +1 −b −b +1
Step 4. Multiply this matrix by its transpose. You will get a matrix in which the main diagonal elements are equal to 8 + 7b², and all the other elements are equal to 4 − 8b + 3b². To find −b from the resulting product, knowing that all the elements outside the main diagonal must be equal to zero, you need to solve a quadratic equation,
equating a random non-diagonal matrix element to zero, and choose the smallest root of this equation. In our case, b = 2/3 (−b = −2/3). The obtained value of −b can then be checked against the initial m-sequence. The comparison of the normalized ACF for the usual and modified m-sequences shown in Fig. 4 demonstrates that for the modified sequence the sidelobe maxima closest to the main peak reach smaller values. This increases the probability of its correct detection against a noise background. Similar ACF improvements have been observed for modified maximum-length sequences obtained from circulant Mersenne matrices (see Fig. 2) and circulant matrices based on Jacobi symbol calculation (see Fig. 3). Fig. 4 ACF for usual (a) and modified (b) m-sequences when N = 15
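The kind of comparison shown in Fig. 4 can be reproduced in a few lines. The sketch below (not the authors' code) computes the normalized aperiodic ACF of the ±1 m-sequence of length 15 used above and of its modification with −1 replaced by −b = −2/3, and prints the largest normalized sidelobe of each.

```python
import numpy as np

m_seq = np.array([1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0])  # first row of the matrix above
usual = np.where(m_seq == 1, 1.0, -1.0)                          # Step 2 sequence
modified = np.where(m_seq == 1, 1.0, -2.0 / 3.0)                 # Step 3 sequence with b = 2/3

def normalized_acf(x):
    acf = np.correlate(x, x, mode="full")        # aperiodic autocorrelation
    return acf / acf[len(x) - 1]                 # normalize by the main peak

for name, x in (("usual", usual), ("modified", modified)):
    acf = normalized_acf(x)
    sidelobes = np.delete(acf, len(x) - 1)       # drop the main peak
    print(name, round(float(np.max(np.abs(sidelobes))), 3))
```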
6 Summary We have discussed ways to build circulant quasi-orthogonal matrices of three types. On the basis of these matrices, two modes of search for new modified codes can be applied. The characteristics of the new modified code sequences are estimated and compared with similar m-sequence-based codes commonly used in practice. For the elements of a new modified code sequence of maximum length, one has to find pairs of a and b values; their usage determines the result of signal-code construction marking.
7 Conclusions The results of the research demonstrate that the codes obtained on the basis of circulant quasi-orthogonal matrices provide better noise immunity for signals in radio channels and a higher probability of their correct detection under external noise. The obtained results suggest that the new modified codes can be efficiently used for amplitude, phase, and other types of radio signal modulation. These results are especially interesting for synthesizing ultra-wideband signal-code constructions modulated by the obtained modified code sequences. Of course, this will require the development of special algorithms to process such complex and hypercomplex signals, taking into account the features of radar and telemetry channels in order to provide better detection characteristics. Acknowledgements The reported study was funded by a grant of the Russian Science Foundation (project No. 19-79-003) and by RFBR, project number 19-29-06029.
References
1. Wang, R.: Introduction to Orthogonal Transforms with Applications in Data Processing and Analysis, p. 504. Cambridge University Press (2010)
2. Nenashev, V.A., Sergeev, M.B., Kapranova, E.A.: Research and analysis of autocorrelation functions of code sequences formed on the basis of monocirculant quasi-orthogonal matrices. Informatsionno-upravliaiushchie sistemy (Inf. Control Syst.) 4, 4–19 (2018)
3. Sergeev, M.B., Nenashev, V.A., Sergeev, A.M.: Nested code sequences of Barker–Mersenne–Raghavarao. Informatsionno-upravliaiushchie sistemy (Inf. Control Syst.) 3(100), 71–81 (2019)
4. Nenashev, V.A., Kryachko, A.F., Shepeta, A.P., Burylev, D.A.: Features of information processing in the onboard two-position small-sized radar based on UAVs, pp. 111970X-1–111970X-7. SPIE Future Sensing Technologies, Tokyo, Japan (2019)
5. Nenashev, V.A., Sentsov, A.A., Shepeta, A.P.: Formation of radar image the earth's surface in the front zone review two-position systems airborne radar. In: Wave Electronics and its Application in Information and Telecommunication Systems (WECONF), pp. 1–5. Saint-Petersburg, Russia (2019)
6. Kapranova, E.A., Nenashev, V.A., Sergeev, A.M., Burylev, D.A., Nenashev, S.A.: Distributed matrix methods of compression, masking and noise-resistant image encoding in a high-speed network of information exchange, information processing and aggregation, pp. 111970T-1–111970T-7. SPIE Future Sensing Technologies, Tokyo, Japan (2019)
7. Levanon, N., Mozeson, E.: Radar Signals, p. 411. Wiley (2004)
8. Raghavarao, D.: Some optimum weighing designs. Ann. Math. Statist. 30, 295–303 (1959)
9. Sergeev, A.M., Nenashev, V.A., Vostrikov, A.A., Shepeta, A.P., Kurtyanik, D.V.: Discovering and analyzing binary codes based on monocyclic quasi-orthogonal matrices. In: Czarnowski, I., Howlett, R., Jain, L. (eds.) Intelligent Decision Technologies 2019. Smart Innovation, Systems and Technologies, vol. 143, pp. 113–123. Springer, Singapore (2019)
10. Balonin, N., Sergeev, M.: Quasi-orthogonal local maximum determinant matrices. Appl. Math. Sci. 9(8), 285–293 (2015)
11. Lee, M.H.: Jacket Matrices: Constructions and Its Applications for Fast Cooperative Wireless Signal Processing. LAP LAMBERT Publishing, Germany (2012)
12. Vostrikov, A., Sergeev, M., Balonin, N., Sergeev, A.: Use of symmetric Hadamard and Mersenne matrices in digital image processing. Procedia Computer Science 126, 1054–1061 (2018)
13. Ryser, H.J.: Combinatorial Mathematics. The Carus Mathematical Monographs, vol. 14, p. 162. Published by the Mathematical Association of America; distributed by Wiley, New York (1963)
14. Balonin, N.A., Sergeev, M.B.: Ryser's conjecture expansion for bicirculant structures and Hadamard matrix resolvability by double-border bicycle ornament. Informatsionno-upravliaiushchie sistemy (Inf. Control Syst.) 1, 2–10 (2017)
15. Sergeev, M.: Generalized Mersenne matrices and Balonin's conjecture. Autom. Control Comput. Sci. 48(4), 214–220 (2014)
Processing of CT Lung Images as a Part of Radiomics Aleksandr Zotin, Yousif Hamad, Konstantin Simonov, Mikhail Kurako, and Anzhelika Kents
Abstract In recent years, medical technologies aimed at extracting quantitative features from medical images have developed greatly. One of them is radiomics, which allows extracting a large number of quantitative indicators based on various features. The extracted data are preliminarily evaluated and visualized to improve decision support by a medical specialist. The paper describes a combination of methods for assessing CT images. Processing is aimed at solving a number of diagnostic tasks, such as highlighting and contrasting objects of interest with color coding, and their further evaluation by corresponding criteria intended to clarify the nature of the changes and increase both the detectability of pathological changes and the accuracy of the diagnostic conclusion. For these purposes, it is proposed to use pre-processing algorithms that take into account a series of images. Segmentation of the lungs and of possible pathology areas is conducted using the wavelet transform and Otsu thresholding. As the means of visualization and feature extraction, it was decided to use delta maps and maps obtained by the shearlet transform with color coding. The
A. Zotin Reshetnev Siberian State University of Science and Technology, 31 Krasnoyarsky Rabochy Pr, Krasnoyarsk 660037, Russian Federation e-mail: [email protected] Y. Hamad · M. Kurako (B) Siberian Federal University, 79 Svobodny St, Krasnoyarsk 660041, Russian Federation e-mail: [email protected] Y. Hamad e-mail: [email protected] K. Simonov Institute of Computational Modelling of the SB RAS, 50/44, Akademgorodok, Krasnoyarsk 660036, Russian Federation e-mail: [email protected] A. Kents Federal Siberian Scientific and Clinical Center, FMBA of Russia, 24 Kolomenskaya St, Krasnoyarsk 660037, Russian Federation e-mail: [email protected] © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2020 I. Czarnowski et al. (eds.), Intelligent Decision Technologies, Smart Innovation, Systems and Technologies 193, https://doi.org/10.1007/978-981-15-5925-9_21
experimental and clinical material shows the effectiveness of the proposed combination for analyzing the variability of the internal geometric features of the object of interest in the studied images.
1 Introduction A new area of modern medicine is "precision medicine," the purpose of which is the personalization of treatment based on the specific characteristics of the patient and their disease [1]. Modern radiation diagnostics is one of the most rapidly developing fields of medicine, and classical radiography and multispiral computed tomography are becoming widely used diagnostic methods. At the same time, the demand for more informative and high-quality radiological conclusions is increasing significantly [2–8]. Most of the research related to precision oncology is focused on the verification of volumetric formations based on the molecular characteristics of tumors, which requires tissue extraction by tumor biopsy [1]. Over the past decades, advances in the development of high-resolution imaging equipment and standardized pathology assessment protocols have allowed a move toward quantitative imaging [7, 9–12]. In this regard, the area of modern medical technology known as radiomics should be noted, which allows evaluating the characteristics of the area of interest based on CT, MRT, or PET-CT scans [2]. Diagnostic imaging makes it possible to characterize tissues in a non-invasive way, and radiomics is rapidly turning into a technology of personalized medicine [2, 13, 14]. A quantitative analysis of medical image data using modern software provides more information than that of a radiologist [2]. For example, a quantitative analysis of lung cancer images results in an increase in diagnostic accuracy of 12.4% [4]. According to medical research, tumors with greater genomic heterogeneity are more resistant to treatment and more prone to metastasis, which supports the concept that more heterogeneous tumors have a worse prognosis. According to the radiomics hypothesis, genomic heterogeneity translates into intratumoral heterogeneity that can be assessed by imaging and that ultimately indicates a worse prognosis. This hypothesis was supported by Jackson et al. [15] as well as by Diehn et al. [7, 16], who also showed that a specific imaging pattern could predict overexpression of the Epidermal Growth Factor Receptor (EGFR). Similar results were obtained for hepatocellular carcinomas by Segal et al. [17], showing that a combination of only 28 image attributes is sufficient to restore the variation of 116 gene expression modules. The most important steps in the framework of radiomics and the analysis of a CT image set are the correct localization and visualization of the region of interest (pathology). Since the source data may be affected by noise, advanced methods are required for processing and analyzing such images. The most discriminative methods for isolating the desired characteristics vary depending on the imaging modality and the pathology studied; therefore, they are currently the subject of research in the field of radiomics [2]. Radiomics can be used to increase
the precision in the diagnosis, assessment of prognosis, and prediction of therapy response particularly in combination with clinical, biochemical, and genetic data. The paper is organized as follows. Section 2 presents a description of CT images processing technique. Section 3 provides experimental results. Concluding remarks are outlined in Sect. 4.
2 Technique of CT Images Processing Modern approaches to the processing of medical images have an impact on the conclusions drawn from the analysis of these images. In the presence of pathological changes, a descriptive picture is compiled according to a number of criteria:
– Localization and ratio of pathological changes to other structures of the lung.
– Numerical component of the detected changes.
– Description of the form of pathological formation and contours.
– Measurement of the size of pathological changes in three planes (when conducting CT scan).
– Intensity of changes characterizing tissue density due to the presence of various components in its structure.
– In some controversial cases, the presence of a vascular component or necrosis in the tissues.
These criteria are the basis for the analysis of images not only to characterize pathological changes but also as a method of communication between a radiologist and doctors of other specialties. Based on the above criteria, there is a sequence of image processing stages. A typical radiomics image processing scheme is shown in Fig. 1. In order to determine the above-mentioned criteria, it is necessary to prepare a series of CT images for a more accurate analysis. It was decided to use the color
Fig. 1 Stages of medical image processing
Fig. 2 Scheme of the proposed methodology for image processing
coding technique based on maps obtained by shearlet transforms [18, 19]. This technique has shown good results during tissue analysis. The main stages include preliminary processing, segmentation, the formation of the contour representation with color coding, extraction of features, and analysis of the results. The scheme of the proposed image processing technique is presented in Fig. 2. The feature extraction is carried out on the basis of histogram characteristics, delta maps as well as other texture features, which allows assessing the degree of pathology of the object of interest. In the case of automated diagnostics, it is possible to train neural networks for specific tasks with the use of specific features set. Sections 2.1 and 2.2 present a description of the steps of the proposed CT images processing technique.
2.1 Pre-processing and Initial Visualization The main task of pre-processing is to improve the quality of CT images. This step allows improving some image parameters (for example, the signal-to-noise ratio) and improving the visual representation by removing noise as well as modifying unwanted parts of the background. As preliminary processing for noise reduction, it was decided to use a median filter with modifications for image set processing. It was also decided to use weights taking into account the distance of adjacent CT images from the processed image. The histogram-based implementation of the median filter [20] is taken as the basis. Improvement of the quality of noise
Fig. 3 Scheme of weighted median filter histogram formation
reduction is based on adaptive weight adjustment using the local neighborhood (3 × 3) of the processed images. The scheme of the histogram formation taking the weights into account is shown in Fig. 3. Since CT images can have different brightness and contrast, it was decided to use a Multi-Scale Retinex (MSR) modification, in which the wavelet transform is used to speed up the calculations [21], to correct brightness and local contrast in the area of interest. A brighter representation of the lung area allows calculating texture features and assessing the structural characteristics of the lesion. During the calculation of the features, the data of the correction performed by MSR, in the form of color-coded delta maps, are also used. Delta maps provide an opportunity to perform a preliminary assessment of fragments in the area of interest. An example of CT image processing is shown in Fig. 4. For better clarity, the brightness delta map was multiplied by 5 and the edge delta map by 10. As can be seen from Fig. 4, a different form of visualization makes it possible to discern small details. The use of color coding contributes to better visualization and the creation of accents at certain points. The brightness delta map allows calculating the density difference between the air-filled lungs and the denser soft-tissue components of the muscles or structures that fall into the lung section (bronchi, blood vessels). The edge delta map allows conducting a preliminary assessment of the boundaries of the body of the lungs and large vessels as well as of numerous small structures against the background of lung tissue.
Fig. 4 Image processing: a original image, b image after noise suppression and MSR processing, c brightness delta map, d edge delta map
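A minimal sketch of the weighted median idea is given below; it is only an illustration, not the histogram-based implementation of [20], and the slice weights (1, 2, 1) are an assumption of the example. Each output pixel is the weighted median of a 3 × 3 window taken from the processed slice and its two neighbours.

```python
import numpy as np

def weighted_median_filter(volume, weights=(1, 2, 1)):
    """volume: (n_slices, H, W) stack of CT slices; filters the interior slices."""
    n, h, w = volume.shape
    out = volume.astype(float).copy()
    for s in range(1, n - 1):
        for y in range(1, h - 1):
            for x in range(1, w - 1):
                samples, ws = [], []
                for dz, wz in zip((-1, 0, 1), weights):          # neighbour slices get smaller weight
                    patch = volume[s + dz, y - 1:y + 2, x - 1:x + 2].ravel()
                    samples.extend(patch)
                    ws.extend([wz] * patch.size)
                order = np.argsort(samples)
                cum = np.cumsum(np.array(ws)[order])
                # weighted median: first sorted sample whose cumulative weight reaches half the total
                out[s, y, x] = np.array(samples)[order][cum >= cum[-1] / 2.0][0]
    return out

filtered = weighted_median_filter(np.random.rand(3, 16, 16))     # placeholder data for the example
```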
2.2 Image Segmentation and Data Visualization One of the important stages of CT image analysis with automated parameter calculation is the accurate determination of the boundaries of the lungs and of possible pathology areas, which appear as dark/light regions. As the first step of segmentation, an image transformation is performed using principal component analysis (PCA). Local contrast correction and suppression of minor noise are also performed by adjusting the high-frequency components obtained during a one-level wavelet transform. Subsequently, using the Discrete Wavelet Transform (DWT), a segmentation similar to that proposed by Chen et al. [12] is performed. The resulting image is subjected to morphological processing. In order to increase the significance of the boundaries of the foreground areas and to remove small holes in the CT image, a morphological closing is used. The next step is binarization based on Otsu's thresholding method [22]. In cases when areas suspicious for pathology (a difference in the brightness components) are detected in the lung region, they are additionally allocated as separate segments. After segmentation, color coding is performed using the shearlet transform [18, 19]. An example of segmentation and of the formation of maps for texture feature calculation, as well as contour representations with color coding, is shown in Fig. 5. Taking into account the results of segmentation and the representation maps based on the shearlet transform with color coding, the next stage is the calculation of the texture features necessary to obtain estimates.
Fig. 5 Example of CT image processing: a original image, b full lung segmentation, c segmented lung with nodules, d final segmentation of lung and nodules, e extracted regions, f color-coded regions, g brightness delta map, h color-coded edge map
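A simplified sketch of the thresholding and morphological part of this stage is shown below; the PCA, wavelet, and shearlet steps are omitted, so this is not the full pipeline of the paper, only an illustration using standard scikit-image routines.

```python
import numpy as np
from skimage.filters import threshold_otsu
from skimage.morphology import binary_closing, remove_small_objects, disk

def segment_lungs(ct_slice):
    t = threshold_otsu(ct_slice)                        # global Otsu threshold
    mask = ct_slice < t                                 # air-filled lung tissue is dark on CT
    mask = binary_closing(mask, disk(3))                # close small holes along the boundary
    mask = remove_small_objects(mask, min_size=500)     # drop background speckle
    return mask

mask = segment_lungs(np.random.rand(128, 128))          # placeholder image for the example
```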
3 Experimental Assessment The experimental assessment was carried out on the basis of fluorographic, digital X-ray, and CT images. We used 15 image sets for patients with volumetric formations; each image set contains about 400 CT images. The reliability and correctness of lung and nodule boundary detection were evaluated by the following estimates [8]: the Figure Of Merit (FOM) and the Jaccard (J) and Dice (DSC) similarity coefficients. In order to calculate the estimates, the following set of CT images was used: 112 images with pathological changes and 97 images without pathologies. A visual representation of the detected lung boundaries and possible regions with pathology is presented in Fig. 6, where the red line marks the boundary obtained by the proposed method and the green line the ground truth. The ground truth regions were formed with the help of a senior radiologist.
Fig. 6 Detected regions: a CT images without pathology, b CT images with pathology
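The overlap measures can be computed from binary masks as follows. The sketch is illustrative rather than the authors' code; the FOM here is Pratt's figure of merit computed from boundary distance maps, with the conventional scaling constant α = 1/9 taken as an assumption of the example.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def jaccard(a, b):
    a, b = a.astype(bool), b.astype(bool)
    return (a & b).sum() / (a | b).sum()            # |A ∩ B| / |A ∪ B|

def dice(a, b):
    a, b = a.astype(bool), b.astype(bool)
    return 2.0 * (a & b).sum() / (a.sum() + b.sum())

def pratt_fom(detected_edges, ideal_edges, alpha=1.0 / 9.0):
    ideal = ideal_edges.astype(bool)
    det = detected_edges.astype(bool)
    d = distance_transform_edt(~ideal)              # distance of every pixel to the ideal edge
    return float(np.sum(1.0 / (1.0 + alpha * d[det] ** 2)) / max(det.sum(), ideal.sum()))
```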
Table 1 Statistical data of quantitative estimates of lungs and nodules boundaries detection

Estimates | Lung boundaries                              | Nodules boundaries
          | Min   | Avg ± SD       | Med   | Max         | Min   | Avg ± SD       | Med   | Max
FOM       | 0.975 | 0.981 ± 0.015  | 0.985 | 0.996       | 0.966 | 0.975 ± 0.012  | 0.978 | 0.987
J         | 0.948 | 0.955 ± 0.019  | 0.963 | 0.974       | 0.943 | 0.958 ± 0.014  | 0.967 | 0.972
DSC       | 0.969 | 0.971 ± 0.013  | 0.974 | 0.984       | 0.956 | 0.965 ± 0.011  | 0.969 | 0.976
Fig. 7 Example of processing: a original images with marked nodules, b segmentation of lungs (green) and nodules (maroon), c color-coded regions, d color-coded edge map
Table 1 demonstrates statistical data (minimum, maximum, median, and average values with standard deviation) of the quantitative estimates of the detected lung and nodule boundaries for all described indicators. During the evaluation, it was found that noise suppression with the proposed modification of the median filter allows increasing the accuracy of boundary detection by 2.8–4.4%. An example of segmentation and a visual representation of the detected boundaries of the lungs and nodules is presented in Fig. 7. The figure shows the results of segmentation, the color coding of the lung region, and the detected contours of the objects of interest (lungs and tumors). The obtained forms of visualization, as well as the methods for their interpretation, are used for the diagnostic evaluation of the image. The medical assessment is affected by measurement error (most often in the statistical data) as well as by the specificity characterizing the peculiarity of the pathological process. Evaluation of the data was carried out on the basis of the X-ray diagnostic conclusions of two groups of radiologists, five in each group. The division was carried out according to the level of qualification (five highly qualified experts with 10 or more years of experience and five with experience from 2 to 10 years). The results obtained by the experts, which were taken as the standard, were almost equal to those of computer processing, with slight deviations on a number of points. Further clinical investigations confirmed that patients with a burdened anamnesis and various tumors of other systems had volumetric formations. Information on the estimates obtained is given in Table 2. It should be noted that the specialist's sensitivity does not differ very much from the computer processing's sensitivity; however, it depends on the skill of the specialist. In terms of specificity, it is necessary to take into account a larger list of criteria and the variability of indicators. Moreover, a specialist cannot evaluate a number of
Table 2 Assessment of evaluations made by medical specialists manually and with the use of the proposed representations formed by software tools
Evaluated Features                         | Evaluation by radiologist (%) | Software evaluation (%)
Pathology localization                     | 96.1 ± 3.2                    | 97.7 ± 4.8
Pathology objects detection                | 92.3 ± 5.1                    | 98.9 ± 3.4
Shape                                      | 97.2 ± 1.9                    | 99.5 ± 2.1
Size of pathologies' structural elements   | 0.0                           | 99.3 ± 1.5
Edge                                       | 92.1 ± 2.0                    | 98.5 ± 1.8
Relation to other structures               | 100.0                         | 0.0
Tissue density                             | 90.7 ± 4.9                    | 98.3 ± 1.6
Sensitivity                                | 98.2 ± 0.8                    | 100.0
criteria without using the software. The proposed way of visualization, accompanied by texture analysis, allows obtaining a number of useful biomarkers that provide an objective and quantitative assessment of pathology (tumor) heterogeneity by analyzing differences and patterns in the set of CT images.
4 Conclusions Developments in radiomics, along with the use of modern equipment and standardized protocols of medical assessment, allow obtaining a number of quantitative image assessments. The development of a technique for preparing images for feature extraction in a semi-automatic mode also allows receiving additional information, thereby supporting medical decision making. The proposed solutions can increase the performance of image processing, make it possible to identify smaller pathological changes, and improve the quality of radiological conclusions. Nevertheless, the experience of the radiologist affects the quality of the interpretation of the image processing results and the final quality of the medical conclusions.
References 1. Abrahams, E., Silver, M.: The case for personalized medicine. J. Diabetes Sci Technol. 3(4), 680–684 (2009) 2. Lambin, P., Rios-Velazquez, E., Leijenaar, R., Carvalho, S., van Stiphout, R., Granton, P., Zegers, C., Gillies, R., Boellard, R., Dekker, A., Aerts, H.: Radiomics: Extracting more information from medical images using advanced feature analysis. Eur. J. Cancer 48(4), 441–446 (2012)
3. Parmar, C., Grossmann P., Bussink, J., Lambin, P., Aerts H.JW.L.: Machine learning methods for quantitative radiomic biomarkers. Sci. Rep. 5, 13087 (2015) 4. Choi, W., Oh, J.H., Riyahi, S., Liu, C.-J., Jiang, F., Chen, W., White, C., Rimner, A., Mechalakos, J.G., Deasy, J.O., Lu, W.: Radiomics analysis of pulmonary nodules in low-dose CT for early detection of lung cancer. Med. Phys. 45(4), 1537–1549 (2018). https://doi.org/10.1002/mp. 12820 5. Valliammal, N.: Leaf image segmentation based on the combination of wavelet transform and K means clustering. Int. J. Adv.Res. Artif. Intell. 1(3), 37–43 (2012) 6. Saba, T., Sameh, A., Khan, F., Shad, S. and Sharif, M.: Lung nodule detection based on ensemble of hand crafted and deep features. J. Med. Syst. 43(12), 332.1–332.12 (2019) 7. Thompson, G., Ireland, T., Larkin, X., Arnold, J., Holsinger, R.: A novel segmentation-based algorithm for the quantification of magnified cells. J. Cell. Bioch. 15(11), 1849–1854 (2014) 8. Zotin, A., Hamad, Y., Simonov, K., Kurako, M.: Lung boundary detection for chest X-ray images classification based on GLCM and probabilistic neural networks. Procedia Comput. Sci. 159, 1439–1448 (2018) 9. Xu, M., Qi, S., Yue, Y., Teng, Y., Xu, L., Yao, Y. and Qian, W.: Segmentation of lung parenchyma in CT images using CNN trained with the clustering algorithm generated dataset. Biomed. Eng. Online 18, 2.1–2.21 (2019) 10. Khordehchi, E. A., Ayatollahi, A., Daliri, M.: Automatic lung nodule detection based on statistical region merging and support vector machines. Image Analysis & Stereology 36(2), 65–78 (2017) 11. Nardini, P., Chen, M., Samsel, F., Bujack, R., Bottinger, M., Scheuermann, G.: The making of continuous colormaps. IEEE Trans. Vis. Comput. Graph. 14(8), 1–15 (2019) 12. Chen, S., Hung, P., Lin, M., Huang, C., Chen, C., Wang T., Lee, W.: DWT-based segmentation method for coronary arteries. J. Med. Syst. 38(6), 55.1–55.8 (2014) 13. Senthil, K.K., Venkatalakshmi, K., Karthikeyan, K.: Lung cancer detection using image segmentation by means of various evolutionary algorithms. Computational and Mathematical Methods in Medicine, 1–16 (2019) 14. Hamad, Y., Simonov, K., Naeem, M.: Detection of brain tumor in MRI images, using a combination of fuzzy C-means and thresholding. Int. J. Adv.Pervasive and Ubiquit. Comput. 11(1), 45–60 (2019) 15. Jackson, A., O’Connor, J.P.B., Parker, G.J.M., Jayson, G.C.: Imaging tumor vascular heterogeneity and angiogenesis using dynamic contrast-enhanced magnetic resonance imaging. Clin. Cancer Res. 13, 3449–3459 (2007) 16. Diehn, M., Nardini, C., Wang, D., McGovern, S., Jayaraman, M., Liang, Y., Aldape, K., Cha, S., Kuo M.: Identification of noninvasive imaging surrogates for brain tumor gene-expression modules. Proc. Natl. Acad. Sci. U S A. 105, 5213–5218 (2008) 17. Segal, E., Sirlin, C.B., Ooi, C., Adler, A.S., Gollub, J., Chen, X., Chan, B.K., Matcuk, G.R., Barry, C.T., Chang, H.Y., Kuo, M.D.: Decoding global gene expression programs in liver cancer by noninvasive imaging. Nat. Biotechnol. 25, 675–680 (2007) 18. Zotin, A., Simonov, K., Kapsargin, F., Cherepanova, T., Kruglyakov, A., Cadena, L.: Techniques for medical images processing using shearlet transform and color coding. In: Favorskaya, M.N., Jain, L.C. (eds.) Computer Vision in Control Systems-4. ISRL, vol. 136, pp. 223–259. Springer, Cham (2018) 19. Zotin, A., Simonov, K., Kapsargin, F., Cherepanova, T., Kruglyakov, A.: Tissue germination evaluation on implants based on shearlet transform and color coding. 
In: Favorskaya, M.N., Jain, L.C. (eds.) Computer Vision in Advanced Control Systems-5. ISRL, vol. 175, pp. 265–294. Springer, Cham (2020) 20. Green, O.: Efficient scalable median filtering using histogram-based operations. IEEE Trans. Imag. Process. 27(5), 2217–2228 (2018) 21. Zotin, A.: Fast Algorithm of Image Enhancement based on Multi-Scale Retinex. Procedia Comput. Sci. 131, 6–9 (2018) 22. Yuan, X., Wu, L., Peng, Q.: An improved Otsu method using the weighted object variance for defect detection. Appl. Surface Sci. 349, 472–484 (2015)
High-Dimensional Data Analysis and Its Applications
Symbolic Music Text Fingerprinting: Automatic Identification of Musical Scores Michele Della Ventura
Abstract The explosion of information made available by the Internet requires continuous improvement and enhancement of search engines. The heterogeneity of information requires the development of specific tools depending on whether it is text, image, audio, etc. One of the areas insufficiently considered by researchers concerns the search for musical scores. This paper aims to present a method able to identify the fingerprint of a musical score considered at its symbolic level: a compact representation that contains specific information of the score and permits differentiating it from other scores. A Musical Score Search Engine (MSSE), able to use the fingerprint method to identify a musical score in a repository, has been created. Its logic of operation is presented along with the results obtained from the analysis of different musical scores.
1 Introduction The explosion of information made available by the Internet requires continuous enhancement of the search engines in order to satisfy the requests of the users. Information retrieval (IR) is a field of study dealing with the representation, storage, organization of, and access to documents. The documents may be books, images, videos, or multimedia files. The whole point of an IR system is to provide a user easy access to documents containing the desired information. Given the complexity and heterogeneity of the information contained on the web [1], search engines are becoming crucial to allow easy navigation through the data. With a view to facilitate access to data, the information retrieval system builds synthetic representations of the documents (auxiliary data) through an indexation procedure [2]: the idea is to associate to every document a set of significant terms that are going to be used in order to select the document [3, 4]. M. Della Ventura (B) Department of Information Technology, Music Academy “Studio Musica”, Via Andrea Gritti, 25, 31100 Treviso, Italy e-mail: [email protected] © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2020 I. Czarnowski et al. (eds.), Intelligent Decision Technologies, Smart Innovation, Systems and Technologies 193, https://doi.org/10.1007/978-981-15-5925-9_22
255
256
M. Della Ventura
In the textual IR, the indexes may be [5]: • words automatically extracted from the document; • roots of words (for instance, house) automatically extracted from the document; • phrases (for instance, “textual musical analysis”) automatically extracted from the document; • words (or phrases) extracted from a controlled vocabulary; • (additionally) metadata (for instance, title, authors, creation data, and so on). From these considerations, one can immediately infer that, while in the case of a linguistic text, it is easy to create indexes [6, 7], in the case of a musical language, significant difficulties emerge. As far as a musical piece is concerned, the indexes are created exclusively in reference to the title, the name of the author, the tonality, and other information of a purely textual and informative kind [8]. Recent research analyzed the audio files, creating audio search engines, through the audio fingerprint technique that may be used not only to identify an audio file [8, 9] but also to synchronize multiple audio files [10, 11]. However, in the case of a musical text examined at a symbolic level (i.e., the musical score), it becomes difficult to create indexes in the absence of specific indications that often lead to approximate results. There are many scientific researches in the ambit of the musical text that have the objective of searching for more accurate systems in order to determine the identifying elements of a composition, as for instance, a melody, a motif, a rhythmic structure [12–14] and so forth, that might be used in order to create an indexation of the same composition This document presents a method for the analysis of a musical score (considered in its symbolic level) in order to extrapolate a fingerprint that permits to identify the score in a database by a MSSE. The method is based on the creation of a representative vector of a musical score derived from a transition matrix that considers the manner in which the sounds succeed one another inside the musical piece. This paper is organized as follows. Section 2 analyzes the concept of “sound” and its characteristics. Section 3 explains the method used to obtain the Fingerprint of a musical score. Available experimental results shown illustrate the effectiveness of the proposed method. Finally, Sect. 5 concludes this paper with a brief discussion.
2 The Concepts of Sound Melody and rhythm are two fundamental components as far as musical structuring is concerned, two nearly inseparable components: a melody evolves along with the rhythm in the absence of which it does not exist [15]: “melody in itself is weak and quiescent, but when it is joined together with rhythm it becomes alive and active” [16] (Fig. 1).
Symbolic Music Text Fingerprinting: Automatic Identification …
257
Fig. 1 Excerpt from the score of Ravel’s “Bolero.” The initial notes of the theme are represented on the first staff without any indication with respect to rhythm; the same notes are represented of the second staff together with the rhythm assigned by the composer
2.1 The Melody The melody of a musical piece is represented by a number of sounds, each one separated from the next by a number of semitones: the melodic interval. The various melodic intervals were classified as symbols of the alphabet [17]. The classification of an interval consists of the denomination (generic indication) and in the qualification (specific indication) [18, 19]. The denomination corresponds to the number of degrees that the interval includes, calculated from the lowest one to the highest one; it may be of 2nd , 3 rd , 4th , 5th , and so on.; the qualification is deduced from the number of tones and semi-tones that the interval contains; it may be: perfect (G), major (M), minor (m), augmented (A), diminished (d), more than augmented (A +), more than diminished (d-), exceeding (E), deficient (def). A melody is usually represented as a sequence S i of N intervals nx indexed on the basis of their order of occurrence x [17]: Si = (n x ) x∈[0,N −1] The musical segment may, therefore, be seen as a vector, the elements of which are, respectively, the intervals that separate the various sounds from one another. The corresponding value of every interval equals the number of semi-tones between the i-th note and the preceding one: this value will be, respectively, positive or negative depending on whether the note is higher or lower than the preceding note (Fig. 2) [17]. Fig. 2 Melodic segment and its related vector
Si =
258
M. Della Ventura
2.2 The Rhythm The rhythm is associated with the duration of the sounds: duration intended as the time interval in which sound becomes perceptible, regardless of whether it is due to a single sign or several signs joined together by a value connection [20–22]. If we were to analyze a score, the sound duration will not be expressed in seconds but calculated on the basis of the musical sign (be it sound or rest) with the smallest duration existing in the musical piece [17]. The duration of every single sign will, therefore, be a (integer) number directly proportional to the smallest duration. In the example shown in Fig. 3, the smallest duration sign is represented by the 30-second note to which the value 1 is associated (automatically): it follows that the 16th note shall have the value 2, the 8th note the value 4. On the base of all the above considerations related to the concept of melody, it is possible to deduce that each melody has a succession of sounds that differentiates it from other melodies (i.e., there are different intervals that separate sounds) and each sound has a specific duration that confers meaning to the whole melody. Figure 4 shows two incipits derived from two different songs that have the same sounds but different rhythm.
3 The Fingerprint of a Musical Score The objective of this research is to define a fingerprint of a musical score in order to identify it within a database, starting from a segment of notes. A fingerprint of a musical score, considered in its symbolic level, is a compact representation that contains specific information of the score that permits to differentiate it from other musical scores. On the base of the considerations of the previous paragraphs, the most important characteristics of a musical piece are: (1) the interval that separates a sound from the next one, (2) the duration of each sound.
Ri= Fig. 3 Rhythmic segment and its related vector
Fig. 4 a “Frere Jacque” (French popular song), b “DO-RE-MI” (from the Musical “The sound of Music”)
Symbolic Music Text Fingerprinting: Automatic Identification …
259
The way in which the interval follows one another within the musical piece must be taken into consideration. To do so, the Markov Process (or Markov Stochastic Process—MSP) is used: the choice was made to describe the passage from one sound to the next sound considering the number of semitones between the two sounds (melodic interval), the trend of the interval (a = ascending or d = descending) and the duration of the two sounds. Table 1 shows an excerpt of a transition matrix: the first column (and first row) indicates the denomination of the interval (classification), the second column (and the second row) presents the number of semi-tones that make up the interval (qualification), the third column (and the third row) displays the ascending (a) or descending (d) movement between two consecutive sounds, and the fourth column (and fourth row) presents the time–space between two consecutive sounds. For every single musical score a table, which represents its own alphabet, is filled in. The type of intervals (column and row 1) depends on the musical score: this represents a further discriminating element regarding the definition of the fingerprint. At the end of this procedure, in order to obtain the representative vector of the score (fingerprint), the presence or absence in a row of values representing the transitions sounds must be indicated with binary digits 1 and 0 (see the last column in Table 2, titled Vector). Table 1 Example of a transitions matrix Interval semitones trend
1
1
1
1
a
d
a
d
duration 1
a
1
d
1
a
1
d
Table 2 Fingerprint of the example of Fig. 1 Interval
m 1
m 1
m 1
m 1
M 2
2 M 2
M 2
M 2
m 3
m 3
m 3
m 3
M 4
3 M 4
M 4
M 4
a
d
a
d
a
d
a
d
a
d
a
d
a
d
a
d Vector
2 m
1
a
1
d
0
1
a
0
1
d
0
2
a
2
d
1
1
1
1
1
1 0
260
M. Della Ventura
Given a musical segment S 2 of length N 1 < N 0 , where N 0 is the length of the music score, it is necessary to define its representative vector following the procedure described above, and compare it with the representative vector of the music score. The comparison takes place by making a bit-to-bit difference, in correspondence with the bits with value 1 of the segment S 2 : if the resulting difference is zero for each bit, the score was identified (Fig. 5). In the example shown in Fig. 5, there are the fingerprints of the songs “Frère Jacque” and “DO-RE-MI” compared with the same musical segment S 2 . It is possible to observe that S 2 (represented by 2 bits) is identified in the song DO-RE-MI because the bit-to-bit differences are zero in both the bits. When the bit-to-bit difference is the same among multiple musical scores, the choice is made considering the musical score with the greatest number of zeros (which means a greater number of common elements between the musical segment S 2 and the musical sores) (see Fig. 6). In this case, the concept of similarity is taken into consideration, as a discriminating factor.
Musical segment S2 Fingerprint
0
1
1
0
0
0
0
0
0
0
1
1
1
0
0
Vector S2
1
1
Check
0
-1
0
0
0
0
0
DO-RE-MI Fingerprint
1
1
0
1
1
0
0
0
0
1
1
1
1
1
1
Vector S2
1
1
Check
0
0
0
0
1
0
1
0
0
1
0
1
0
0
1
0
1
Fig. 5 Musical score identification Musical Score 1 Fingerprint
0
1
Vector S2
1
1
0
1
1
0
1
0
0
0
1
1
1
1
1
1
1
Check
-1
-1
0
0
Musical Score 2 Fingerprint
0
1
Vector S2
1
1
0
1
1
0
1
1
0
0
1
1
1
1
1
1
1
Check
0
-1
0
0
Fig. 6 Musical score identification: the musical Score 2 has three common elements with the Segment S2 while the Musical Score 1 has two common elements with the same Segment S2
Symbolic Music Text Fingerprinting: Automatic Identification …
261
4 Application and Analysis: Obtained Results The aim of this research was to develop a method in order to realize a MSSE, able to use the fingerprint method to identify a musical score in a database. The analysis method described in this paper was tested using a specially developed fingerprint extraction algorithm. This fingerprint extraction process used a unique musical score representative matrix that took into considerations: • the musical intervals between the semitone and the octave (12 semitones), • the duration of the sounds between the “64th note” (that assumed the value 1) and the “breve” (that assumed the value 128). Section 4.1 describes the fingerprints extraction process: from reading the score from the file to compiling the musical score representative matrix in order to obtain the representative vector of the digital imprint. The fingerprints are stored as records in a binary file, which is indexed by a data structure for subsequent search queries. This stage of the method is described in Sect. 4.2. Finally, Sect. 4.3 illustrates the results obtained, comparing sample musical segments with the database of stored digital fingerprints.
4.1 Features Extraction For the test, a database with 200 musical scores was selected of different authors of the eighteenth (100 scores) and nineteenth centuries (100 scores). This choice was done to avoid the presence of big musical intervals and of the polyrhythm typical of the compositions of the twentieth century: in this way, it was possible to reduce the dimensions of the score representative matrix and consequently the time of the analysis. All of the musical scores were stored in MIDI format (a symbolic music interchange format) or in the more recent MusicXML file (a text-based language that permits to represent common Western musical notation). The feature extraction is highlighted in Fig. 7. Firstly, in the decoding stage, all musical scores were read from the files and transformed in a list of numbers each of which corresponded to a sound based on its pitch and its duration [23, 24]. It is important to highlight the fact that in the case of two or more sounds linked together, these are considered as a single sound whose duration is equal to the sum of the durations of the individual sounds [17] (see example in Fig. 8). Secondly, the intervals between the sounds and the durations were extracted in order to fill in the transition matrix (MSP) to obtain the fingerprints [17].
262
M. Della Ventura
Fig. 7 Features extraction
Fig. 8 Sound duration duration
2
1
Ri=
4
2
1
Ri=
4.2 Storing Fingerprints Once the musical score representative matrix is filled in, its binary fingerprint representative vector is extracted: the presence or absence in a row (of the musical score representative matrix) of values representing the transitions sounds must be indicated with binary digits 1 and 0. The extraction of the vector is performed considering the musical intervals between the semitone and the octave, namely 12 different intervals, and for each interval considering the duration of the sounds between the “64th note” and the “breve,” namely eight different durations. The vector of each musical score consists of 96 bits and it is stored in a database and indexed with the musical score for subsequent fast search queries. The search for a score within the database is carried out by writing the sounds of a sample musical segment (with their respective durations) within a dialog box similar to the dialog box of a search engine, with the difference that this is represented by a musical stave (Fig. 9).
Symbolic Music Text Fingerprinting: Automatic Identification …
263
Fig. 9 MSSE interface
4.3 Experiment and Results The initial tests were carried out on a set of sample musical segments of two different lengths, five and ten musical notes. For the tests, the processing time was not taken into consideration, because it was strictly connected to the type of computer used for the analysis. It was not important that the durations of the sounds indicated in the sample were the same as the durations present in the scores: the durations could be different under the condition that the mathematical proportion was always respected (Fig. 10). The results of the tests are shown in Table 3. It is possible to notice that the results are satisfactory considering both five-note and ten-note segments. In the last case, the average is slightly lower than that the average of the five-note segments. This is due to the fact that a sequence of a few musical notes is more likely to be present in a score, especially if the musical score is very long. This means that, for a satisfactory research, it is necessary to define a minimum number of mandatory musical notes. Not all sample segments have been associated with a specific musical score. In some cases, a sample segment did not provide any musical scores as a result, due to the fact that the duration of the sounds between the sample segment and the musical score did not coincide. This (apparent) problem could be solved by making the analysis algorithm read a large number of musical scores in order to automatically resize (selflearning algorithm) the musical score representative matrix (and therefore without Fig. 10 Proportional duration Ri=
Ri=
Table 3 Comparative performance under different length of the sample musical segment
Period of the musical scores
Averaged recognition rate (%) 5 musical notes
10 musical notes
18th century
87
92
19th century
84
91
264
M. Della Ventura
defining it a priori), based on the different durations of the sounds read in the same musical score. In this way the size of the matrix, the size of the vector representing the fingerprint, and the time of analysis increase, to the advantage of a better result.
5 Discussion and Conclusions It has been shown throughout this paper that modeling the fingerprint of a musical score using the parameters characterizing the sound can form the basis of effective algorithms for retrieving documents from a database. The heterogeneity of information and the consequent optimization of the search engines demonstrate the importance to develop new methods and their relevance even for educational purposes. The approach has been tested on a set of 200 musical scores of different authors. Moreover, qualitative analysis has been carried out on the relationships between the symbolic level of the music scores and the characteristics of the sound. Results are encouraging, both in terms of average precision of the retrieval results and in terms of musicological significance. Future research will focus on understanding how the concept of similarity should be considered in creating the fingerprint in order to identify the musical score. Starting from this point, it is hoped that future research paths can be developed that can lead to further improvement in this area.
References 1. Qiu, T., Chen, N., Li, K., Atiquzzaman, M., Zhao, W.: How can heterogeneous internet of things build our future: a survey. Published in: IEEE Communications Surveys & Tutorials, vol. 20, Issue: 3, thirdquarter( 2018) 2. Crestani, F., Rijsbergen, C.J.: J. Intell. Inf. Syst. 8, 29 (1997). https://doi.org/10.1023/A:100 8601616486 3. Blummer, B., Kenton, J.M.: Information Research and the Search Process, Improving Student Information Search, pp. 11–21. Chandos Publishing (2014) 4. Boubekeur, F., Azzoug, W.: Concept-based indexing in the text information retrieval. Int. J. Comput. Sci. Inf. Technol. (IJCSIT) 5(1), (2013) 5. Bruce Croft, W., Metzler, D., Strohman, T.: Search engines, information retrieval in practice. Pearson Education, Inc (2015) 6. Chang, G., Healey, M.J., McHugh, J.A.M., Wang, J.T.L.: Multimedia search engines. In: Mining the World Wide Web. The Information Retrieval Series, vol 10. Springer, Boston, MA (2001) 7. Jeon, J., Croft, W.B., Lee, J.H., Park, S.: A framework to predictthe quality of answers with non-textual features. InSIGIR’06: Proceedingsof the 29th Annual International ACM SIGIR Conference On Research Anddevelopment In Information Retrieval, pp. 228–235. ACM (2006) 8. Mau, T.N., Inoguchi, Y.: Audio fingerprint hierarchy searching strategies on GPGPU massively parallel computer. J. Inf. Telecommun. 2(3), 265–290 (2018) 9. Yang, F., Yukinori, S., Yiyu, T., Inoguchi, Y.: Searching acceleration for audio fingerprinting system. In: Joint Conference of Hokuriku Chapters of Electrical Societies (2012)
Symbolic Music Text Fingerprinting: Automatic Identification …
265
10. Mau, T.N., Inoguchi, Y.: Robust optimization for audio fingerprint hierarchy searching on massively parallel with multi-GPGPUS using K-modes and LSH. In International conference on advanced engineering theory and applications, pp. 74–84. Springer, Cham (2016a) 11. Mau, T.N., Inoguchi, Y.: Audio fingerprint hierarchy searching on massively parallel with multi-GPGPUS using K-modes and LSH. In Eighth International Conference on Knowledge and Systems Engineering (KSE), pp. 49–54. IEEE (2016b) 12. Della Ventura, M.: Musical DNA. ABEditore, Milano (2018). ISBN: 978-88-6551-281-4 13. Neve, G., Orio, N.: A comparison of melodic segmentation techniques for music information retrieval. In: Rauber A., Christodoulakis S., Tjoa A.M. (eds) Research and Advanced Technology for Digital Libraries. ECDL 2005. Lecture Notes in Computer Science, vol 3652. Springer, Berlin (2005) 14. Lopez, R.M.E.: Automatic Melody Segmentation. Ph.D. thesis, UtrechtUniversity (2016) 15. Fraisse, P.: Psychologie du rythme, Puf, Paris (1974) 16. Fraisse, P.: Les structures rythmiques. Erasme, Paris (1958) 17. Ventura, D.M.: The influence of the rhythm with the pitch on melodic segmentation. In: Proceedings of the Second Euro-China Conference on Intelligent Data Analysis and Applications (ECC 2015). Springer, Ostrava, Czech Republic (2015) 18. de la Motte, D.: Manuale di armonia, Bärenreiter (1976) 19. Schoenberg, A.: Theory and Harmony. Univ of California Pr; Reprint edition (1992) 20. Moles, A.: Teorie de l’information et Perception esthetique. Flammarion Editeur, Paris (1958) 21. Peeters, G., Deruty, E.: Is music structure annotation multi-dimensional? A proposalfor robust local music annotation.In: Proceedings of the International Workshop on Learning the Semantics of Audio Signals (LSAS), Graz, Austria (2009) 22. Iroro, F., Ohoroe, O., Chair, L., Hany, F.: Riddim: A rhythm analysis and decomposition tool based on independent subspace analysis (2002) 23. Madsen, S., Widmer, G.: Separating voices in MIDI. ISMIR, Canada (2006) 24. Ventura, D.M.: Using mathematical tools to reduce the combinatorial explosion during the automatic segmentation of the symbolic musical text, In Proceedings of the 4th International Conference on Computer Science, Applied Mathematics and Applications. Vienna, Austria, Springer (2016)
Optimization of Generalized C p Criterion for Selecting Ridge Parameters in Generalized Ridge Regression Mineaki Ohishi , Hirokazu Yanagihara, and Hirofumi Wakaki
Abstract In a generalized ridge (GR) regression, since a GR estimator (GRE) depends on ridge parameters, it is important to select those parameters appropriately. Ridge parameters selected by minimizing the generalized C p (GC p ) criterion can be obtained as closed forms. However, because the ridge parameters selected by this criterion depend on a nonnegative value α expressing the strength of the penalty for model complexity, α must be optimized for obtaining a better GRE. In this paper, we propose a method for optimizing α in the GC p criterion using [12] as similar as [7].
1 Introduction In this paper, we deal with the following normal linear regression model: y = (y1 , . . . , yn ) ∼ Nn (η, σ 2 I n ), η = (η1 , . . . , ηn ) = μ1n + Xβ, where σ 2 is an unknown variance, μ is an unknown location parameter, 1n is an n-dimensional vector of ones, X is an n × k matrix of nonstochastic explanatory variables satisfying rank(X) = k < n − 3 and X 1n = 0k , and β is a k-dimensional vector of unknown regression coefficients. Here, 0k is a k-dimensional vector of zeros. The least square method is one of the most fundamental estimation methods for unknown parameters μ and β, and these least square estimators (LSEs) are given as μˆ = y¯ =
1 1 y, βˆ = (X X)−1 X y = M −1 X y, n n
(1)
M. Ohishi (B) · H. Yanagihara · H. Wakaki Hiroshima University, Higashi-Hiroshima 739-8526, Japan e-mail: [email protected] © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2020 I. Czarnowski et al. (eds.), Intelligent Decision Technologies, Smart Innovation, Systems and Technologies 193, https://doi.org/10.1007/978-981-15-5925-9_23
267
268
M. Ohishi et al.
where M = X X. Although μˆ and βˆ are given in simple forms in (1), it is well known that the LSEs have desirable properties, e.g., unbiasedness and normality. However, βˆ is not a good estimator in the sense that its variance becomes large when multicollinearity occurs among several explanatory variables. To improve the brittleness of the LSE, [5] proposed a generalized ridge (GR) regression, i.e., a shrinkage estimation of β by k ridge parameters. However, since different ridge parameters give different estimates of β in a GR regression, it is very important to select ridge parameters appropriately. One way of choosing ridge parameters is to employ a model selection criterion (MSC) minimization method. A generalized C p (GC p ) criterion (see [1]) and a GCV criterion (see [2]) are well-known MSCs, and for example, a GC p criterion minimization method (see [8]) and a GCV criterion minimization method (see [13]) have been proposed. In addition, a fast minimizing algorithm of an MSC in a general framework that includes these criteria (see [10]) has also been proposed. Although ridge parameters selected by these methods are obtained as closed forms, in practical terms, most of these methods require a numerical search. However, ridge parameters selected by the GC p criterion minimization method can be obtained without a numerical search. Hence, using the GC p criterion is suitable to select ridge parameters. The selection problem of ridge parameters can be solved by the GC p criterion minimization method, but the method has another problem. The GC p criterion depends on α expressing the strength of the penalty for model complexity and, as such, it can express various MSCs by changing α, for example, the C p criterion (see [6]) and the MC p criterion (see [4, 14]). Although C p and MC p criteria are estimators of predictive mean square error (PMSE), ridge parameters selected by these criteria minimization methods do not always minimize PMSE. Hence, for obtained data, α must be optimized based on minimizing the PMSE. In this paper, we propose a method for optimizing α in the GC p criterion without a numerical optimization search across all regions. Similar to the method in [7], we use a naive estimator of the PMSE of the predictive value based on ridge parameters selected by the GC p criterion minimization method as a criterion for optimizing α. The naive estimator is derived using a Stein’s lemma (see [12]) as with [3] and [15]. Moreover, we develop an algorithm for effectively calculating the optimal α by restricting candidates of the optimal α to a finite number. The remainder of the paper is organized as follows: As preparation, Sect. 2 describes ridge parameters selected by the GC p criterion minimization method and a predictive value based on the selected ridge parameters. Section 3 presents a criterion for optimizing α in the GC p criterion and an algorithm for obtaining the optimal α. In Sect. 4, prediction accuracies of the predictive values obtained from the GRE are compared by a Monte Carlo simulation. Technical details are provided in the Appendix.
Optimization of Generalized Cp Criterion for Selecting Ridge Parameters …
269
2 Preliminaries We express a singular value decomposition of X as follows: 1/2 D Q = P 1 D1/2 Q , X=P O n−k,k
(2)
where P and Q are n × n and k × k orthogonal matrices, respectively, D is the diagonal matrix defined by D = diag(d1 , . . . , dk ) of which diagonal elements satisfy d1 ≥ · · · ≥ dk > 0 since rank(X) = k, O n,k is an n × k matrix of zeros, and P 1 is the n × k matrix derived from the partition P = ( P 1 , P 2 ). Note that P 1 P 1 = I k . Then, GR estimators (GREs) are given by μˆ = y¯ , βˆ θ = (M + QΘ Q )−1 X y = M −1 θ X y,
(3)
where θ is a k-dimensional vector of k ridge parameters of which the jth element satisfies θ j ∈ R+ = {θ ∈ R | θ ≥ 0}, Θ is a diagonal matrix defined by Θ = ˆ Using diag(θ1 , . . . , θk ), and M θ = M + QΘ Q . Since M 0k = M, we have βˆ 0k = β. the GREs in (3), a predictive value of y is given by ˆyθ = μ1 ˆ n + X βˆ θ = ( J n + X M −1 θ X ) y = H θ y,
(4)
where J n = 1n 1n /n and H θ = J n + X M −1 θ X . As per (3) and (4), since the GREs depend on ridge parameters θ , it is important to select θ . The GC p criterion for selecting ridge parameters is defined by
GC p (θ | α) =
1 y − ˆyθ 2 + 2α tr(H θ ), s2
where s 2 is an unbiased estimator for σ 2 defined by s2 =
y (I n − J n − X M −1 X ) y , n−k−1
(5)
and α ∈ R+ expresses the strength of the penalty for model complexity tr(H θ ). By changing α, the GC p criterion can express various MSCs, for example, the C p criterion when α = 1 and the MC p criterion when α = c M = 1 + 2/(n − k − 3). In this paper, s 2 > 0 holds from the assumption k < n − 3. Ridge parameters selected by the GC p criterion minimization method are given by ˆ α = diag θˆα,1 , . . . , θˆα,k , θˆ α = θˆα,1 , . . . , θˆα,k = arg min GC p (θ | α), Θ θ ∈Rk+
and the selected ridge parameters are obtained as the following closed forms (e.g., see [10]):
270
M. Ohishi et al.
θˆα, j
⎧ αs 2 d j ⎪ 2 2 ⎨ α < z /s j 2 2 = z j − αs , ⎪ ⎩∞ α ≥ z 2j /s 2
(6)
where z j is defined by z = (z 1 , . . . , z k ) = P 1 y.
(7)
Let M α = M θˆ α and H α = H θˆ α . Then, selected ridge parameters in (6) give the GRE of β and the predictive value based on the GC p criterion minimization as follows: ˆ β(α) = M −1 α X y,
ˆy(α) = H α y.
(8)
The GC p criterion minimization method gives the GRE and the predictive value based on the selected ridge parameters in closed form as (8). However, since such ridge parameters include ∞ as (6), in practical terms, (8) is not useful. Then, we rewrite (8) in a form without ∞. Let V α be a diagonal matrix defined by V α = diag(vα,1 , . . . , vα,k ) of which the jth diagonal element is defined by
vα, j = I α
φa+1 (ta+1 ) holds for a = 0, . . . , k − 1. Therefore, candidate points of the minimizer of C(α) are restricted to {t0 , . . . , tk } and consequently Theorem 1 is proved.
Appendix 4: Proof of Lemma 2 For any j ∈ {1, . . . , k}, uˆ j (α) is given by
uˆ j (α) = vα, j u j = u j I α < u 2j /w 2 1 − αw 2 /u 2j , where w 2 = s 2 /σ 2 . An almost differentiability for uˆ j (α) ( j ∈ {1, . . . , k}) holds by the following lemma (the proof is given in Appendix 5). Lemma 3 Let f (x) = x I(c2 < x 2 )(1 − c2 /x 2 ), where c is a positive constant. Then, f (x) is almost differentiable. Next, we derive a partial derivative of uˆ j (α) ( j ∈ {1, . . . , k}). Notice that ∂w 2 /∂u j = 0 for any j ∈ {1, . . . , k} since w 2 is independent of u 1 , . . . , u k . Hence, we have w2
∂ vα, j = 2α I α < u 2j /w 2 3 . ∂u j uj
Optimization of Generalized Cp Criterion for Selecting Ridge Parameters …
277
Since ∂ uˆ j (α)/∂u j = {∂vα, j /∂u j }u j + vα, j and u 2j /w 2 = z 2j /s 2 , we obtain (17). Moreover, the expectation of (17) is finite because 0 ≤ I(α < z 2j /s 2 ) (1 + αs 2 /z 2j ) < 2. Consequently, Lemma 2 is proved.
Appendix 5: Proof of Lemma 3 It is clear that f (x) is differentiable on R = R\{−c, c}. If x 2 > c2 , f (x) = 1 + c2 /x 2 and supx:x 2 >c2 | f (x)| = 2 since f (x) is decreasing as a function of x 2 . Hence, f (x) is bounded on R and we have | f (x) − f (y)|/|x − y| < 2 when x and y are both in (−∞, −c), [−c, c], or (c, ∞). In addition, f (x) is an odd function. Thus, it is sufficient to show that { f (x) − f (y)}/(x − y) is bounded for (x, y) such that x > c and y ≤ c for Lipschitz continuity. For such conditions, the following equation holds. ⎧ c2 ⎪ ⎪ ⎨ 1 + 0. Then, we estimate β 1 and β 2 by minimizing the following PRSS, which corresponds to that of partial generalized ridge regression: 2 PRSSPR (β 1 , β 2 |) = y − W 1 β 1 − W 2 β 2 + β 2 Q Q β 2 . Since W 2 (In − PW 1 )W 2 + Q Q = Q(D + )Q , estimators of β 1 and β 2 are derived as βˆ 1, = (W 1 W 1 )−1 W 1 ( y − W 2 βˆ 2, ), βˆ 2, = {W 2 (I n − P W 1 )W 2 + Q Q }−1 W 2 (I n − P W 1 ) y.
(7)
A Fast Optimization Method for Additive Model via Partial …
283
These imply that a hat matrix of this model can be written as H = P W 1 + (I n − P W 1 )W 2 Q( D + )−1 Q W 2 (I n − P W 1 ).
(8)
Here, H is expressed as H when = O k,k .
2.2 Optimization Method for Smoothing Parameters From the general formula of [2], our GCV criterion for optimizing the smoothing parameters λ1 , . . . , λk is given by GCV() =
(I n − H ) y2 . n {1 − tr(H )/n}2
(9)
Hence, the smoothing parameters are optimized by minimizing (9) with the method of [12]. Let z = (z 1 , . . . , z k ) be a k-dimensional vector defined as z = (z 1 , . . . , z k ) = D−1/2 Q W 2 (I n − P W 1 ) y,
(10)
and u j ( j = 1, . . . , k) be the jth-order statistic of z 12 , . . . , z k2 , i.e., uj =
2 2 min{z 1 ,2. . . , z k }2 ( j = 1) . min {z 1 , . . . , z k }\{u 1 , . . . , u j−1 } ( j = 2, . . . , k)
From u 1 , . . . , u k , the following estimators for variance, and ranges are obtained: y (I n − H) y +
a
j=1 u j (a = 0, 1, . . . , k), n − 3p − k − 1 + a ⎧ (a = 0) ⎨ (0, u 1 ] Ra = (u a , u a+1 ] (a = 1, . . . , k − 1) , ⎩ (u k , ∞) (a = k)
sa2
=
(11) (12)
where 0j=1 u j = 0. Then, closed-form expressions for the optimal values of λ1 , . . . , λk that minimize the GCV criterion are given by (the proof is given in the Appendix) ⎧ ⎨
∞ (sa2∗ ≥ z 2j ) dj λˆ j = (s 2 < z 2j ) ⎩ 2 2 z j /sa∗ − 1 a∗
( j = 1, . . . , k),
(13)
where a∗ ∈ {0, 1, . . . , k} is the unique integer satisfying sa2∗ ∈ Ra∗ (The proof is shown in [12] ). Let V be a k × k diagonal matrix defined as
284
K. Fukui et al.
V = diag(v1 , . . . , vk ), v j = I
z 2j
>
sa2∗
1−
sa2∗
z 2j
,
where I (A) is the indicator function, i.e., I (A) = 1 if A is true and I (A) = 0 if A ˆ −1 = V D−1 since D and V are diagonal matrices. is not true. We can see ( D + ) Then, we have
ˆ Q W 2 (I n − P W 1 )W 2 + Q
−1
= QV D−1 Q ,
ˆ = diag(λˆ 1 , . . . , λˆ k ). Therefore, after optimization of , the βˆ 1, and βˆ 2, where in (7) become βˆ 1,ˆ = (W 1 W 1 )−1 W 1 y − W 2 βˆ 2,ˆ , βˆ 2,ˆ = QV D−1 Q W 2 (I n − P W 1 ) y.
3 Numerical Study In this section, we evaluate our new method by conducting a simulation study. The p simulation data were generated from yi = j=1 μ∗j (xi j ) + εi (i = 1, . . . , n) with n = 300, 500, 1000, and p = 4 and 8. Let us define three functions η1 (x), η2 (x) and η4 (x) as sin(12x + 0.2) (see [3]), ⎧ x + 0.2 ⎨ −60(x − 17/60)2 + 16/15 (x < 1/4) (1/4 ≤ x < 3/4) , η2 (x) = 4x ⎩ 80(x − 29/40)2 + 59/20 (3/4 ≤ x) η1 (x) =
η4 (x) = 8 {1.5φ((x − 0.35)/0.15) − 1.5φ(x − 0.8)/0.04)} (see [8]), where φ(x) is the probability density function of the standard normal distribution. The following eight functions, with curves as depicted in Fig. 1, were used as the true trend μ∗j (x): Trend 1: Trend 2: Trend 3: Trend 4: Trend 5: Trend 6: Trend 7: Trend 8:
μ∗1 (x) = η1 (x) − η1 (0). μ∗2 (x) = η2 (x) − η2 (0). μ∗3 (x) = 6x. μ∗4 (x) = η4 (x) − η4 (0). μ∗5 (x) = −6 exp(−3x) sin(3πx). μ∗6 (x) = 3 sin x − 48x + 218x 2 − 315x 3 + 145x 4 . μ∗7 (x) = 8x 2 − 4x. μ∗8 (x) = −4x.
A Fast Optimization Method for Additive Model via Partial … Trend1
Trend2
285
Trend3
Trend4
6
2
2.5
10
1
4
0
0.0 5
−1
2
−2.5
−2 0 0.00
0.25
0.50
0.75
1.00
−5.0
0 0.00
0.25
0.50
0.75
1.00
0.00
Trend6
Trend5
0.25
0.50
0.75
1.00
0.00
0.25
0.50
0.75
1.00
0.75
1.00
Trend8
Trend7 4
0
3
−1
2 0
1
−2
0
2
−1
1
−2
−3 −2 −4
0.00
0.25
0.50
0.75
1.00
−3
0 −4 0.00
0.25
0.50
0.75
1.00
0.00
0.25
0.50
0.75
1.00
0.00
0.25
0.50
Fig. 1 Curves of μ∗j (x), j = 1, . . . , 8. The y-axis is each μ∗j (x) Table 1 Relative MSE and runtime (TIME) of the new method p=4 p=8 σ 2 = 0.5 σ2 = 1 σ 2 = 2.0 σ 2 = 0.5 σ2 = 1 σ 2 = 2.0 n MSE TIME MSE TIME MSE TIME MSE TIME MSE TIME MSE TIME 300 500 1000
17.9 20.4 25.6
32.5 16.1 69.2
30.2 28.5 30.9
1.1 42.7 66.8
49.4 43.0 40.1
0.6 26.4 40.3
39.4 32.2 31.9
4.4 11.0 49.5
63.8 49.4 42.3
8.4 4.8 34.0
96.0 75.1 58.3
6.8 10.8 30.8
Trends 1, 2, 3, and 4 are functions that were used by [11]. The explanatory variables xi j were generated independently from the uniform distribution U (0, 1). The error variables ε1 , . . . , εn were independently generated from the normal distribution N (0, σ 2 ) with σ 2 = 0.5, 1.0 and 2.0. We applied two optimization methods–our new method, as described in Sect. 2, and the back-fitting algorithm, as described in Sect. 1. B 1 , . . . , B p were set to the same matrices as those in the previous section, a knot-placement was established in the equipotent arrangement, i.e., τ j,q is a q/(k j + 1)-quantile of x j1 , . . . , x jn (q = 1, . . . , k j ), and the number of basis functions was fixed at 10, i.e., k j = 10. Then, we calculated the mean square error (MSE) of the estimated trends as ⎫2 ⎤ ⎡⎧ p 100 ⎬ 1 ⎣⎨ ⎦, μˆ j (ts j ) − μ∗j (ts j ) E μˆ 0 + ⎩ ⎭ 100 s=1 j=1 where ts j = mini=1,...,n xi j + (s − 1) × (maxi=1,...,n xi j − mini=1,...,n xi j )/99 (s = 1, . . . , 100). The above MSE was evaluated by Monte Carlo simulation with 10,000 iterations.
286
K. Fukui et al.
Table 1 shows the relative MSEs and relative runtimes (TIME) for the new method, which are expressed as percentages where those of the back-fitting algorithm are 100 From Table 1 , the proposed method achieved higher performance with respect to both predictive accuracy and computational efficiency. That is, when the numericals in Table 1 are smaller than 100, our method is better than the ordinary method. Hence, our new method will be more suitable to apply to the additive model.
Appendix: The Proof of Equation (13) A singular value decomposition of (I n − P W 1 )W 2 is expressed as (I n − P W 1 )W 2 = G
D1/2 O n−k,k
Q = G 1 D1/2 Q ,
(14)
where G is an n × n orthogonal matrix and G 1 is an n × k matrix derived from the partition G = (G 1 , G 2 ). Note that D1/2 is diagonal matrix. From Eqs. (8) and (14), we can see that ( D + )−1 D O k,n−k G. H = P W1 + G O n−k,k O n−k,n−k Hence, tr(H ) can be calculated as tr(H ) = tr( P W 1 ) + tr{( D + )−1 D} = 3 p + k + 1 −
k j=1
λj . dj + λj
(15)
Notice that G 1 = (I n − P W 1 )W 2 Q D−1/2 from (14), and I n = GG = G 1 G 1 + G 2 G 2 . Hence, we have G 1 (I n − P W 1 ) y = z and y (I n − P W 1 )G 2 G 2 (I n − P W 1 ) y = y (I n − P W 1 )(I n − G 1 G 1 )(I n − P W 1 ) y = y I n − P W 1 − (I n − P W 1 )W 2 Q D−1 Q W 2 (I n − P W 1 ) y = y (I n − H) y = (n − 3 p − k − 1)s02 . Using the above results and noting PW1 P1 = On,k , we can derive the following equation:
A Fast Optimization Method for Additive Model via Partial …
287
(I n − H ) y2 = y (I n − H )2 y
2 ( D + )−1 D O k,n−k G (I n − P W 1 ) y = y (I n − P W 1 )G I n − O n−k,k O n−k,n−k
= z {I k − ( D + )−1 D}2 z + (n − 3 p − k − 1)s02 2 k λj = (n − 3 p − k − 1)s02 + zj . λj + dj j=1 By substituting (15) and above result into (9), we can obtain (n − 3 p − k − 1)s02 + kj=1 {λ j z j /(λ j + d j )}2 GCV() = . n[1 − {3 p + k + 1 − kj=1 λ j /(λ j + d j )}/n]2
(16)
Let δ = (δ1 , . . . , δk ) be a k-dimensional vector defined as δj =
λj . dj + λj
Note that δ j ∈ [0, 1] since d j ≥ 0 and λ j ≥ 0. Then, the GCV criterion in (16) is expressed as the following function with respect to δ: GCV() = f (δ) =
r (δ) , c(δ)2
(17)
where the functions r (δ) and c(δ) are given by r (δ) =
(n − 3 p − k − 1)s02 + n
k
2 2 j=1 z j δ j
⎞ ⎛ k 1⎝ , c(δ) = 1 − δj ⎠ . 3p + k + 1 − n j=1
Here, z 1 , . . . , z k are given in (10). Let δˆ = (δˆ1 , . . . , δˆk ) be the minimizer of f (δ) in (17), i.e., δˆ = arg min f (δ), δ∈[0,1]k
where [0, 1]k is the kth Cartesian power of the set [0, 1]. Notice that r (δ) and c(δ) are differentiable functions with respect to δ j . Thus, we obtain ∂ 2 c(δ)z 2j δ j − r (δ) . f (δ) = 3 ∂δ j nc(δ)
288
K. Fukui et al.
Hence, noting c(δ) is a finite function, we find a necessary condition for δˆ as ! δˆj =
ˆ ≥ z2) 1 (h(δ) j 2 ˆ < z2) , ˆ (h( δ) h(δ)/z j j
where h(δ) = r (δ)/c(δ) > 0. On the other hand, let H = {δ ∈ [0, 1]k |δ = δ (h), ∀h ∈ R+ }, where δ (h) is the k-dimensional vector for which the jth element is defined as δ j (h)
=
1 (h ≥ z 2j ) , 2 h/z j (h < z 2j )
and R+ is a set of nonnegative real numbers. Then, it follows from H ⊆ [0, 1]k and δˆ ∈ H that ˆ = min f (δ) ≤ min f (δ) = min f (δ (h)), f (δ) δ∈[0,1]k
δ∈H
Hence, we have ˆ δˆ = δ (h)
h∈R+
ˆ ≥ min f (δ (h)). f (δ) h∈R+
hˆ = arg min f (δ (h)) . h∈R+
Using this result, we minimize the following function: f (δ (h)) = f 1 (h) =
r1 (h) , c1 (h)2
where r1 (h) = r (δ (h)) and c1 (h) = c(δ (h)), which can be calculated as ⎡
"2 ⎤ ! k 1⎣ h r1 (h) = I (h < z 2j ) 2 − 1 + 1 z 2j ⎦ , (n − 3 p − k − 1)s02 + n zj j=1 ⎡
"⎤ ! k h 1⎣ c1 (h) = 1 − I (h < z 2j ) 2 − 1 + 1 ⎦ . 3p + k + 1 − n zj j=1 Suppose that h ∈ Ra , where a ∈ {0, 1, . . . , k} and Ra is a range defined by (12). Notice that u j are the jth-order statistics of z 12 , . . . , z k2 . Then, we have f 1 (h) = f 1a (h) =
r1a (h) (h ∈ Ra ), c1a (h)2
A Fast Optimization Method for Additive Model via Partial …
289
where functions r1a (h) and c1a (h) are given by 1 (n − 3 p − k − 1 + a)sa2 + a h 2 , n 1 c1 (h) = c1a (h) = 1 − (3 p + k + 1 − a − a h), n r1 (h) = r1a (h) =
where sa2 is given in (11) and a = kj=a+1 1/u j . By simple calculation, let ga (h) = (n − 3 p − k − 1 + a)(h − sa2 ), we can obtain d 2a f 1a (h) = 2 ga (h). dh n c1a (h)3 Here, we note ga (u a+1 ) = ga+1 (u a+1 ) (a = 0, . . . , k − 1). 2a /{n 2 c1a (h)3 } is positive, lim h→0 g0 (h) < 0, and ga (h) (h ∈ Ra ) tonically increasing function in h ∈ R+ , since n − 3 p − k − 1 > 0 Consequently, the Eq. (13) is obtained by combining δˆj and h ∗ = sa2 (h)/dh|h=h ∗ = 0, and some calculation.
Moreover, is a monoand a ≥ 0. where d f 1a
References 1. Bartolino, V., Colloca, F., Sartor, P., Ardizzone, G.: Modelling recruitment dynamics of hake, merluccius merluccius, in the central mediterranean in relation to key environmental variables. Fish. Res. 92, 277–288 (2008). https://doi.org/10.1016/j.fishres.2008.01.007 2. Craven, P., Wahba, G.: Smoothing noisy data with spline functions: estimating the correct degree of smoothing by the method of generalized cross-validation. Numer. Math. 31, 377– 403 (1979). https://doi.org/10.1007/BF01404567 3. Hastie, T., Friedman, J., Tibshirani, R.: The elements of statistical learning. Springer, New York (2001) 4. Hastie, T., Tibshirani, R.: Generalized additive models. Chapman & Hall, London (1990) 5. Huang, L.S., Cox, C., Myers, G.J., Davidson, P.W., Cernichiari, E., Shamlaye, C.F., SloaneReeves, J., Clarkson, T.W.: Exploring nonlinear association between prenatal methylmercury exposure from fish consumption and child development: evaluation of the seychelles child development study nine-year data using semiparametric additive models. Environ. Res. 97, 100–108 (2005). https://doi.org/10.1016/j.envres.2004.05.004 6. Mallows, C.L.: Some comments on C p . Technometrics 15, 661–675 (1973). https://doi.org/ 10.2307/1267380 7. Stone, M.: Cross-validatory choice and assessment of statistical predictions. J. Roy. Statist. Soc. Ser. B 36, 111–147 (1974). https://doi.org/10.1111/j.2517-6161.1974.tb00994.x 8. Wand, M.: A comparison of regression spline smoothing procedures. Comput. Statist. 15, 443–462 (2000). https://doi.org/10.1007/s001800000047 9. Wood, S.N.: Stable and efficient multiple smoothing parameter estimation for generalized additive models. J. Amer. Statist. Assoc. 99, 673–686 (2004). https://doi.org/10.1198/ 016214504000000980
290
K. Fukui et al.
10. Wood, S.N.: Fast stable direct fitting and smoothness selection for generalized additive models. J. R. Stat. Soc. Ser. B. Stat. Methodol. 70, 495–518 (2008). https://doi.org/10.1111/j.14679868.2007.00646.x 11. Yanagihara, H.: A non-iterative optimization method for smoothness in penalized spline regression. Stat. Comput. 22, 527–544 (2012). https://doi.org/10.1007/s11222-011-9245-0 12. Yanagihara, H.: Explicit solution to the minimization problem of generalized cross-validation criterion for selecting ridge parameters in generalized ridge regression. Hiroshima Math. J. 48, 203–222 (2018). 10.32917/hmj/1533088835
Improvement of the Training Dataset for Supervised Multiclass Classification Yukako Toko
and Mika Sato-Ilic
Abstract The classification of objects based on corresponding classes is an important task in official statistics. In the previous study, the overlapping classifier that assigns classes to an object based on the reliability score was proposed. The proposed reliability score has been defined considering both the uncertainty from data and the uncertainty from the latent classification structure in data and generalized using the idea of the T-norm in statistical metric space. This paper proposes a new procedure for the improvement of the training dataset based on a pattern of reliability scores to get a better classification accuracy. The numerical example shows the proposed procedure gives a better result as compared to the result of our previous study.
1 Introduction This paper presents a procedure for the improvement of a training dataset based on the pattern of reliability scores for overlapping classification. In official statistics, text response fields such as fields for occupation, industry, and the type of household income and expenditure are often found in survey forms. Those text descriptions are usually assigned corresponding classes (or codes) for efficient data processing. Although classification tasks (or coding tasks) have been originally performed manually, studies of automated classification (or automated coding) have made progress with the improvement of computer technology in recent years. For example, Hacking and Willenborg [1] illustrated coding methods for official statistics, including automated coding methods. Gweon et al. [2] introduced some methods for automated occupation coding based on statistical learning. Y. Toko (B) · M. Sato-Ilic National Statistics Center, 19-1 Wakamatsu-Cho, Shinjuku-Ku, Tokyo 162-8668, Japan e-mail: [email protected] M. Sato-Ilic Faculty of Engineering, Information and Systems, University of Tsukuba, Tsukuba, Ibaraki 305-8573, Japan e-mail: [email protected] © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2020 I. Czarnowski et al. (eds.), Intelligent Decision Technologies, Smart Innovation, Systems and Technologies 193, https://doi.org/10.1007/978-981-15-5925-9_25
291
292
Y. Toko and M. Sato-Ilic
For the coding task of the Family Income and Expenditure Survey in Japan, a supervised multiclass classifier, an autocoding system, has been developed in our previous study [3–6]. Originally, the multiclass classifier that selects the most promising class (or code) based on the probability provided from the training dataset was developed [3]. However, this classifier incorrectly classifies some text descriptions with ambiguous information because of the semantic problem, interpretation problem, and insufficiently detailed input information problem. The main reason for these problems is the unrealistic restriction that one object is classified into a single class (or a code) under the situation that a certain volume of text descriptions include uncertainty. An overlapping classifier was developed [4, 5] to address these issues. The developed overlapping classifier selects multiple classes with a calculation of the reliability scores based on the partition entropy or partition coefficient [7, 8]. The reliability score was defined considering both the uncertainty from data and the uncertainty from the latent classification structure in data to give the classification method a better accuracy of the result. In a subsequent study [6], the reliability score was generalized by using the idea of the T-norm in statistical metric space [9–11] for applying the classifier to various data in practical situations. In addition to the generalization of the reliability score, the reliability score considering the frequency of each object over the classes (or codes) for each object in the training dataset was defined. The previously developed overlapping classifier comprises the training process and classification process [5, 6]. In the training process, the extraction of objects and the creation of a feature frequency table are performed. First, each text description in the training dataset is tokenized by MeCab [12], which is a dictionary-attached morphological Japanese text analyzer. Then, word-level N-grams from the word sequences of a text description are taken as objects; here, unigrams (any word), bigrams (any sequence of two consecutive words), and entire sentence are considered as objects. After extraction of objects, the classifier tabulates all extracted objects based on their given classes (or codes) into a feature frequency table. In the classification process, the classifier performs the extraction of objects, retrieval of candidate classes (or codes), and class assignment based on the values of reliability scores. Although the overlapping classifier gave a better result compared with the previously developed classifier, it still has tasks to be addressed to apply the classifier to unclear data. This paper focuses on the method for the improvement of the training dataset, whereas previous studies focus on the improvement of the classification method. This study proposes a new procedure for improving the training dataset based on the pattern of reliability scores by using the k-means method [13]. First, the k-means method is applied to obtain patterns of reliability scores of text descriptions for “unclear” data which are difficult to classify into a class (or a code). Next, we detect data belonged to clusters whose reliability scores are relatively small. That is, we capture data that does not belong to any classes (or codes). After that, we add the information of the data to the original training dataset which indicates the supervisor of this classifier. 
This means that we add the “unclear” data information, which is difficult to classify into a class (or a code), to the data mostly
Improvement of the Training Dataset …
293
consisted of clear data information in order to improve the training dataset. Then, the implementation of a classifier with this improved training dataset is performed. The rest of this paper is organized as follows. The overlapping classifier based on the k-means method is proposed in Sect. 2. The experiments and results are described in Sect. 3. Conclusions and suggestions for future work are presented in Sect. 4.
2 Overlapping Classifier Based on k-means Method A new procedure for improvement of a training dataset is proposed to give better accuracy for the classification. The proposed procedure creates the improved training dataset utilizing the k-means method. First, the classifier extracts objects from each text description in a dataset for evaluation. Then, it retrieves all corresponding classes (or codes) from the feature frequency table provided by using the extracted objects [3]. Then we define; n jk ,nj = n jk , j = 1, . . . , J, k = 1, . . . , K , nj k=1 K
p jk =
(1)
where n jk is the number of occurrences of statuses in which an object j assigned to a class k (or a code k) in the training dataset. J is the numberof objects and K is the number of classes (or codes). Then theclassifier arranges p j1 , . . . , p j K in descending order and creates p˜ ˜ j K , such as p˜ j1 ≥ · · · ≥ p˜ j K , j = j1 , . . . , p ˜ ˜ ˜ 1, . . . , J . After that, p˜ j1 , . . . , p˜ j K˜ , K j ≤ K are created. That is, each object has j
different number of classes (or codes). Then the classifier calculates the reliability score for each class (or code) of each object. The originally defined reliability score p¯ jk [5] utilizing the idea of the partition entropy or partition coefficient [7, 8] are defined as: ⎛ p¯ jk = p˜˜ jk ⎝1 +
⎞
˜
Kj
p˜˜ jm log K p˜˜ jm ⎠, j = 1, . . . , J, k = 1, . . . , K˜ j .
(2)
m=1 ˜
p¯ jk = p˜˜ jk
Kj
p˜˜ 2jm , j = 1, . . . , J, k = 1, . . . , K˜ j .
(3)
m=1
These reliability scores were defined considering both probability measure and fuzzy measure. That is, p˜˜ jk shows the uncertainty from training dataset (probability K˜ j ˜ 2 K˜ j ˜ p˜ jm log K p˜˜ jm or m=1 p˜ jm shows the uncertainty from the measure) and 1 + m=1 latent classification structure in data (fuzzy measure). These values of the uncertainty from the latent classification structure can show the classification status of each object; that is how each object classified to the candidate classes (or codes).
294
Y. Toko and M. Sato-Ilic
In a subsequent study [6], the reliability score was generalized utilizing the idea of T-norm in statistical metric space [9–11]. The generalized reliability score p¯¯ jk is written as follows: ⎛ ⎞ K˜ j p˜˜ jm log K p˜˜ jm ⎠, j = 1, . . . , J, k = 1, . . . , K˜ j . (4) p¯¯ jk = T ⎝ p˜˜ jk , 1 + m=1
⎛ p¯¯ jk = T ⎝ p˜˜ jk ,
⎞
˜
Kj
p˜˜ 2jm ⎠, j = 1, . . . , J, k = 1, . . . , K˜ j .
(5)
m=1
An appropriate T-norm from various choices of T can be selected after the reliability score is generalized. Furthermore, to prevent an infrequent object having a significant influence, sigmoid functions g n j were introduced to the generalized reliability score [6]. The redefined reliability score considering the frequency of each object over the classes (or codes) for each object in the training dataset as follows: p¯¯¯ jk = g n j × p¯¯ jk . n For instance, √ j
1+n 2j
(6)
and tanh n j can be taken as sigmoid function g n j . In this
n study, algebraic product is taken as T-norm and √ j
1+n 2j
is taken as a sigmoid function
for the reliability score and the redefined reliability score can be written as: ˜
Kj nj p˜˜ jk p˜˜ 2jm , j = 1, . . . , J, k = 1, . . . , K˜ j . p¯¯¯ jk = 2 1 + nj m=1
(7)
When the number of target text description is L, and each text description includes h l , l = 1, . . . , L objects, corresponding p¯¯¯ jk shown in (7) for l-th text description can be represented as: p¯¯¯ jl k , jl = 1, . . . , h l , k = 1, . . . , K˜ jl , l = 1, . . . , L ,
(8)
which shows a reliability score of j-th object included in l-th text description to a class k (or a code k). The total number of the promising candidate classes (or codes) for l-th text description is hjll=1 K˜ jl . Finally, the classifier selects top V ∈ 1, . . . , hj l=1 K˜ jl classes (or codes) for assignment of l-th text description based l
on the reliability score p¯¯¯ jl k shown in (8). After the first class assignment, incorrectly classified data is retrieved from the evaluated dataset. Here, incorrectly classified data means data that the classifier
Improvement of the Training Dataset …
295
assigns unmatched classes (or codes) at first candidate. Suppose s incorrectly classified data exist, let D = {d1 , . . . , ds } be a set of incorrectly classified data from the evaluated dataset. For the incorrectly classified text descriptions, we calculate reliability scores shown in (8). Then, the k-means method [13] is applied to the set of reliability scores. Here, we used “kmeans” function in “stats” package in R [14]. After the clustering, the values of cluster centers are evaluated to detect clusters whose reliability scores are relatively small. As it is assumed, those detected data do not belong to any decisive classes (or codes); those data are retrieved for the additional training process. Then, the training process is re-performed with the retrieved data to add information of them into the originally trained dataset in order to obtain a more accurate result of classification. Finally, the classifier iterates the classification process with the additionally trained dataset.
3 Numerical Example For the numerical example, the overlapping classifier with the reliability score shown in (8) is applied to the Family Income and Expenditure Survey dataset. The Family Income and Expenditure Survey is a sampling survey related to the household’s income and expenditure conducted by the Statistics Bureau of Japan (SBJ) [15]. The Family Income and Expenditure Survey dataset includes short text descriptions related to the household’s daily income and expenditure (receipt items name and purchase items name in Japanese) and their corresponding classes (or codes). The total number of classes (or codes) is around 550 [16]. Approximately 659 thousand text descriptions were used for training, whereas 54 thousand text descriptions were used for evaluation. The target dataset for evaluation in this experiment is only unclear data that experts cannot easily assign a single class (or a single code). Table 1 shows the classification accuracy of the previously developed overlapping classifier [6]. In this experiment, the classifier assigns the top five candidate classes (or codes) for each text description. The classifier correctly classified approximately 63% ((33, 502/53, 583) × 100(%)) of the dataset for evaluation at first candidate. Then, the k-means method was applied to the reliability scores of incorrectly classified data, that is, 20,081 (53,583–33,502) which shows the difference between the total number of text descriptions and the number of matched text descriptions for first candidate, of the dataset for evaluation is target data for the k-means method. Table 1 Classification accuracy of the overlapping classifier Total Number of correctly classified text descriptions number of 1st 2nd 3rd 4th 5th Total text candidate candidate candidate candidate candidate descriptions Overlapping 53,583 classification
33,502
6,256
2,417
1,285
814
44,274
296
Y. Toko and M. Sato-Ilic
Table 2 shows the values of cluster centers of incorrectly classified data. In this experiment, the k-means method was applied with the number of clusters C = 10. Figure 1 visually shows Table 2. From Fig. 1, data classified into clusters 3 and 6 has small values of cluster centers from first candidate through fifth candidate. In addition, data in clusters 5 and 2 have relatively small values of cluster centers. As we assumed data classified into those four clusters are considered as rarely occurring text descriptions, we retrieved data belonging to those clusters for additional training. Meanwhile, data classified into clusters other than clusters 3, 6, 5, and 2 can be Table 2 Values of cluster centers of incorrectly classified data Values of centers 1st candidate
2nd candidat
3rd candidate
4th candidate
5th candidate
Cluster 1
0.5047
0.3357
0.1596
0.0791
0.0483
Cluster 2
0.3068
0.1616
0.0473
0.0248
0.0167
Cluster 3
0.0105
0.0066
0.0048
0.0038
0.0032
Cluster 4
0.7718
0.3164
0.1405
0.0738
0.0456
Cluster 5
0.2831
0.2243
0.1562
0.0882
0.0529
Cluster 6
0.1420
0.0846
0.0552
0.0346
0.0242
Cluster 7
0.8147
0.0752
0.0360
0.0226
0.0156
Cluster 8
0.5550
0.1190
0.0506
0.0289
0.0195
Cluster 9
0.8325
0.6999
0.1106
0.0565
0.0347
Cluster 10
0.8401
0.7182
0.5536
0.1913
0.0957
Fig. 1 Values of cluster centers of incorrectly classified data
Improvement of the Training Dataset … Table 3 The number of instances in each cluster of incorrectly classified data
297 Number of text descriptions
Cluster 1
646
Cluster 2
2,288
Cluster 3
5,135
Cluster 4
2,152
Cluster 5
904
Cluster 6
2,324
Cluster 7
3,234
Cluster 8
1,273
Cluster 9
1,573
Cluster 10 Total
552 20,081
considered as regularly occurring text descriptions. Therefore, it would be considered that the additional training with those data has less impact on the result. Table 3 shows the number of text descriptions in each cluster. Approximately a quarter of target data was classified in cluster 3 that has the smallest values of cluster centers from first candidate through fifth candidate. In addition, around half of target data was classified in clusters 3, 6, 5, and 2, that is, around half of incorrectly classified data was the retrieved data as relatively small values of reliability scores. In the case when the classifier implements the additional training process using the retrieved data, there is the possibility that the classifier can clearly assign correct class (or code) to half of the incorrectly classified data at a maximum. Then, the additional training process using the retrieved data was implemented to improve the training dataset. After the additional training process, the classifier iterated the class assignment process with the additionally trained dataset (or the improved training dataset). Table 4 shows the results of the classification with the additionally trained dataset (or the improved training dataset) and Fig. 2 visually shows the results shown in Table 4. This shows that the classification with the additionally trained dataset (or the improved training dataset) gave better results compared with the result of the classification with the originally trained dataset. From Fig. 2, it is found that the number of correctly classified data at the first candidate is increased. Especially, in the case that the classification was performed with the additionally trained dataset (or the improved training dataset) which was created using the whole retrieved data (data belonging clusters 3, 6, 5, and 2), the classifier correctly classified 42,740 of the target text descriptions at first candidate, whereas the classifier correctly classified 33,502 of the target text descriptions at first candidate with the originally trained dataset. In the meanwhile, the numbers of correctly classified text descriptions at other than first candidate were decreasing when the classifier performed the additional training process. These results would mean the proposed procedure brings a more accurate result.
Data Augmentation 40,511 41,314 42,740
Cluster 3, 6
Cluster 3, 6, 5
Cluster 3, 6, 5, 2
33,502 38,607
53,583
1st candidate
5,035
5,527
5,736
6,138
6,256
2nd candidate
1,591
1,755
1,945
2,261
2,417
3rd candidate
Number of correctly classified instances
Cluster 3
Overlapping classification
Number of total instances
Table 4 Classification accuracy with improved training dataset
766
873
998
1,140
1,285
4th candidate
435
526
586
699
814
5th candidate
50,567
49,995
49,776
48,845
44,274
Total
298 Y. Toko and M. Sato-Ilic
Improvement of the Training Dataset …
299
Fig. 2 Classification accuracy with improved training dataset
Figure 3 shows the values of cluster centers of correctly classified data. We also performed k-means clustering to correctly classified data with the number of clusters C = 10. Comparing Fig. 1 with Fig. 3, it seems that the correctly classified data tends to be more clearly assigned a class (or a code) as more clusters in Fig. 3 have a certain
Fig. 3 Values of cluster centers of correctly classified data
300
Y. Toko and M. Sato-Ilic
Table 5 Comparison of the number of text descriptions in each cluster Incorrectly classified data
Correctly classified data
Number of text descriptions
Number of text descriptions
Composition ratio
Composition ratio
Cluster 1
646
3.22%
458
1.37%
Cluster 2
2,288
11.39%
2,285
6.82%
Cluster 3
5,135
25.57%
312
0.93%
Cluster 4
2,152
10.72%
4,114
12.28%
Cluster 5
904
4.50%
485
1.45%
Cluster 6
2,324
11.57%
1,119
3.34%
Cluster 7
3,234
16.10%
19,354
57.77%
Cluster 8
1,273
6.34%
2,555
7.63%
Cluster 9
1,573
7.83%
2,389
7.13%
552
2.75%
431
1.29%
20,081
100.00%
33,502
100.00%
Cluster 10 Total
gap between the values of the cluster center of first candidate and other candidates. It can be assumed that a large portion of correctly classified data is clear data whereas a certain volume of the incorrectly classified data includes more uncertainty. To ensure the assumption, we performed the following experiment. We classified the correctly classified text description into clusters created from incorrectly classified data shown in Table 2: we classified each correctly classified text description into a cluster having the nearest Euclidean distance of values of cluster centers shown in Table 2 and the calculated reliability scores of each correctly classified text descriptions. Table 5 shows the comparison of the number of text descriptions in each cluster between the incorrectly classified data and the correctly classified data. 57.77% of the correctly classified data assigned to cluster 7 has the biggest gap between the values of the cluster center of first candidate and the other candidates. It would mean, 57.77% of the correctly classified text descriptions are easily classified into a single class as data classified into cluster 7 can be clearly assigned a single class (or code). Meanwhile, only 16.1% of the incorrectly classified data is classified into cluster 7. In addition, only 0.93% of correctly classified data assigned to cluster 3 that has the smallest values of cluster centers from first candidate through fifth candidate. It would mean most of the correctly classified text descriptions are not unclear data. Whereas, 25.57% of incorrectly classified data is assigned to cluster 3. From this experiment, it can be seen that the clearness of information is different between correctly classified data and incorrectly classified data. The numerical examples showed the proposed procedure can efficiently improve the training dataset for unclear data. By utilizing the k-means method, the proposed procedure extracted data that do not belong to any decisive class (or code). Then, we created an improved training dataset with the extracted data. The classification
Improvement of the Training Dataset …
301
with the additionally trained dataset (or the improved training dataset) made a better result of classification accuracy.
4 Conclusion This paper proposed a new procedure for the improvement of the training dataset to get a better result of the classification. The k-means method was applied to obtain the pattern of reliability scores. As it is assumed that data belong to clusters whose reliability score are relatively small do not belong to any decisive classes (or codes), an additional training process using data classified into clusters whose reliability score are relatively small is performed. Then the class assignment process was iterated using an additionally trained dataset. The numerical example showed a better performance of the proposed procedure. In future works, a more detailed analysis for data that belong to clusters whose reliability score is not small is required.
References 1. Hacking, W., Willenborg, L.: Method series theme: Coding; interpreting short descriptions using a classification. In: Statistics Methods. Statistics Netherlands (2012). https://www.cbs. nl/en-gb/our-services/methods/statistical-methods/throughput/throughput/coding. Accessed 8 Jan 2020 2. Gweon, H., Schonlau, M., Kaczmirek, L., Blohm, M., Steiner, S.: Three methods for occupation coding based on statistical learning. J. Off. Stat. 33(1), 101–122 (2017) 3. Toko, Y., Wada, K., Kawano, M.: A supervised multiclass classifier for an autocoding system. J. Rom. Stat. Rev. 4, 29–39 (2017) 4. Toko, Y., Wada, K., Iijima, S., Sato-Ilic, M.: Supervised multiclass classifier for autocoding based on partition coefficient. In: Czarnowski, I., Howlett, R.J., Jain, L.C., Vlacic, L. (eds.) Intelligent Decision Technologies 2018. Smart Innovation, Systems and Technologies, vol. 97, pp. 54–64. Springer, Switzerland (2018) 5. Toko, Y., Iijima, S., Sato-Ilic, M.: Overlapping classification for autocoding system. J. Rom. Stat. Rev. 4, 58–73 (2018) 6. Toko, Y., Iijima, S., Sato-Ilic, M.: Generalization for Improvement of the Reliability Score for Autocoding. J. Rom. Stat. Rev. 3, 47–59 (2019) 7. Bezdek, J.C.: Pattern Recognition with Fuzzy Objective Function Algorithms. Plenum Press, New York (1981) 8. Bezdek, J.C., Keller J., Krisnapuram, R., Pal, N.R.: Fuzzy Models and Algorithms for Pattern Recognition and Image Processing. Kluwer Academic Publishers (1999) 9. Menger, K.: Statistical metrics. Proc. Natl. Acad. Sci. U.S.A. 28, 535–537 (1942) 10. Mizumoto, M.: Pictorical representation of fuzzy connectives, Part I: Cases of T-norms, tConorms and averaging operators. Fuzzy Sets Syst. 31, 217–242 (1989) 11. Schweizer, S., Sklar, A.: Probabilistic Metric Spaces. Dover Publications, New York (2005) 12. Kudo, T., Yamamoto, K., and Matsumoto, Y.: Applying conditional random fields to Japanese morphological analysis. In: The 2004 Conference on Empirical Methods in Natural Language Processing on proceedings, pp. 230–237. Barcelona, Spain (2004) 13. Hartigan, J.A., Wong M.A.: Algorithm AS 136: A K-Means clustering algorithm. J. R. Stat. Soc. Ser. C Appl. Stat. 28(1), 100–108 (1979)
302
Y. Toko and M. Sato-Ilic
14. R Core Team: R: A language and environment for statistical computing, R Foundation for Statistical Computing. Vienna, Austria (2018). https://www.R-project.org/. Accessed 8 Jan 2019 15. Statistics Bureau of Japan: Outline of the Family Income and Expenditure Survey. Available at: https://www.stat.go.jp/english/data/kakei/1560.html. Accessed 14 Feb 2020 16. Statistics Bureau of Japan: Income and Expenditure Classification Tables (revised in 2020). Available at: https://www.stat.go.jp/english/data/kakei/ct2020.html. Accessed 14 Feb 2020
A Constrained Cluster Analysis with Homogeneity of External Criterion Masao Takahashi , Tomoo Asakawa, and Mika Sato-Ilic
Abstract Constrained cluster analysis is a semi-supervised approach of clustering where some additional information about the clusters is incorporated as constraints. For example, sometimes, we need to consider the constraint of homogeneity among all obtained clusters. This paper presents an algorithm for constrained cluster analysis with homogeneity of clusters and shows a practical application of the algorithm in formulating survey blocks in official statistics such as the Economic Census, which reveals the effectiveness of the algorithm. In this application, travel distance is utilized considering the property of homogeneity of this clustering.
1 Introduction A number of methods on constrained cluster analysis, which is a semi-supervised approach of clustering, have been proposed so far. For example, there is a method that certain pairs of points are included or not included in the same cluster [1] and a method to avoid empty clusters [2]. There exist other kinds of constrained clustering methods where each cluster has at least a minimum number of points in it [3], and each cluster has an equal number of points [4], which is called balanced clustering. A method that aims to minimize the total costs assigned to clusters has also been proposed [5]. M. Takahashi (B) National Statistics Center, Shinjuku, Tokyo 162-8668, Japan e-mail: [email protected] T. Asakawa Statistics Bureau, Ministry of Internal Affairs and Communications, Shinjuku, Tokyo 162-8668, Japan e-mail: [email protected] M. Sato-Ilic Faculty of Engineering, Information and Systems, University of Tsukuba, Tsukuba, Ibaraki 305-8573, Japan e-mail: [email protected] © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2020 I. Czarnowski et al. (eds.), Intelligent Decision Technologies, Smart Innovation, Systems and Technologies 193, https://doi.org/10.1007/978-981-15-5925-9_26
303
304
M. Takahashi et al.
Sometimes, we need to achieve homogeneity of clusters based on an external criterion. For example, when each cluster is assigned to a person to do some kind of task of which burden depends on some attributes of the objects in the cluster such as the total number of units in the objects that the person should process, and each person expects that the burden for the task is as equal as possible among clusters. In this paper, we present a method for constrained cluster analysis in order to pursue homogeneity of clusters in the above kind of situation [6]. This method can be applied, for example, to determine survey areas for enumerators of a statistical survey by combining building block areas that have a different number of units to be surveyed, and securing homogeneity of survey areas in terms of the burden, e.g., the number of units to be surveyed, on enumerators is desired. The remainder of the paper is structured as follows. We begin by presenting algorithms for the above-mentioned constrained cluster analysis in Sect. 2. In Sect. 3, the method is practically applied to formulate the survey blocks for the 2019 Economic Census, followed by the results of the application in Sect. 4. The paper concludes in Sect. 5, where some possible future work is also presented.
2 Algorithms We propose a two-stage clustering method to achieve homogeneity of clusters. The first stage of this clustering is based on the ordinary k-means clustering algorithm. As a result of the clustering, each object belongs to its nearest cluster. For the second stage, a clustering method [6] is applied to achieve the homogeneity of clusters. The algorithms for this clustering are shown in the following subsections.
2.1 The First-Stage Cluster Analysis: An Algorithm Applying k-Means Method In the first stage of the cluster analysis, the following algorithm is performed using the ordinary k-means method. When applying the k-means method, an initial situation of the clusters can be decided in many ways. One example is to begin randomly deciding the center of each cluster. If there exists some natural initial situation, it could be adopted. In both cases, the number of clusters should be given at the first stage of the algorithms for this study. (1) Determine the number of clusters. (2) Set up initial clusters. If there exists some natural initial situation concerning clusters, it could be adopted and go to the step (6). If not, go to the next step. (3) Randomly decide the center of each cluster to which nearby objects are assigned. (4) For each object, calculate the distances between the centers of all clusters.
A Constrained Cluster Analysis with Homogeneity …
305
(5) Using the distances calculated in the step (4), the cluster to which each object belongs is decided to the nearest cluster. (6) Calculate the position of the center for each cluster. (7) For each object included in each cluster, calculate the distance between the object and the center of its cluster. For each object, calculate the distances to the center for all the other clusters. From the object of which distance from the center of its cluster is the longest to the shortest among the objects in the cluster, check whether the nearest cluster from the object differs from the cluster to which it currently belongs. If so, the cluster to which the object belongs is changed to the nearest cluster. (8) Perform the above (7) for all the objects of all the clusters one after the other. Note that the maximum number of objects that change clusters to which they belong is only one for a cluster at one time. (9) Repeat the steps (6) to (8) above. When each object belongs to its nearest cluster, the process is terminated.
2.2 The Second-Stage Cluster Analysis: An Algorithm to Add Constraints In the second stage of this clustering, the constraints that aim to achieve the homogeneity of clusters, such as the number of units included in each cluster, are added. It is described in the following. (1) Calculate the position of the center for each cluster. (2) Pick up a cluster one by one from the cluster that has the largest (but greater than some threshold) total number of the units concerned. The threshold can be set to a value, which is certain degrees, say 10%, greater than the average value among all the clusters. (3) From the above-selected cluster, pick up the object that has the longest distance from the center of its cluster. If the distance is longer than the distance from the current object to the nearest object that belongs to the other cluster, the cluster to which the current object belongs is changed to the “other” cluster. (4) In the step (3), when the objects in a cluster are concentrated so that the distance from each object to the center of the cluster to which it belongs is shorter than any other distance from each object to the objects in the other clusters, the object, which has the longest distance from the center of its cluster, changes clusters to the “other” cluster which has an object whose distance to the object concerned is the shortest. (5) During the steps from (3) to (4), the maximum number of the objects which change clusters is one. (6) Repeat the steps (1) to (5) until the maximum total value of clusters does not exceed a certain threshold, say 10% above the average value of all the clusters. The repetition should also stop when a new object, which changes its clusters, does not come into existence anymore.
306
M. Takahashi et al.
3 Practical Application: Formulating Survey Blocks for the 2019 Economic Census In this section, a practical application of the above clustering method is shown, where the method is applied to the formulation of the survey blocks for the 2019 Economic Census, in which all the establishments located in Japan are investigated. Since 2009, the Economic Census has been conducted every two or three years— in 2009, 2012, 2014, and 2016. In these Censuses, enumeration districts were set up in order to conduct the survey correctly. An enumeration district is a unit area for census-taking and contains around 25–30 establishments on an average. The total is about 250 thousand, and enumeration district maps, which contain several enumeration districts on average, have been prepared for all the enumeration districts. In the 2019 Economic Census, however, the survey method has been reorganized where an enumerator visits his/her “survey block” assigned for investigation, which has around 400–500 establishments on an average. In order to formulate the survey blocks, it seemed efficient to utilize the enumeration districts set up in the 2016 Economic Census. So, it was decided to formulate the survey blocks for the 2019 Economic Census by combining adjacent enumeration districts for the 2016 Economic Census.
3.1 Information Available In this study, the information available in formulating the survey blocks for the 2019 Economic Census is as follows. – The number of establishments in each enumeration district – The area of each enumeration district – The coordinates of the central point of each enumeration district (latitude/longitude) Note that the above information can be derived from the results of the 2016 Economic Census.
3.2 The Basic Concepts in Formulating the Survey Blocks The basic concepts in formulating the survey blocks are as follows. – Each survey block should be formulated by combining adjacent enumeration districts as far as possible. – Each survey block should be as homogeneous as possible in terms of the burden of the enumerator assigned to the block.
A Constrained Cluster Analysis with Homogeneity …
307
Based on the above basic concepts, the proposed clustering method is applied to the formulation of the survey blocks for the 2019 Economic Census. First, we need to determine how to evaluate the burden of enumerators. Two major factors concern the burden of enumerators of the Economic Census: the number of establishments to be investigated located in the survey block and the distance that the enumerator travels during his/her investigation for the block. The above two factors constitute the constraints for the cluster analysis concerned in terms of the homogeneity of the enumerators’ burden; however, it is convenient to integrate the two factors into a single indicator so that the burden of enumerators is intelligibly evaluated. The method to integrate the two factors is described in the next subsection. For the method of formulating the survey blocks using the integrated indicator for the burden of enumerators, we applied the clustering method shown in Sect. 2 using the constraints that each survey block places nearly the same burden on the enumerator assigned. We first apply the k-means clustering method by using the distance from an enumeration district to the center of nearby clusters. Then, as the second stage, the clustering method to achieve the homogeneity of the burden on the enumerators is applied. As a result, an optimal formulation of the survey blocks for the 2019 Economic Census can be performed.
3.3 The Constraint: An Indicator for the Burden of Enumerators As mentioned above, it is convenient to integrate the two factors indicating the burden of enumerators: the number of establishments to be investigated located in the survey block and the distance that the enumerator travels during his/her investigation for the block. For this purpose, we converted the travel distance of enumerators, which can be estimated by the area and the number of establishments in each enumeration district, into the equivalent number of establishments to be investigated. This equivalent number of establishments, together with the number of establishments actually grasped by the 2016 Economic Census, is called the “Deemed Number of Establishments (DNEs),” which can be an indicator for evaluating the burden of the enumerator assigned to the survey block consisting of enumeration districts. The procedure for calculating the DNEs is as follows. Estimation of the Travel Distance of Enumerators. We estimated the distance that an enumerator travels during the investigation for an enumeration district in the following manner. (1) It is assumed that the establishments are evenly distributed within each enumeration district, and each district can be divided into the same number of regular hexagons as the number of establishments contained therein. Assuming that
308
M. Takahashi et al.
there is an establishment at the center of each hexagon, the average distance between establishments is calculated for each enumeration district. (2) The average distance between establishments calculated in (1) above, multiplied by the number of establishments in the enumeration district, is the estimated value of the travel distance of enumerators in the enumeration district. The above procedure can be formalized as follows [7]. 14 4 √ S · n, L= 3
(1)
where L is the estimated travel distance of the enumerator for the enumeration district concerned, S is the area of the enumeration district, and n is the number of establishments (or the number of regular hexagons) in the enumeration district. Converting the Travel Distance into the Equivalent Number of Establishments. We converted the travel distance of enumerators for an enumeration district concerned into the equivalent number of establishments in terms of the burden on the enumerator. The procedure is described as follows. (1) Estimating the travel distance of enumerators for each enumeration district using the calculation formula (1) and averaging the travel distances for all the enumeration districts over Japan, which yielded about 3.2 km on an average for the 2019 Economic Census. (2) For convenience sake, we put 1.6 km as a unit distance, which is a half value of the above-calculated distance. (3) In converting the travel distance of 1.6 km into the equivalent number of establishments, we used the following information derived from the survey plan for the 2019 Economic Census – The average number of establishments to be investigated by an enumerator is 500. – An enumerator is supposed to work 3–4 hours a day and 3–4 days a week. – The actual survey period for an enumerator is about 6 weeks, which we call “one term.” (1) Using the above information, the time that an enumerator takes for investigating one establishment can be estimated as 10 min, which is calculated as follows. 3.5(h/day) × 60(min./h) × 3.5(days/week) ×6(weeks/term)/500(estab./term) 10(min./estab.) (2) Assuming that an enumerator takes 30 min for moving 1.6 km, this distance can be converted into 3 establishments as follows.
A Constrained Cluster Analysis with Homogeneity …
309
1.6(km) ↔ 30(min.)/10(min./estab.) = 3(estab.) (3) The DNEs for an enumeration district can be calculated by adding the following equivalent number of establishments to the actual number of establishments grasped at the 2016 Economic Census,
L − 3.2 3 · ceil , 1.6
(2)
where ceil denotes a function of rounding up digits after the decimal point to form an integer, and L is the estimated travel distance of the enumerator for the enumeration district concerned defined above.
3.4 The First-Stage Cluster Analysis In the first stage of formulating survey blocks, the k-means cluster analysis method is applied to enumeration districts of the Economic Census without considering the number of establishments in each survey block. The algorithm is described below. Number of Clusters. For each municipality, the number of clusters, which means the number of survey blocks and equals to the total number of enumerators, is decided by dividing the total number of DNEs in each municipality by the average number of establishments (500) assigned to the enumerators (round-up after the decimal point). Initial Set of Clusters. The initial set of clusters is set up for each municipality as follows. First, we sort the data of enumeration districts in each municipality by using the following key: the enumeration district map number and the enumeration district number. Then we accumulate the DNEs in each enumeration district. The initial set of clusters is set up by aggregating sorted enumeration districts, so each cluster has around 500 DNEs (The last cluster usually has less than 500 DNEs, but it will be adjusted by the next k-means cluster analysis.). k-means Cluster Analysis. The cluster analysis described in Sect. 2.1 is performed. The distance between the two data points is relatively short, so it can be evaluated by approximation formula using the latitude and longitude of the points. The distance of the two points l can be calculated, for example, as follows [7]. l = 6370 · arccos(sin φ1 sin φ2 + cos φ1 cos φ2 cos(λ1 − λ2 )),
(3)
where the sets of latitude and longitude of the data points are denoted as (φ1 , λ1 ) and (φ2 , λ2 ).
310
M. Takahashi et al.
3.5 The Second-Stage Cluster Analysis In the second stage of formulating the survey blocks, the clustering method shown in Sect. 2.2 is applied to equalize the DNEs as far as possible in each cluster (survey block) in the municipality concerned.
4 Results: A Numerical Example As an example, we applied the above procedure described in Sect. 3 to a municipality. The name of the municipality is Konosu City in Saitama Prefecture, which is located to the north of Tokyo. The reason we selected this city is that the shape of the city is rather complicated, and it may provide us with some good examples. Based on the procedure proposed in Sect. 3.3, the total DNEs of Konosu City is calculated as 4,844. So the number of survey blocks (clusters) in the city is 10 (= ceil(4844/500)). By using the information of enumeration district maps and their corresponding enumeration districts, the initial set of the survey blocks can be set up as in Fig. 1, and the DNEs in each survey block are shown in Table 1. Figure 1 and Table 1 show that the DNEs in each survey block are equalized to a certain degree except for block number 10, which is the last block of the city. Figure 1 also shows that some survey blocks are disjoint due to the non-adjacency of Fig. 1 The initial set of the survey blocks in Konosu City
Table 1 The DNEs by survey block at the initial state BlkNo.
1
2
3
4
5
6
7
8
9
10
Avr.
Std.
DNEs
433
568
529
483
505
484
508
571
491
272
484.4
80.78
A Constrained Cluster Analysis with Homogeneity …
311
Fig. 2 The set of the survey blocks after the k-means cluster analysis
Table 2 The DNEs by survey block after the k-means cluster analysis BlkNo.
1
2
3
4
5
6
7
8
9
10
Avr.
Std.
DNEs
366
232
1057
414
545
489
538
449
332
422
484.4
211.1
enumeration districts when being arranged in numerical order of their enumeration district numbers. After applying the ordinary k-means cluster analysis, the set of the survey blocks is formulated as in Fig. 2, and the DNEs in each survey block are shown in Table 2. As shown in Fig. 2, every enumeration district in each survey block is adjacent to each other, but some survey blocks have a very large number of DNEs, resulting in large inhomogeneity. This inhomogeneity can be rectified by the next step cluster analysis. Lastly, by applying the second-stage cluster analysis, the set of the survey blocks is formulated as in Fig. 3, and the DNEs in each survey block are shown in Table 3. Figure 3 shows the adjacency of each enumeration district in each survey block. In addition, DNEs of each survey block are more balanced in Table 3 compared with Tables 1 and 2. The quality of the results can be confirmed by the figure of the standard deviation of the DNEs in each table: they are 80.78, 211.1, and 33.78, respectively.
5 Conclusions In this paper, we present a constrained cluster analysis that achieves the homogeneity of clusters. Then the algorithm is applied to a practical case of formulating
312
M. Takahashi et al.
Fig. 3 The set of the survey blocks after the second-stage clustering
Table 3 The DNEs by survey block after the second-stage clustering BlkNo.
1
2
3
4
5
6
7
8
9
10
Avr.
Std.
DNEs
521
479
531
459
524
492
469
518
404
447
484.4
33.78
survey blocks for the 2019 Economic Census. We could obtain a solution that could provide enumerators of the Census with an optimal set of survey blocks that are as homogeneous as possible in terms of the burden on the enumerators. Further refinement on the homogeneity concerning the burden on the enumerators may be possible if we could use more information on the enumeration districts, which are the building blocks in formulating survey blocks, such as the information of geographical proximity of enumeration districts. We leave it for future study.
References 1. Lu, Z., Leen, T.K.: Pairwise Constraints as Priors in Probabilistic Clustering. In: Constrained Clustering: Advances in Algorithms, Theory, and Applications. Chapman & Hall/CRC Data Mining and Knowledge Discovery Series. CRC Press, Taylor and Francis Group, Boca Raton (2009) 2. Demiriz, A., Bennett, K.P., Bradley, P.S.: Using Assignment Constraints to Avoid Empty Clusters in k-Means Clustering. In: Constrained Clustering: Advances in Algorithms, Theory, and Applications. Chapman & Hall/CRC Data Mining and Knowledge Discovery Series. CRC Press, Taylor and Francis Group, Boca Raton (2009) 3. Bradley, P.S., Bennett, K.P., Demiriz, A.: Constrained k-Means Clustering. Tech. rep., MSRTR-2000-65, Microsoft Research (2000) 4. Malinen, M.I., Fränti, P.: Balanced K-Means for Clustering. In: Fränti, P., Brown, G., Loog, M., Escolano F., Pelillo, M. (eds.) Structural, Syntactic, and Statistical Pattern Recognition. S + SSPR 2014. Lecture Notes in Computer Science, vol. 8621. Springer, Berlin, Heidelberg (2014)
A Constrained Cluster Analysis with Homogeneity …
313
5. Lefkovitch, L.P.: Conditional Clustering. Biometrics 36, 43–58 (1980) 6. Takahashi, M., Asakawa, T., Sato-Ilic, M.: A Proposal for a Method of Forming Survey Blocks in the Economic Census for Business Frame: Development of a New Algorithm of Constrained Cluster Analysis Using the Information of Spatial Representative (in Japanese). GIS Association of Japan, D72 (2018) 7. Miura, H.: Three Ways to Calculate Distances Using Latitude and Longitude (in Japanese). Operations Research, vol. 60, pp. 701–705. The Operations Research Society of Japan, Tokyo (2015)
Trust-Region Strategy with Cauchy Point for Nonnegative Tensor Factorization with Beta-Divergence Rafał Zdunek and Krzysztof Fonał
Abstract Nonnegative tensor factorization is a well-known unsupervised learning method for multi-linear feature extraction from a nonnegatively constrained multiway array. Many computational strategies have been proposed for updating nonnegative factor matrices in this factorization model but they are mostly restricted to minimization of the objective function expressed by the Euclidean distance. Minimization of other functions, such as the beta-divergence, is more challenging and usually leads to higher complexity. In this study, the trust-region (TR) algorithm is used for minimization of the beta-divergence. We noticed that the Cauchy point strategy in the TR algorithm can be simplified for this function, which is profitable for updating the factors in the discussed model. The experiments show high efficiency of the proposed approach.
1 Introduction Nonnegative tensor factorization (NTF) can be regarded as the polyadic or CANDECOMP/PARAFAC [3, 11] decomposition with the nonnegativity constraints imposed onto all the factor matrices. It was proposed by Shashua and Hazan [19] as a multilinear extension to nonnegative matrix factorization (NMF) [15], where the factors are updated with multiplicative algorithms. Nowadays, it is a popular model with many important applications, including machine learning, pattern recognition, computer vision, audio and image processing [9, 10, 12–14, 16, 20–22]. Many numerical algorithms for updating the factors in NTF are based on the alternating optimization scheme with an objective function given by the Euclidean distance. Quadratic functions are widely used in tensor decomposition models, which are motivated by both algorithmic and application reasons. However, there are some problems for which residual errors between data and an approximating model are not R. Zdunek (B) · K. Fonał Faculty of Electronics, Wroclaw University of Science and Technology, Wybrzeze Wyspianskiego 27, 50-370 Wroclaw, Poland e-mail: [email protected] © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2020 I. Czarnowski et al. (eds.), Intelligent Decision Technologies, Smart Innovation, Systems and Technologies 193, https://doi.org/10.1007/978-981-15-5925-9_27
315
316
R. Zdunek and K. Fonał
modeled with a Gaussian distribution or the data are generated from a non-Gaussian distribution, and for such cases, other objective functions seem to be more efficient. Many objective functions used for NMF or NTF belong to two basic families: α- and β-divergences [4]. For example, the Euclidean distance, generalized Kullback– Leibler (KL) divergence or Itakura–Saito (IS) distance are special cases of the βdivergence. Moreover, both families have been unified to α-β-divergence and applied for robust NMF [5]. The generalized objective functions have also been used for solving NTF-based problems. Cichocki et al. [4, 7] proposed a family of multiplicative algorithms for NTF, referred to as α- and β-NTF, but due to multiplicative update rules their convergence is slow. Moreover, they are not very efficient for solving large-scale problems because they compute a full model in each iteration and for each factor. To tackle the convergence problem, Cichocki and Phan [6] proposed another computational strategy for NTF that is based on the hierarchical alternating least square scheme [6]. This approach, despite its high performance, uses a block coordinate descent method that is difficult for parallelization. These functions can also be minimized with the second-order algorithms (e.g., the quasi-Newton, GPCG) [4] but involving a direct computation of the Hessian matrix, which is still an inefficient approach for solving large-scale problems. Moreover, it is also difficult to control positive-definiteness of the Hessian. The above-mentioned problems can be relaxed if the Trust-Region (TR) optimization framework with the Cauchy point estimate [1, 17, 18] is used. Motivated by the success of the TR algorithms in solving NMF problems [23, 24], in this study, we extend the concept proposed in [24] to NTF with the β-divergence [8] that is closely related to the density power divergence proposed by Basu [2]. The advantage of this approach is twofold. First, positive-definiteness of the Hessian can be easily controlled with a radius of the trust-region. Second, the convergence can be improved by using partial information on curvature of an objective function. In TR algorithms, an objective function is approximated by the second-order Taylor series approximation model (similarly as in the Newton method) but the Cauchy point strategy assumes that the Hessian is involved only in computing the step length along the normalized gradient descent direction. It is thus a quasi-Newton approach that allows us to avoid heavy computations related to inversion of the Hessian. The Hessian for the β-divergence has a block-diagonal structure, which is profitable for reducing a computational complexity of Cauchy point estimates. The numerical experiments, performed on various synthetically generated multi-way arrays, demonstrated the proposed TR-based algorithm for NTF is computationally much more efficient than the standard α- and β-NTF methods. Moreover, the proposed computational scheme can be easily extended to penalized objective functions with differentiable penalty terms. The paper is organized as follows: Sect. 2 introduces to the NTF model. Section 3 describes the proposed algorithm. The experiments are presented in Sect. 4. Finally, the discussion and conclusions are given in Sect. 5.
Trust-Region Strategy with Cauchy Point for Nonnegative …
317
2 Nonnegative Tensor Factorization Let Y = [yi1 ,...,i N ] ∈ R+I1 ×...×I N be the N -way array of observed nonnegative data, which will be also referred to as a tensor. NTF of Y can be expressed by the model: Y=
J
(N ) u(1) j ◦ . . . ◦ uj ,
(1)
j=1 In where u(n) j ∈ R+ is the j-th feature vector across the n-th mode of Y, and the symbol ◦ denotes the outer product. The number J of feature vectors across each mode is equal in NTF, and it is given by the parameter J . Collecting the feature vectors {u(n) j } (n) (n) In ×J (n) to the factor matrix U = u1 , . . . , u J ∈ R+ , model (1) can be equivalently presented in the form:
Y = I ×1 U (1) ×2 . . . × N U (N ) ,
(2)
where I ∈ R+J ×...×J is an N -th order identity tensor, and ×n is the tensor-matrix product across the n-th mode for n = 1, . . . , N . In ×
Ip
Let Y (n) ∈ R+ p=n be a matrix obtained from Y by its matricization or unfolding along the n-th mode. Applying the unfolding to model (2), we get the following system of linear equations: Y (n) = U (n) U −n T ,
(3)
where U −n = U (N ) . . . U (1) ∈ R p=n I p ×J and the symbol stands for the Khatri–Rao product. Note that the system in (3) is considerably over-determined because p=n I p >> J but it can be readily transformed to a multiple right-hand side system with a square system matrix: m=n , Y (n) U −n = U (n) U −n T U −n = U (n) U (m)T U (m)
(4)
m=n U (1)T U (1) )···(U (N )T U (N ) ) where U (m)T U (m) =( ∈ R+J ×J , and the symbol U (n)T U (n) means the Hadamard product. m=n ∈ R+J ×J . The system Let Z (n) = Y (n) U −n ∈ R+In ×J and B (n) = U (m)T U (m) (4) can be expressed in the form: z in = uin B (n) , for i n = 1, . . . , In ,
(5)
where z in ∈ R1×J and uin ∈ R1×J are the i n -th row vectors of Z (n) and U (n) , respec+ + (n) tively. Note that B is a symmetric matrix.
318
R. Zdunek and K. Fonał
Assuming ∀n : J 0 determines the maximum n ∗(k) step length along d in to satisfy the TR bound. The solution to (10) can be presented in the closed-form: ∗(k) dˆ in = −
Δi(k) n ||g i(k) ||2 t
g i(k) .
(11)
t
The step length τi∗(k) is derived from (8) by computing n τˆi(k) n
=
||g i(k) ||32 n
Δi(k) g i(k) H i(k) g i(k)T n n n n
∗(k) ∂ m (τ dˆ in ) ∂τ k
.
0. Thus, (12)
∗(k) ∗(k) If the constraint || dˆ in ||2 = Δi(k) is satisfied, then the point dˆ in is located on the n ∗(k) boundary of the TR. When g (k) H (k) g (k)T ≤ 0, the point dˆ is inside the TR, which in
in
leads to τi∗(k) = 1. Finally, we have n τi∗(k) n
in
in
⎧ ⎨1
if g i(k) H i(k) g i(k)T ≤ 0, n n n = ⎩ min 1, τˆi(k) otherwise n
(13)
(n) For the β-divergence in (7), the gradient g i = ∇uin Ψ z in , ui(k) ∈ R1×J can B n n be expressed by the formula: q i − z in (q i )β−1 B (n) .
(14)
H in = B (n) diag(hin )B (n)T ∈ R J ×J ,
(15)
gi =
n
n
n
The Hessian has the structure:
320
R. Zdunek and K. Fonał
where hin = (βq i − (β − 1)z in ) q iβ−2 ∈ R1×J . n
(16)
n
g i(k)T in (12) and (13) can be reformulated as The term g i(k) H i(k) n n
n
(k) (k) (k)T (k) (n) (n)T (k)T (k) (n) 2 eJ , g = g B diag(h )B g = h (g B ) g i(k) H i(k) in in n in in in in n (n,k) ˇ = H (G (n,k) B (n) )2 e J (17) in
where
(k) ˇ (n,k) = h(k) ∈ R In ×J , H 1 ; . . . ; h In
(k) ∈ R In ×J , G (n,k) = g (k) ; . . . ; g 1 I n
e J = [1, . . . , 1]T ∈ R J , and () p means an element-wise rise to power p. Considering the simplified form in (17), the step length (12) can be rewritten to τi∗(k) n
=
Δi(k) n
(n,k) 3 (G ) e J in ˇ (n,k) (G (n,k) B (n) )2 e J H
,
(18)
in
and the Cauchy point can be expressed as follows: D
∗(k)
=−
τ ∗(k) Δ(k) (G (n,k) )2 e J
eTJ G (n,k) ,
(19)
(k) T In where Δ(k) = [Δ(k) 1 , . . . , Δ In ] ∈ R . Note that the Hessian H in in the TR method does not need to be inverted (as in the Newton method) and due to the simplification in (17), it does not have to be computed ˇ (n) ∈ R In ×J should explicitly. For updating In J variables in U (n) , only the matrix H be estimated, which reduces a computational complexity. The pseudo-code of the TR algorithm with the Cauchy point for updating the factor matrices in NTF is listed in Algorithm 1. The operation [ξ]+ = max(0, ξ) means the projection of ξ onto the nonnegative orthant R+ .
4 Experiments The experiments are carried out using synthetically generated datasets. Each factor matrix U (n) = [u in , j ] ∈ R+In ×J contains nonnegative numbers obtained according to the rule: u in , j = max{0, uˆ in , j }, where ∀n, i n , j : uˆ in , j ∼ N (0, 1) (normal distribution). The algorithms are validated with the following benchmarks: B1: N = 3, I1 = I2 = I3 = 200, J = 10 (3D tensor), B2: N = 4, I1 = I2 = I3 = I4 = 50, J = 5 (4D tensor). Note that the factor matrices are sparse in about 50 % but the resulting tensor Y is nearly dense in each case.
Trust-Region Strategy with Cauchy Point for Nonnegative …
321
The proposed TR-NTF algorithm is compared with α-NTF (Algorithm 7.2 in [4]) and β-NTF (Algorithm 7.3 in [4]). The initial guesses for {U (n) } were generated from U(0, 1) (uniform distribution). The Monte Carlo (MC) analysis was carried out for each algorithm. Each algorithm was run 10 times for new data and initializer in each MC run. To evaluate robustness of the algorithms against noisy perturbations, the noise-free data were perturbed with an additive Gaussian noise N (0, σ 2 ) with variance σ 2 adapted to have the signal-to-noise ratio (SNR) equal to 20 dB. In TRNTF, we set default parameters: k T R = 20, Δ = 10−4 , Δ¯ = 106 . The algorithms were terminated if the number of alternating steps exceeded 200 or a relative residual error drops below the threshold 10−8 but only for noise-free data. For noisy data, each algorithm runs 200 iterations to avoid an unwanted termination at early stagnation points. We also compare the algorithms for various settings of parameters in the α- and β-divergences. For α-NTF, we set α = 1 (generalized KL divergence) and α = 2 (Pearson χ2 distance). For β-NTF and TR-NTF, we set β = −1 (IS distance),
Algorithm 1: TR-NTF I1 ×...×I N Input : Y ∈ R+ - input tensor, J - rank, k T R - number of TR iterations, Δ - initial TR radius, Δ¯ - maximum TR radius In ×J Output: Factors: {U (n) } ∈ R+
Initialize: {U (n) } with nonnegative random numbers, ∀n : Δ(0) = Δe In ; Compute C (n) = U (n)T U (n) for n = 1, . . . , N ; repeat for n = 1, . . . , N do Compute Y (n) by unfolding Y along the n-th mode ; 6 Compute Z (n) = Y (n) U −n and B (n) = m=n C (m) , U (n,0) ← U (n) ; 7 for k = 0, 1, . . . , k T R do 8 Estimate the Cauchy point D∗(n,k) according to (19); 1
2 3 4 5
9 10 11 12 13 14 15 16
(n,k) (n,k) ∗(n,k) Ψ z in ,uin B (n) −Ψ z in , uin +d in B (n) Evaluate the gain ratios = ; ∗(n,k) m k (0)−m k d in (n,k) ∗(n,k) (k) if ρin > 43 and ||d in ||2 = Δin then (k+1) (k) ¯ Δin = min(2Δin , Δ) ; // Expansion of U (n,k+1) = U (n,k) + D∗(n,k) + ; (n,k) else if 43 ≥ ρin ≥ 41 then (k+1) (k) Δin = Δin , U (n,k+1) = U (n,k) + D∗(n,k) + ; (n,k) ρin
else (k+1) ∗(n,k) Δin = 41 ||d in ||2 , U (n,k+1) = U (n,k) ;
19
U (n) = U (n,kT R ) ; if n < N then Normalize the columns in U (n) to the unit l1 -norm
20
C (n) = U (n)T U (n) ;
17 18
21
until Stop criterion is satisfied;
TR
// Reduction of TR
322
R. Zdunek and K. Fonał
β = 0 (generalized KL divergence), β = 1 (squared Euclidean distance), and β = 2 (corresponds to a sub-Gaussian distribution). The factors are evaluated quantitatively using the signal to interference ratio (SIR) measure [4]). The averaged SIR results obtained for various test cases are shown in Fig. 1. The standard deviations are marked with the whiskers. 160 α -NTF: α = 1 α -NTF: α = 2 β-NTF: β = -1 β-NTF: β = 0 β-NTF: β = 1 β-NTF: β = 2 TR-NTF: β = -1 TR-NTF: β = 0 TR-NTF: β = 1 TR-NTF: β = 2
140
SIR [dB]
120 100 80 60 40 20 B1, noise-free
B2, noise-free
B1, SNR = 20 dB
B2, SNR = 20 dB
Fig. 1 Averaged SIR values (with standard deviations) obtained for estimating all the factors {U (n) } with the algorithms: α-NTF, β-NTF, and TR-NTF at various scenarios: benchmarks B1 and B2, noise-free and noisy data, α ∈ {1, 2}, and β ∈ {−1, 0, 1, 2}
Elapsed time [sec.]
103 α-NTF: α = 1 α-NTF: α = 2 β-NTF: β = -1 β-NTF: β = 0 β-NTF: β = 1 β-NTF: β = 2 TR-NTF: β = -1 TR-NTF: β = 0 TR-NTF: β = 1 TR-NTF: β = 2
102
101
100
B1: noise-free
B2: noise-free
B1: SNR = 20 dB
B2: SNR = 20 dB
Fig. 2 Averaged elapsed time (with standard deviations) obtained for running the algorithms: α-NTF, β-NTF, and TR-NTF at various scenarios: benchmarks B1 and B2, noise-free and noisy data, α ∈ {1, 2}, and β ∈ {−1, 0, 1, 2}
Trust-Region Strategy with Cauchy Point for Nonnegative …
323
1
1 0.9
0.8
Relative fitting
Relative fitting
0.8 0.7 0.6 0.5
0.6
0.4
0.4 α-NTF: α = 2 β-NTF: β = 2 TR-NTF: β = 2
0.3 0.2
10
20
30
40
50
60
70
80
90
100
110
120
α-NTF: α = 2
0.2
β-NTF: β = 2 TR-NTF: β = 2
0
10
15
20
25
30
35
40
Iterations
Iterations
Noise-free
Noisy data
45
50
55
60
Fig. 3 Relative fitting versus iterations obtained for noise-free and noisy data from benchmark B2 with the algorithms: α-NTF (α = 2), β-NTF (β = 2), and TR-NTF (β = 2)
The algorithms were coded in Matlab 2016a and run on the workstation equipped with CPU Intel i7-8700, 3.2 GHz, 64GB RAM, 1 TB SSD disk. The runtime averaged over the MC runs is presented in Fig. 2. For each algorithm relative fitting of the model to data versus alternating steps was also computed, and it is illustrated in Fig. 3 for the selected test cases, separately for noise-free and noisy data.
5 Discussion and Conclusions The results presented in Fig. 1 show that there is no statistically significant difference in the SIR performance between the analyzed algorithms and the parameter setting in the corresponding divergence. A significant difference can be only observed between the test cases with and without noisy perturbations, which seems to be obvious for SNR = 20 dB. For all test cases, the standard deviation is very large but this fact results from a variety of the MC runs. The algorithms are sensitive to data, and despite the factors are generated from the same distribution, a distribution of zerovalue entries in the factors plays an important role. It means that there are some MC runs in which all the tested algorithms failed regardless of parameter settings and a benchmark. Thus, the proposed TR-NTF gives a similar SIR performance as other tested algorithms. However, the algorithms differ considerably in a computational complexity. All the algorithms were initialized with the same guess factors, run in the same environment, and terminated according to the same criterion. Hence, we assume that the elapsed time measured in Matlab is a good measure of the computational complexity. Figure 2 demonstrated that the TR-NTF for noisy data is at least 10 times faster than the other algorithms. For noise-free data, the difference sometimes exceeds 100 times but this may also result from a faster convergence, which is illustrated in Fig. 3. TR-NTF converges to a limit point faster than β-NTF, which may also affects the elapsed time considerably.
324
R. Zdunek and K. Fonał
Summing up, the proposed TR-NTF seems to be an efficient algorithm for solving NTF problems, especially when the metrics are given by the β-divergence.
References 1. Bardsley, J.M.: A nonnegatively constrained trust region algorithm for the restoration of images with an unknown blur. Electron. Trans. Numer. Anal. 20, 139–153 (2005) 2. Basu, A., Harris, I.R., Hjort, N.L., Jones, M.C.: Robust and efficient estimation by minimising a density power divergence. Biometrika 85(3), 549–559 (1998) 3. Carroll, J.D., Chang, J.J.: Analysis of individual differences in multidimensional scaling via an n-way generalization of Eckart-Young decomposition. Psychometrika 35, 283–319 (1970) 4. Cichocki, A., Zdunek, R., Phan, A.H., Amari, S.I.: Nonnegative Matrix and Tensor Factorizations: Applications to Exploratory Multi-way Data Analysis and Blind Source Separation. Wiley and Sons (2009) 5. Cichocki, A., Cruces, S., Amari, S.I.: Generalized alpha-beta divergences and their application to robust nonnegative matrix factorization. Entropy 13 (2011) 6. Cichocki, A., Phan, A.H.: Fast local algorithms for large scale nonnegative matrix and tensor factorizations. IEICE Trans. 92-A(3), 708–721 (2009) 7. Cichocki, A., Zdunek, R., Choi, S., Plemmons, R.J., Amari, S.: Non-negative tensor factorization using alpha and beta divergences. In: Proceedings of ICASSP 2007, Honolulu, Hawaii, USA, April 15–20, pp. 1393–1396 (2007) 8. Eguchi, S., Kano, Y.: Robustifying maximum likelihood estimation by psi-divergence. Institute of Statistical Mathematics (2001) (Res. Memorandum 802) 9. FitzGerald, D., Cranitch, M., Coyle, E.: Extended nonnegative tensor factorisation models for musical sound source separation. Comput. Intell. Neurosci. 872425, 1–15 (2008) 10. Friedlander, M.P., Hatz, K.: Computing nonnegative tensor factorizations. Comput. Optim. Appl. 23(4), 631–647 (2008) 11. Harshman, R.A.: Foundations of the PARAFAC procedure: Models and conditions for an “explanatory” multimodal factor analysis. UCLA Working Papers in Phonetics 16, 1–84 (1970) 12. Heiler, M., Schnoerr, C.: Controlling sparseness in non-negative tensor factorization. Springer LNCS, vol. 3951, pp. 56–67 (2006) 13. Kim, H., Park, H., Elden, L.: Non-negative tensor factorization based on alternating largescale non-negativity-constrained least squares. In: Proceedings of BIBE’2007, pp. 1147–1151 (2007) 14. Kolda, T.G., Bader, B.W.: Tensor decompositions and applications. SIAM Rev. 51(3), 455–500 (2009) 15. Lee, D.D., Seung, H.S.: Learning the parts of objects by non-negative matrix factorization. Nature 401, 788–791 (1999) 16. Mørup, M., Hansen, L.K., Arnfred, S.M.: Algorithms for sparse nonnegative Tucker decompositions. Neural Comput. 20(8), 2112–2131 (2008) 17. Nocedal, J., Wright, S.J.: Numerical Optimization. Springer Series in Operations Research. Springer, New York (1999) 18. Rojas, M., Steihaug, T.: An interior-point trust-region-based method for large-scale nonnegative regularization. Inverse Probl. 18, 1291–1307 (2002) 19. Shashua, A., Hazan, T.: Non-negative tensor factorization with applications to statistics and computer vision. In: Proceedings of the 22th International Conference on Machine Learning. Bonn, Germany (2005) 20. Shashua, A., Zass, R., Hazan, T.: Multi-way clustering using super-symmetric non-negative tensor factorization. In: European Conference on Computer Vision (ECCV). Graz, Austria (May 2006)
Trust-Region Strategy with Cauchy Point for Nonnegative …
325
21. Simsekli, U., Virtanen, T., Cemgil, A.T.: Non-negative tensor factorization models for bayesian audio processing. Digital Signal Process. 47, 178–191 (2015) 22. Xiong, F., Zhou, J., Qian, Y.: Hyperspectral imagery denoising via reweighed sparse low-rank nonnegative tensor factorization. In: 25th IEEE International Conference on Image Processing (ICIP), pp. 3219–3223 (Oct 2018) 23. Zdunek, R., Cichocki, A.: Nonnegative matrix factorization with quadratic programming. Neurocomputing 71(10–12), 2309–2320 (2008) 24. Zdunek, R.: Trust-region algorithm for nonnegative matrix factorization with alpha- and betadivergences. In: Proceedings of DAGM/OAGM, LNCS, vol. 7476, pp. 226–235. Graz, Austria (28–31 Aug 2012)
Multi-Criteria Decision Analysis—Theory and Their Applications
Digital Twin Technology for Pipeline Inspection Radda A. Iureva, Artem S. Kremlev, Vladislav Subbotin, Daria V. Kolesnikova, and Yuri S. Andreev
Abstract In this paper, authors investigated the market of cyber-physical systems for pipeline diagnostics, revealed the features of the external environment of operation of these systems, which has a significant impact on the components of the system. Development prospects are assessed, the pros and cons of existing solutions are highlighted, and its structural scheme of CPS for pipeline diagnostics was proposed. The analysis and conclusions provide that the digital twin technology allows increasing not only the fault tolerance of the CPS for pipeline diagnostics itself, but also, in general, the level of technogenic safety at the facilities.
1 Introduction 1.1 Industry 4.0 In recent years, the terms of Industry 4.0 are often found, which is an initiative program of German politicians and businessmen to massively introduce the latest technologies in industry and everyday life. The program “Industry 4.0” is characterized by high production rates, the use of modern advanced technologies, and the universal mutual integration of automation systems. To comply with the standards of Industry 4.0 and maintain competitiveness in modern conditions, a manufacturing enterprise, just like a finished high-tech product, must have the most advanced automation system [1]. In addition to widespread digitization, the concept of Industry 4.0 is based on Cyber-Physical Systems (CPSs). CPS is an information technology concept, which implies the integration of computing resources in physical entities of any kind, including biological and man-made objects. In CPS, the computing component is distributed throughout the physical system, which is its carrier, and synergistically R. A. Iureva (B) · A. S. Kremlev · V. Subbotin · D. V. Kolesnikova · Y. S. Andreev ITMO University, Saint Petersburg, Russia e-mail: [email protected] © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2020 I. Czarnowski et al. (eds.), Intelligent Decision Technologies, Smart Innovation, Systems and Technologies 193, https://doi.org/10.1007/978-981-15-5925-9_28
329
330
R. A. Iureva et al.
linked to its constituent elements [2]. Such systems are ubiquitous. They are used in such industrial spheres as energy (nuclear, oil and gas, turbomachinery), aircraft engines and systems, sophisticated industrial equipment (pumps, drives, etc.), rail and road transport systems, and medical equipment. It is difficult to underestimate the use of CPS for work in hard–to–reach and dangerous places for humans. One of these applications is CPS for gas pipeline diagnostics. The proceedings [3] have demonstrated and discussed the fundamentals, applications, and experiences in the field of digital twins and virtual instrumentation. This is a presented view of the increasing globalization of education and the demand for teleworking, remote services, and collaborative working environments. The paper [4] presents a digital twin-driven manufacturing cyber-physical system for parallel controlling of smart workshops under mass individualization paradigm. By establishing cyber-physical connection with decentralized digital twin models, various manufacturing resources can be formed as dynamic autonomous systems to co-create personalized products. The paper [5] discusses the benefits of integrating digital twins with system simulation and Internet of Things in support of model-based engineering and provides specific examples of the use and benefits of digital twin technology in different industries. The concept of digital twins and the chances for novel industrial applications are discussed in [6]. Mathematics are a key enabler and the impact is highlighted along four specific examples addressing Digital Product Twins democratizing Design. The paper [7] provides a critical perspective to the benefits of digital twins, their applications, as well as the challenges encountered following their use. Cybersecurity risks are one of these key challenges, which are discussed in this paper. digital twins’ flexible configurations of applications and data storage, especially to integrate third parties and their architecture based on digital twins is one alternative for managing this complexity [8]. According to [9, 10], the process of digitization in manufacturing companies has advanced to the virtualization phase. The final state should be a complete virtualization of the production plant. These papers summarize the findings of the implementation of the “Digital Twin” methodology in a real manufacturing company. The paper [11] answers the following question: How to couple the simulation model gathered in a digital twin and a machine learning predictive maintenance algorithm. cyber-physical production system plays [12] a vital role in the whole lifecycle of eco-designed products. In particular, the paper represents a suitable way for manufacturers that want to involve their customers, delivering instructions to machines about their specific orders and follow its progress along the production line, in an inversion of normal manufacturing. The digital twins can forecast failures before they affect or damage the products. They empower the manufacturers with instant troubleshooting by adjusting the parameters along the production line in the twin. As well, they suppose engineers to analyze product behavior by comparing the digital model with the actual product. The digital twins can also diagnose products that are already in the field or working environment. It gives opportunity to technicians to carry necessary equipment and
tools when physically troubleshooting any malfunction in the product. By incorporating this technology, manufacturing companies can save money and provide better customer service. The importance of a digital twin is thus that it captures the features of engine operation; based on these data it is possible to plan the timing and scope of engine repairs and to schedule preventive maintenance. These features make digital twin technology extremely valuable for consumers, because its application allows the condition of equipment components to be assessed accurately. The value lies in reducing the costs associated with unscheduled repairs. At General Electric Aviation, for example, digital twins of aircraft combine different data sources to increase defect detection speed and repair accuracy; in 2016 this saved the company $125 million [13]. Products or production plants continuously transmit data on their operation to the digital model, which constantly monitors the condition of the equipment and the energy consumption indicators of the production systems. This greatly simplifies the preventive maintenance process, prevents downtime, and reduces energy consumption.

At its core, a digital twin is a software analogue of a physical device that simulates the internal processes, technical characteristics, and behavior of a real object under the influence of interference and the environment. An important feature of the digital twin is that information from the sensors of the real device operating in parallel is used to set the input influences on it. Operation is possible both online and offline. Furthermore, it is possible to compare the readings of the virtual sensors of the digital twin with the sensors of the real device in order to identify anomalies and the causes of their occurrence (Fig. 1).

The rest of the work is organized as follows: In Sect. 1.2, the problem is formulated and the stages of a digital twin for CPS are recalled. The relevance of this topic is described
Fig. 1 The core issue of digital twin working concept
in Sect. 2. In Sect. 3, the proposed method for a fault-tolerant system is described. Section 4 presents the advantages of the fault-tolerant system.
1.2 Digital Twin for CPS

A digital twin is used at all stages of the product life cycle, including development, manufacture, and operation. Already at the preliminary design stage, using a digital twin makes it possible to create variations of the system model of the product being developed in order to evaluate and choose among various versions of technical solutions [2]. Further, at the technical design stage, the model obtained at the previous stage can be finalized and refined using more accurate system models of elements, which in turn can be obtained by numerical modeling; the integration of firmware and control interfaces is also possible. This multi-physical, accurate system model allows the interaction of all elements to be considered and optimized across operating modes and environmental influences [3]. At the manufacturing stage, the developed system model (which may already be called a digital twin) helps to determine the tolerances and manufacturing accuracy required to meet the characteristics and uptime of the product during its entire service life, and also helps to quickly identify the causes of malfunctions during the testing process. In the transition to the operation phase of the product, the digital twin model can finally be used to implement feedback to the development and manufacture of products, diagnostics and prediction of malfunctions, increased work efficiency, recalibration, and identification of new consumer needs.

During the operation of equipment and CPS, three main management strategies for maintenance and repair can be defined:

• event maintenance or reactive maintenance;
• scheduled preventive maintenance;
• Actual Condition Service (ACS).

Event maintenance involves the replacement of parts after their breakdown, which often increases the cost of repair and the downtime during work. Preventive maintenance is the most common type of maintenance today and involves replacing parts at specific time intervals, which are determined by calculating the average time between failures. The most advanced type of maintenance is maintenance by actual condition. It implies the elimination of equipment failures by interactively assessing the technical condition of the equipment from the totality of data coming from its sensors and determining the optimal timing for repair work [5]. The digital twin is one of the ACS tools that allows various options of full and partial failures to be simulated, as well as the operation of devices considering their operating modes, environmental influences, and various degrees of wear of parts. This is especially
important when designing systems on which technological safety depends. An essential aspect of the successful use of digital twins is that the development of devices and systems should be carried out with this concept in mind, which significantly affects the construction of the business processes of the enterprise and the development of new services.
2 Relevance of the Topic

2.1 CPS for Technogenic Security

The use of CPS and their digital twins is growing rapidly. For example, the Toshiba company uses the principle of CPS in its virtual power plant project (Fig. 2), which uses the Internet of Things to coordinate the work of distributed energy sources (solar, hydrogen, and wind energy), electric vehicles, and the energy storage systems that consume the energy. Using data from Internet of Things devices together with AI technologies, it is possible in this case to optimize the power consumption of the system, predict its scale, and ultimately achieve maximum energy savings. Based on a CPS architecture, a method for locating leakage in multibranch pipelines can be proposed. The singular points of the pressure signals at the ends of the multibranch pipeline are analyzed by wavelet packet analysis, so that time feature samples can be established [4]. The schematic of the pipeline leak location system based on CPS [4] is presented in Fig. 3. As a CPS is the integration of computation, communication, and a control loop, it improves the reliability of physical applications and can be used to solve the problems
Fig. 2 Functional diagram of the interaction of CPS systems [3]
Fig. 3 Schematic of pipeline leak location system based on CPS [4]
of signal transmission between heterogeneous networks that arise in the pipeline leak location method.
2.2 Fault Detection in CPS

When using a CPS for pipeline diagnostics, it is necessary to ensure the uninterrupted operation of the system and its functional safety. This is because, at this stage of development, the control of faults in pipelines takes place with the direct participation of a person, and errors in the operation of the system can result in human casualties. Thus, timely, predictive fault detection makes it possible to enter a safe mode of operation and increase the level of technological safety. Using a digital twin in a CPS helps to improve fault detection. The main idea here is to avoid such problems as:

• systems that do not render the services they were designed for;
• systems that run out of control;
• energy and material waste, loss of production, damage to the environment, and loss of human lives.
3 Modeling a Fault-Tolerant CPS

3.1 Non-destructive Testing Methods

Pipeline diagnostics is part of a set of non-destructive testing methods that allows internal pipeline defects to be identified by means of technical diagnostics. To fulfill these and other tasks, a CPS is being developed, on which specific requirements are imposed. Given the considerable length of gas distribution networks and their complex configuration, in some cases the most effective means of conducting pipeline diagnostics of gas pipelines is the use of flaw detection devices that are moved by mobile robots. The use of robots for the inspection of pipelines is one of the most promising solutions, since it can prevent man-made and environmental disasters and accidents in advance. The main advantage of robots is the diagnosis of pipelines without opening them, which significantly facilitates the work of technical specialists. The task of mobile robots is to transport diagnostic equipment in pipelines of complex configuration, which should be ensured by an effective robot control system. The analysis of information on a variety of robotic complexes for pipeline diagnostics revealed that such developments are highly specialized and do not allow the whole range of tasks to be solved. Various developments around the world were considered—from Iran, the People's Republic of China, Japan, Germany, the USA, and South Korea [5]. Global practice shows that, depending on the specific task, developers choose the most suitable diagnostic solutions and highly specialized devices for the work, while there are no universal automatic robotic complexes. In practice, the procedure for diagnosing gas pipelines by the method of non-destructive testing is quite challenging and expensive to carry out. Significant time is spent preparing the gas pipeline before conducting a diagnostic examination, and the diagnostic process itself can be carried out only by highly qualified specialists. All this, directly or indirectly, entails an increase in costs when creating a CPS already at the very early stages of development and operation. Thus, there is a need to develop a universal CPS using digital twin technology (Fig. 4).

The CPS consists of an explosion-proof chassis unit, a sensor and diagnostic unit, and a control unit. Communication between the robot and the control system is carried out wirelessly, which makes it possible to receive information interactively about the location of the CPS inside the pipeline, including its linear coordinate and roll angle, and also to transmit data and control commands. The fault tolerance of the CPS for monitoring internal failures of the pipeline lies in the possibility of identifying failures based on its digital counterpart. Failures of the CPS can be caused by a violation of the information and/or functional security of the system, and automating the identification of actual attacks on the CPS will reduce risks during non-destructive testing.
Fig. 4 Functional diagram of the interaction of CPS systems
3.2 Fault Diagnosis Supervision

The main idea is to compare the behavior of the digital twin with the behavior of the CPS while it operates in a pipe. This means that a fault-indicating signal has to be generated, since the input and output signals of the CPS are also available for its digital twin. If there is no fault, no novelties appear in the system behavior (Fig. 5), because the normal mode predicted by the digital twin agrees with the real mode observed in the pipe.

Fig. 5 Fault diagnosis supervision [14]

When a fault occurs in the CPS behavior, the response of the residual vector of the fault diagnosis supervision subsystem is
r(s) = H_y(s) G_f(s) f(s) = \sum_{i=1}^{g} \left[ G_{rf}(s) \right]_i f_i(s)    (1)

Fig. 6 Fault signal and residual [14]

where G_{rf}(s) = H_y(s) G_f(s) is defined as the fault matrix, which represents the relation between the residual and the faults, [G_{rf}(s)]_i is the i-th column of the transfer matrix G_{rf}(s), and f_i(s) is the i-th component of f(s) [14]. This is illustrated in Fig. 6.

Fault diagnosis in CPS consists of three main steps:
(1) Fault detection: detect malfunctions in real time, as soon and as surely as possible;
(2) Fault isolation: find the root cause by isolating the system component(s) whose operation mode is not nominal;
(3) Fault identification: estimate the size and type or nature of the fault.

The main causes leading to faults are (1) design errors; (2) implementation errors; and (3) human errors, use, wear, deterioration, damage, etc. The main consequences are (1) degraded performance; (2) energy waste; and (3) waste of raw materials, economic losses, lower quality, lower production, etc. To develop a reliable system means to provide the system with the hardware architecture and software mechanisms which allow it, if possible, to achieve a given objective not only in normal operation but also in given fault situations, i.e., to make it fault-tolerant.
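A minimal numerical sketch of this residual-based supervision idea is given below: the measured CPS output is compared with the digital-twin prediction, and a fault is flagged when the residual exceeds a threshold. The signal model, noise level, fault profile, and threshold are assumptions made for the example and are not taken from the paper.

    import numpy as np

    def residual_fault_detection(y_real, y_twin, threshold):
        """Compare measured CPS outputs with digital-twin predictions.

        A fault is flagged whenever the residual r(t) = y_real(t) - y_twin(t)
        exceeds the chosen threshold in absolute value.
        """
        residual = y_real - y_twin
        fault_flags = np.abs(residual) > threshold
        return residual, fault_flags

    # Illustrative signals: the twin predicts the nominal behaviour, while the
    # real sensor drifts away after t = 6 s (a simulated actuator fault).
    t = np.linspace(0.0, 10.0, 501)
    y_twin = np.sin(t)                                   # nominal (twin) output
    y_real = np.sin(t) + np.where(t > 6.0, 0.4, 0.0)     # measured output with fault
    y_real += np.random.normal(0.0, 0.02, t.size)        # sensor noise

    residual, flags = residual_fault_detection(y_real, y_twin, threshold=0.2)
    if flags.any():
        print(f"fault detected at t = {t[flags.argmax()]:.2f} s")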
4 Conclusion

In such a case, we get a fairly universal design of the CPS, which is not only capable of solving the bulk of the tasks assigned to it, but also fully complies with the rather stringent requirements of operation and the external environment. By using the digital twin concept in the further development of this design, we will be able to control the fault tolerance of the robot and to carry out maintenance based on the actual condition rather than on a schedule. Thus, in accordance with the basic tenets of Industry 4.0, we get the opportunity to develop a CPS for pipeline inspection that will not only be economically feasible and beneficial, but will also help to increase the level of technogenic safety, protecting people and the environment from potential technogenic accidents. Besides, the proposed scheme helps to provide fault-tolerant control and improve the system's reliability.

Acknowledgements This work was financially supported by the Government of the Russian Federation (Grant 08-08) and by the Ministry of Science and Higher Education of the Russian Federation, passport of goszadanie no. 2019-0898.
References 1. Sanfelice R.G.: Analysis and Design of Cyber-Physical Systems. A Hybrid Control Systems Approach. Cyber-Physical Systems: From Theory to Practice. CRC Press (2016) 2. Lee, Edward A.: Cyber-Physical Systems - Are Computing Foundations Adequate? Position Paper for NSF Workshop On Cyber-Physical Systems: Research Motivation, pp. 16–17. Techniques and Roadmap. TX, Austin (October (2006) 3. Solutions that realize next-generation transmission and distribution through IoT technologies. https://www.toshiba-energy.com/en/transmission/product/iot.htm. Accessed 26 Dec 2019 4. Jiewu, Leng, Zhang, Hao: Digital twin-driven manufacturing cyber-physical system for parallel controlling of smart workshop. J. Ambient Intell. Humaniz. Comput. 10(3), 1155–1166 (2019) 5. Auer Michael, E., Kalyan Ram, B.: Cyber-physical systems and digital twins. In: Proceedings of the 16th International Conference on Remote Engineering and Virtual Instrumentation, ISBN: 978-3-030-23161-3, Springer (2020) 6. Fryer, T.: Digital twin - introduction. This is the age of the digital twin. Eng. Technol. 14(1), 28–29 (February 2019) 7. Stark, R., Damerau, T.: Digital Twin. In book: CIRP Encyclopedia of Production Engineering. Springer, Heidelberg (2019) 8. Adjei, P., Montasari, R.A.: Critical overview of digital twins. Int. J. Strateg. Eng. 3(1), 51–61 (2020) 9. Harper, E., Ganz, Ch., Malakuti, S.: Digital twin architecture and standards. IIC J. Innov. (November 2019) 10. Janda, P., Hájíˇcek, Z.: Implementation of the digital twin methodology. In book: Proceedings of the 30th International DAAAM Symposium “Intelligent Manufacturing & Automation”, pp. 533–538 (2019) 11. Yang, W., Tan, Y., Yoshida, K., Takakuwa, S.: Digital twin-driven simulation for a cyberphysical system in Industry 4.0. DAAAM International Scientific Book 2017, pp. 227–234 (October 2017) 12. Hantsch, A.: From Digital Twin to Predictive Maintenance. Conference: AI Monday Leipzig Project, Leipzig (November 2019)
13. Dobrynin, A.: The digital economy - the various ways to the effective use of technology (BIM, PLM, CAD, IOT, Smart City, BIG DATA, and others). Int. J. Open Inf. Technol. 4(1), 4–11 (2016) 14. Chen, J., Patton, R.J.: Robust Model-Based Fault Diagnosis for Dynamic Systems. ISBN 978–1-4615-5149-2, Springer (1999)
Application of Hill Climbing Algorithm in Determining the Characteristic Objects Preferences Based on the Reference Set of Alternatives Jakub Wi˛eckowski, Bartłomiej Kizielewicz, and Joanna Kołodziejczyk
Abstract Random processes are a frequent issue when trying to solve problems in various areas. The randomness factor makes it difficult to clearly define the input parameters of a system in maximizing its effects. The solution to this problem may be the usage of stochastic optimization methods. In the following article, the Hill Climbing method has been used to solve the problem of optimization, which in combination with the COMET method gave satisfactory results by determining the relationship between the preference assessment of already existing alternatives to the newly determined alternatives. The motivation to conduct the study was the desire to systematize knowledge on the effective selection of input parameters for stochastic optimization methods. The proposed solution indicates how to select the grid size in an unknown problem and the step size in the Hill Climbing method.
1 Introduction Many of the problems currently being considered in various fields are related to determining the results of random processes. Due to their randomness, no clear and equally effective solutions can be expected, especially with numerous attempts to solve the same problem. The obtained result is influenced by the methods used, the initial data introduced, and the desired final effect. Therefore, the procedure of selecting parameters for a specific problem is an important part of solving random processes, on which the effectiveness and efficiency of the obtained result depend [26]. For random processes, stochastic optimization methods are used [10, 11]. Their performance is based on the selection of a single initial state, from which an adjacent state is then generated. Because these are stochastic methods, the result of creating successive states at each start of the algorithm gives different results. The advantages J. Wi˛eckowski (B) · B. Kizielewicz · J. Kołodziejczyk Research Team on Intelligent Decision Support Systems, Department of Artificial Intelligence and Applied Mathematics, Faculty of Computer Science and Information Technology, West ˙ Pomeranian University of Technology in Szczecin, Zołnierska 49, 71-210 Szczecin, Poland e-mail: [email protected] © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2020 I. Czarnowski et al. (eds.), Intelligent Decision Technologies, Smart Innovation, Systems and Technologies 193, https://doi.org/10.1007/978-981-15-5925-9_29
of this type of solution are the ability to find solutions in a space that deterministic methods cannot manage and the small requirements concerning the use of hardware memory. One of the methods belonging to the group of stochastic methods, the Hill Climbing method [9], which consists in maximizing or minimizing a function, is discussed in the following article. Hill Climbing is a method that offers a wide range of uses, from general problems [7] to specific issues including photovoltaic systems [25] or Bayesian network structures [20]. It can be applied to problems considered in a discrete (standard) space, as well as to those in a continuous space (a variant called the gradient method) [1, 24, 27]. Hill Climbing is a popular method because of the simplicity of its operation and the possibility to easily modify the algorithm to adjust its behavior to specific problems [14].

The Characteristic Objects Method (COMET) is a technique for solving Multi-Criteria Decision-Making (MCDM) problems. It allows a single decision-maker or a group of decision-makers to create a dependency model based on the pairwise comparison of characteristic objects, which creates a ranking of weights between individual criteria [5]. Besides, the method allows the criteria to be divided into categories, which helps to reduce the number of characteristic object pairs compared by the expert. This is an important advantage of the solution, as it speeds up the modeling process and allows the model to be divided into smaller sub-models [17]. Moreover, the COMET is completely free of the rank reversal phenomenon, as it is based on characteristic objects and fuzzy rules [19].

In this study, we used the COMET method to solve a multi-criteria problem under uncertainty. The proposed model was subjected to the Hill Climbing method, which ensured that the planned solution was achieved. The main motivation for conducting the study is to show the effectiveness of the Hill Climbing method based on precisely selected parameters.

The rest of the paper is organized as follows. In Sect. 2, we introduce the basics of the fuzzy logic theory. Section 3 presents the COMET method, which is based on Triangular Fuzzy Numbers (TFN). Section 4 contains a description and presentation of the Hill Climbing algorithm. Section 5 presents the use of the Hill Climbing method for two selected criteria under the assessment of the COMET method. Finally, the results are summarized and conclusions are drawn from the conducted research in Sect. 6.
2 Fuzzy Logic—Preliminaries The growing importance of the Fuzzy Set Theory [28], and their extensions, in model creation in numerous scientific fields has proven to be an effective way to approach and solve multi-criteria decision problems [2, 3, 6, 15, 21, 23]. The necessary concepts of the Fuzzy Set Theory are described as follows [4, 13]: The fuzzy set and the membership function—the characteristic function μ A of a crisp set A ⊆ X assigns a value of either 0 or 1 to each member of X , as
well as the crisp sets only allow a full membership (μ_A(x) = 1) or no membership at all (μ_A(x) = 0). This function can be generalized to a function μ_Ã so that the value assigned to the element of the universal set X falls within a specified range, i.e., μ_Ã : X → [0, 1]. The assigned value indicates the degree of membership of the element in the set A. The function μ_Ã is called a membership function, and the set Ã = {(x, μ_Ã(x))}, where x ∈ X, defined by μ_Ã(x) for each x ∈ X, is called a fuzzy set [29].

The Triangular Fuzzy Number (TFN)—a fuzzy set Ã, defined on the universal set of real numbers ℝ, is said to be a triangular fuzzy number Ã(a, m, b) if its membership function has the following form (1):

\mu_{\tilde{A}}(x, a, m, b) = \begin{cases} 0, & x \le a \\ \frac{x-a}{m-a}, & a \le x \le m \\ 1, & x = m \\ \frac{b-x}{b-m}, & m \le x \le b \\ 0, & x \ge b \end{cases}    (1)

and the following characteristics (2), (3):

x_1, x_2 \in [a, m] \wedge x_2 > x_1 \Rightarrow \mu_{\tilde{A}}(x_2) > \mu_{\tilde{A}}(x_1)    (2)

x_1, x_2 \in [m, b] \wedge x_2 > x_1 \Rightarrow \mu_{\tilde{A}}(x_2) < \mu_{\tilde{A}}(x_1)    (3)

The Support of a TFN—the support of a TFN Ã is defined as a crisp subset of the Ã set in which all elements have a non-zero membership value in the Ã set (4):

S(\tilde{A}) = \{x : \mu_{\tilde{A}}(x) > 0\} = [a, b]    (4)

The Core of a TFN—the core of a TFN Ã is a singleton (one-element fuzzy set) with the membership value equal to 1 (5):

C(\tilde{A}) = \{x : \mu_{\tilde{A}}(x) = 1\} = m    (5)
The Fuzzy Rule—the single fuzzy rule can be based on the Modus Ponens tautology [12]. The reasoning process uses the IF–THEN, OR, and AND logical connectives.
The Rule Base—the rule base consists of logical rules determining the causal relationships existing in the system between the input and output fuzzy sets [12].
The T-norm Operator (product)—the T-norm operator is a function T modeling the AND intersection operation of two or more fuzzy numbers, e.g., Ã and B̃. In this paper, only the ordinary product of real numbers is used as the T-norm operator [8] (6):

\mu_{\tilde{A}}(x) \; \mathrm{AND} \; \mu_{\tilde{B}}(y) = \mu_{\tilde{A}}(x) \cdot \mu_{\tilde{B}}(y)    (6)
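Definitions (1)–(6) translate directly into code; the minimal sketch below implements the triangular membership function and the product T-norm (the example values are arbitrary and chosen only for illustration).

    def tfn_membership(x, a, m, b):
        """Membership value of x in the triangular fuzzy number TFN(a, m, b), Eq. (1)."""
        if x <= a or x >= b:
            return 0.0
        if x <= m:
            return (x - a) / (m - a)
        return (b - x) / (b - m)

    def t_norm_product(mu_a, mu_b):
        """Product T-norm used as the AND operator, Eq. (6)."""
        return mu_a * mu_b

    # Example: degree to which the point (0.3, 0.8) matches "C1 ~ 0.5 AND C2 ~ 0.5"
    mu_c1 = tfn_membership(0.3, 0.0, 0.5, 1.0)   # 0.6
    mu_c2 = tfn_membership(0.8, 0.0, 0.5, 1.0)   # 0.4
    print(t_norm_product(mu_c1, mu_c2))          # 0.24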
3 The Characteristic Objects Method

The COMET method is the first method which is completely free of the Rank Reversal phenomenon [19]. In previous works, the accuracy of the COMET method was verified [18]. The formal notation of the COMET method should be shortly recalled [16, 18, 22].

Step 1. Define the Space of the Problem—the expert determines the dimensionality of the problem by selecting the number r of criteria, C_1, C_2, ..., C_r. Then, the set of fuzzy numbers for each criterion C_i is selected (7):

C_r = \{\tilde{C}_{r1}, \tilde{C}_{r2}, ..., \tilde{C}_{rc_r}\}    (7)

where c_1, c_2, ..., c_r are the numbers of fuzzy numbers for all criteria.

Step 2. Generate Characteristic Objects—the characteristic objects (CO) are obtained by using the Cartesian product of the fuzzy number cores for each criterion as follows (8):

CO = C(C_1) \times C(C_2) \times \cdots \times C(C_r)    (8)

Step 3. Rank the Characteristic Objects—the expert determines the Matrix of Expert Judgment (MEJ). It is a result of pairwise comparison of the COs by the problem expert. The MEJ matrix contains the results of comparing characteristic objects by the expert, where α_{ij} is the result of comparing CO_i and CO_j by the expert. The function f_{exp} denotes the mental function of the expert. It depends solely on the knowledge of the expert and can be presented as (9). Afterwards, the vertical vector of the Summed Judgments (SJ) is obtained as follows (10).

\alpha_{ij} = \begin{cases} 0.0, & f_{exp}(CO_i) < f_{exp}(CO_j) \\ 0.5, & f_{exp}(CO_i) = f_{exp}(CO_j) \\ 1.0, & f_{exp}(CO_i) > f_{exp}(CO_j) \end{cases}    (9)

SJ_i = \sum_{j=1}^{t} \alpha_{ij}    (10)

Finally, values of preference are approximated for each characteristic object. As a result, the vertical vector P is obtained, where the i-th row contains the approximate value of preference for CO_i.

Step 4. The Rule Base—each characteristic object and its value of preference is converted to a fuzzy rule as follows (11):

IF C(\tilde{C}_{1i}) AND C(\tilde{C}_{2i}) AND ... THEN P_i    (11)

In this way, the complete fuzzy rule base is obtained.

Step 5. Inference and Final Ranking—each alternative is presented as a set of crisp numbers (e.g., A_i = \{a_{1i}, a_{2i}, ..., a_{ri}\}). This set corresponds to the criteria C_1, C_2, ..., C_r. Mamdani's fuzzy inference method is used to compute the preference of the i-th alternative. The rule base guarantees that the obtained results are unequivocal. The bijection makes the COMET completely rank reversal free.
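To make the data flow of Steps 2–4 concrete, the short sketch below builds the MEJ matrix, the SJ vector, and a preference vector P for a 3 × 3 grid of characteristic objects. The expert function f_exp used here is only a stand-in for real expert judgments, and the simple min–max rescaling of SJ is a simplification of the preference-approximation step, so this should be read as an illustration rather than a reference implementation.

    import itertools
    import numpy as np

    # Characteristic objects: Cartesian product of the fuzzy-number cores (Step 2)
    cores_c1 = [0.0, 0.5, 1.0]
    cores_c2 = [0.0, 0.5, 1.0]
    cos = list(itertools.product(cores_c1, cores_c2))     # 9 characteristic objects

    # Stand-in for the expert's mental function f_exp (an assumption for this sketch)
    def f_exp(co):
        c1, c2 = co
        return 0.6 * c1 + 0.4 * c2

    # Step 3: Matrix of Expert Judgment (MEJ) from pairwise comparisons, Eq. (9)
    t = len(cos)
    mej = np.empty((t, t))
    for i, j in itertools.product(range(t), repeat=2):
        if f_exp(cos[i]) < f_exp(cos[j]):
            mej[i, j] = 0.0
        elif f_exp(cos[i]) == f_exp(cos[j]):
            mej[i, j] = 0.5
        else:
            mej[i, j] = 1.0

    # Summed Judgments, Eq. (10); min-max rescaling stands in for the P approximation
    sj = mej.sum(axis=1)
    p = (sj - sj.min()) / (sj.max() - sj.min())

    # Step 4: each (CO_i, P_i) pair becomes one fuzzy rule
    for co, pref in zip(cos, p):
        print(f"IF C1 ~ {co[0]:.1f} AND C2 ~ {co[1]:.1f} THEN {pref:.4f}")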
4 Hill Climbing Method

The Hill Climbing method belongs to the group of stochastic optimization methods. The algorithm is started by selecting an initial state. Afterwards, the algorithm enters a loop in which subsequent descendant states are generated as modifications of the previous states. The function values for the descendant and parent states are compared, and when the descendant obtains a higher value than the parent, it replaces the parent. The process continues until no more favorable function value can be found. The initial state for this method can be selected randomly or predetermined by the user. The advantages of this method are its small memory requirements and the ease of adjusting its behavior to specific test cases, while its disadvantages are getting stuck in local optima and problems with moving through an area where the function is constant or only slightly changing. The pseudocode for the subsequent algorithm steps is shown in Fig. 1.
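A compact continuous-space version of this loop can be written as follows; the neighbourhood step, iteration budget, bounds, and toy objective are illustrative assumptions rather than the settings used in this study.

    import random

    def hill_climbing(objective, x0, step=0.05, iterations=5000, bounds=(0.0, 1.0)):
        """Maximize `objective` by repeatedly accepting better random neighbours."""
        best_x = list(x0)
        best_val = objective(best_x)
        low, high = bounds
        for _ in range(iterations):
            # Generate a neighbour by perturbing every coordinate within the bounds
            candidate = [min(high, max(low, xi + random.uniform(-step, step)))
                         for xi in best_x]
            candidate_val = objective(candidate)
            if candidate_val > best_val:          # keep only improving moves
                best_x, best_val = candidate, candidate_val
        return best_x, best_val

    # Toy objective with a single optimum at (0.3, 0.7)
    objective = lambda x: -((x[0] - 0.3) ** 2 + (x[1] - 0.7) ** 2)
    start = [random.random(), random.random()]
    solution, value = hill_climbing(objective, start)
    print(solution, value)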
5 Study Case This paper presents an evaluation of the model concerning the assessment of a set of alternatives based on probability. Taking into account the preference values of characteristic objects, values that provide the smallest sum of errors calculated as the sum of the absolute differences between the determined input preference and one calculated using the COMET method are searched for. The first stage is to determine the space of the problem under consideration. In this process, this space was determined on the basis of two criteria C1 and C2 .
Fig. 1 Diagram of an example of a confidential decision-making function
The created decision-making function is shown in Fig. 2. The next step is to select the points of the characteristic objects from the existing problem space. They form a known vector of preference values of alternatives, which will then be compared with the values of random alternatives. Figure 2 also shows the specified starting alternatives of characteristic objects. Afterwards, preferences are searched for the 9 characteristic objects. The identified rule base is presented as (12). The Hill Climbing method is used for this purpose, and Fig. 3 shows the obtained target function for the given problem. The last step is to select the test points of characteristic objects, which will then be
Fig. 2 Example of a determined decision-making function and randomly selected training set of alternatives
Fig. 3 Diagram of matching the goal function
subjected to the COMET method in order to obtain the preference value of these objects. Figure 4 shows the selected set of test points.

R1: IF C1 ~ 0.0 AND C2 ~ 0.0 THEN 0.0234
R2: IF C1 ~ 0.0 AND C2 ~ 0.5 THEN 0.2836
R3: IF C1 ~ 0.0 AND C2 ~ 1.0 THEN 0.7603
R4: IF C1 ~ 0.5 AND C2 ~ 0.0 THEN 0.2472
R5: IF C1 ~ 0.5 AND C2 ~ 0.5 THEN 0.7673
R6: IF C1 ~ 0.5 AND C2 ~ 1.0 THEN 1.0000
R7: IF C1 ~ 1.0 AND C2 ~ 0.0 THEN 0.3635
R8: IF C1 ~ 1.0 AND C2 ~ 0.5 THEN 0.4865
R9: IF C1 ~ 1.0 AND C2 ~ 1.0 THEN 0.8688
    (12)
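For a two-criteria model whose characteristic objects lie on the grid {0.0, 0.5, 1.0}, Mamdani inference with triangular fuzzy numbers and the product T-norm amounts to weighting the nine rule preferences by the membership degrees of the evaluated alternative. The sketch below applies this to rule base (12); evaluating it at the first training alternative reproduces a preference close to the value reported in Table 1.

    GRID = [0.0, 0.5, 1.0]
    # Preference values of the nine characteristic objects, taken from rule base (12)
    P = {(0.0, 0.0): 0.0234, (0.0, 0.5): 0.2836, (0.0, 1.0): 0.7603,
         (0.5, 0.0): 0.2472, (0.5, 0.5): 0.7673, (0.5, 1.0): 1.0000,
         (1.0, 0.0): 0.3635, (1.0, 0.5): 0.4865, (1.0, 1.0): 0.8688}

    def membership(x, core):
        """Triangular membership of x in the fuzzy number centred at `core` on GRID."""
        width = 0.5                       # distance between neighbouring cores
        return max(0.0, 1.0 - abs(x - core) / width)

    def comet_preference(c1, c2):
        """Aggregate rule preferences weighted by their product-T-norm activations."""
        num = sum(membership(c1, g1) * membership(c2, g2) * P[(g1, g2)]
                  for g1 in GRID for g2 in GRID)
        den = sum(membership(c1, g1) * membership(c2, g2)
                  for g1 in GRID for g2 in GRID)
        return num / den

    print(round(comet_preference(0.5593, 0.6629), 4))   # about 0.8156, cf. A1 in Table 1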
The final preferences were calculated and compared with the ones obtained at the beginning using Spearman's correlation coefficient, which for the presented problem equals 0.9865. Moreover, the goal function of the Hill Climbing method presented in Fig. 3 shows that the optimal solution was obtained before 1000 iterations were exceeded, while in subsequent iterations the best result found so far was not improved. Table 1 presents the set of training alternatives and Table 2 presents the set of test alternatives. Both of them contain the assessment value from the COMET
Table 1 Training set of alternatives and their preferences (P obtained; Pref reference)

Ai    C1       C2       Pref     P        Diff
1     0.5593   0.6629   0.8156   0.8156    0.0000
2     0.9435   0.9945   0.8230   0.8796   −0.0566
3     0.1891   0.9448   0.8085   0.8085    0.0000
4     0.7149   0.4792   0.6264   0.6321   −0.0057
5     0.0713   0.5383   0.3862   0.3864   −0.0002
6     0.0923   0.9259   0.6781   0.7406   −0.0625
7     0.3972   0.8256   0.9072   0.8521    0.0551
8     0.1001   0.3374   0.2789   0.2789   −0.0000
9     0.8026   0.6396   0.7125   0.6876    0.0248
10    0.7099   0.2500   0.4748   0.4728    0.0021
11    0.9171   0.8733   0.8000   0.8000    0.0000
12    0.4770   0.6678   0.8180   0.8269   −0.0089
13    0.5966   0.9648   0.9825   0.9563    0.0262
14    0.8328   0.6242   0.6831   0.6630    0.0201
15    0.8241   0.9513   0.8872   0.8829    0.0043
16    0.8599   0.4698   0.5429   0.5510   −0.0081
17    0.8500   0.0166   0.3367   0.3367    0.0000
18    0.2921   0.9449   0.9029   0.8635    0.0394
19    0.3021   0.4757   0.5934   0.5556    0.0378
20    0.6478   0.3102   0.5307   0.5314   −0.0007
Fig. 4 Example of a confidential decision-making function and random selected testing set of alternatives
Table 2 Testing set of alternatives and their preferences (P obtained; Pref reference)

Ai    C1       C2       Pref     P        Diff
1     0.3990   0.1938   0.4680   0.3832    0.0848
2     0.2211   0.7898   0.7666   0.7112    0.0554
3     0.3563   0.7748   0.8576   0.7948    0.0629
4     0.3290   0.8592   0.8922   0.8290    0.0631
5     0.6206   0.9017   0.9562   0.9155    0.0407
6     0.4058   0.5322   0.6922   0.6941   −0.0019
7     0.2593   0.0290   0.3441   0.1624    0.1817
8     0.2387   0.6255   0.6627   0.6050    0.0577
9     0.8735   0.8819   0.8316   0.8207    0.0109
10    0.3668   0.7886   0.8717   0.8102    0.0615
11    0.8944   0.5663   0.5988   0.5923    0.0065
12    0.5767   0.8376   0.9333   0.8968    0.0364
13    0.4626   0.3585   0.5758   0.5894   −0.0136
14    0.3308   0.2437   0.4578   0.3821    0.0757
15    0.5435   0.9650   0.9889   0.9714    0.0175
16    0.2634   0.6031   0.6663   0.6103    0.0561
17    0.7496   0.1381   0.4115   0.3942    0.0174
18    0.7223   0.3839   0.5518   0.5627   −0.0109
19    0.3443   0.6665   0.7725   0.7194    0.0531
20    0.9972   0.7311   0.6605   0.6644   −0.0039
method, the starting preference value, the preference value obtained from the system, and the difference between these values. Figure 4 also presents the identified space for the determined problem.
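The reported Spearman's correlation coefficient can be checked with a few lines of code. The rank-based formula below assumes no tied values; the two short vectors are only the first five Pref and P entries of Table 2, used here as an example rather than the full data.

    import numpy as np

    def spearman_rho(x, y):
        """Spearman's rank correlation coefficient (no ties assumed)."""
        rank = lambda v: np.argsort(np.argsort(v))
        rx, ry = rank(np.asarray(x)), rank(np.asarray(y))
        n = len(rx)
        d = rx - ry
        return 1.0 - 6.0 * np.sum(d ** 2) / (n * (n ** 2 - 1))

    pref_reference = [0.4680, 0.7666, 0.8576, 0.8922, 0.9562]   # Pref column, rows 1-5
    pref_obtained  = [0.3832, 0.7112, 0.7948, 0.8290, 0.9155]   # P column, rows 1-5
    print(spearman_rho(pref_reference, pref_obtained))   # 1.0 for this small fragment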
6 Conclusions

With a set of existing alternatives and an assessment of their preferences, it can be assumed a priori that they are designated correctly. The problem is the appearance of new alternatives and the need to determine their preference values in an effective and equally correct way. The solution to this problem is to use stochastic optimization methods. The Hill Climbing method combined with the COMET method allowed the preference values of the characteristic objects to be found in such a way that the sum of the absolute differences between these solutions and the initial preferences was as small as possible. The considered problem space was determined using two criteria, C1 and C2. Once the preference values of the selected alternatives had been calculated,
it had to be verified whether they had been determined correctly, which in the case of these studies was true, as confirmed by the Spearman's correlation coefficient of 0.9865. As future directions, it is worth investigating how to choose the grid in an unknown problem, as this is an important part of achieving an optimal solution. In addition, it is worth investigating how to indicate the right step size for the Hill Climbing method in order to maximize its results.

Acknowledgements The work was supported by the National Science Centre, Decision No. DEC2016/23/N/HS4/01931.
References 1. Alajmi, B.N., Ahmed, K.H., Finney, S.J., Williams, B.W.: Fuzzy-logic-control approach of a modified hill-climbing method for maximum power point in microgrid standalone photovoltaic system. IEEE Trans. Power Electron. 26(4), 1022–1030 (2010) 2. Bashir, Z., W¸atróbski, J., Rashid, T., Sałabun, W., Ali, J.: Intuitionistic-fuzzy goals in zero-sum multi criteria matrix games. Symmetry 9(8), 158 (2017) 3. Boender, C.G.E., De Graan, J.G., Lootsma, F.A.: Multi-criteria decision analysis with fuzzy pairwise comparisons. Fuzzy Sets Syst. 29(2), 133–143 (1989) 4. Deschrijver, G., Kerre, E.E.: On the relationship between some extensions of fuzzy set theory. Fuzzy Sets Syst. 133(2), 227–235 (2003) 5. Faizi, S., Sałabun, W., Rashid, T., W¸atróbski, J., Zafar, S.: Group decision-making for hesitant fuzzy sets based on characteristic objects method. Symmetry 9(8), 136 (2017) 6. Guitouni, A., Martel, J.M.: Tentative guidelines to help choosing an appropriate MCDA method. Eur. J. Oper. Res. 109(2), 501–521 (1998) 7. Goldfeld, S.M., Quandt, R.E., Trotter, H.F.: Maximization by quadratic hill-climbing. Econometrica: J. Econom. Soc. 541–551 (1966) 8. Gupta, M.M., Qi, J.: Theory of T-norms and fuzzy inference methods. Fuzzy Sets Syst. 40(3), 431–450 (1991) 9. Lim, A., Rodrigues, B., Zhang, X.: A simulated annealing and hill-climbing algorithm for the traveling tournament problem. Eur. J. Oper. Res. 174(3), 1459–1478 (2006) 10. Łokietek, T., Jaszczak, S., Niko´nczuk, P.: Optimization of control system for modified configuration of a refrigeration unit. Procedia Comput. Sci. 159, 2522–2532 (2019) 11. Niko´nczuk, P.: Preliminary modeling of overspray particles sedimentation at heat recovery unit in spray booth. Eksploatacja i Niezawodno´sc´ 20, 387–393 (2018) 12. Piegat, A.: Fuzzy modeling and control (Studies in Fuzziness and Soft Computing). Physica 742, (2001) 13. Piegat, A., Sałabun, W.: Nonlinearity of human multi-criteria in decision-making. J. Theor. Appl. Comput. Sci. 6(3), 36–49 (2012) 14. Prügel-Bennett, A.: When a genetic algorithm outperforms hill-climbing. Theor. Comput. Sci. 320(1), 135–153 (2004) 15. Roubens, M.: Fuzzy sets and decision analysis. Fuzzy Sets Syst. 90(2), 199–206 (1997) 16. Sałabun, W.: The Characteristic Objects Method: A New Distance-based Approach to Multicriteria Decision-making Problems. J. Multi-Criteria Decis. Anal. 22(1–2), 37–50 (2015) 17. Sałabun, W., Palczewski, K., W¸atróbski, J.: Multicriteria approach to sustainable transport evaluation under incomplete knowledge: electric bikes case study. Sustainability 11(12), 3314 (2019) 18. Sałabun, W., Piegat, A.: Comparative analysis of MCDM methods for the assessment of mortality in patients with acute coronary syndrome. Artif. Intell. Rev. 48(4), 557–571 (2017)
19. Sałabun, W., Ziemba, P., W¸atróbski, J.: The rank reversals paradox in management decisions: the comparison of the AHP and comet methods. In: International Conference on Intelligent Decision Technologies, pp. 181-191. Springer, Cham (2016) 20. Tsamardinos, I., Brown, L.E., Aliferis, C.F.: The max-min hill-climbing Bayesian network structure learning algorithm. Mach. Learn. 65(1), 31–78 (2006) 21. W¸atróbski, J., Sałabun, W.: Green supplier selection framework based on multi-criteria decision-analysis approach. In: International Conference on Sustainable Design and Manufacturing, pp. 361–371. Springer, Cham (2016) 22. W¸atróbski, J., Sałabun, W.: The characteristic objects method: a new intelligent decision support tool for sustainable manufacturing. In: International Conference on Sustainable Design and Manufacturing, pp. 349–359. Springer, Cham (2016) 23. W¸atróbski, J., Sałabun, W., Karczmarczyk, A., Wolski, W.: Sustainable decision-making using the COMET method: An empirical study of the ammonium nitrate transport management. In: 2017 Federated Conference on Computer Science and Information Systems (FedCSIS), pp. 949–958. IEEE (2017) 24. Xi, B., Liu, Z., Raghavachari, M., Xia, C. H., Zhang, L.: A smart hill-climbing algorithm for application server configuration. In: Proceedings of the 13th International Conference on World Wide Web, pp. 287–296 (2004) 25. Xiao, W., Dunford, W.G.: A modified adaptive hill climbing MPPT method for photovoltaic power systems. In 2004 IEEE 35th Annual Power Electronics Specialists Conference (IEEE Cat. No. 04CH37551), vol. 3, pp. 1957-1963. IEEE (2004) 26. Yao, K.: Spherically invariant random processes: theory and applications. Communications. Information and Network Security, pp. 315–331. Springer, Boston, MA (2003) 27. Yildiz, A.R.: An effective hybrid immune-hill climbing optimization approach for solving design and manufacturing optimization problems in industry. J. Mater. Process. Technol. 209(6), 2773–2780 (2009) 28. Zadeh, L.A.: Fuzzy sets. Inf. Control 8(3), 338–353 (1965) 29. Zimmermann, H.J.: Fuzzy Set Theory and Its Applications. Springer Science & Business Media (2011)
The Search of the Optimal Preference Values of the Characteristic Objects by Using Particle Swarm Optimization in the Uncertain Environment Jakub Wi˛eckowski, Bartłomiej Kizielewicz, and Joanna Kołodziejczyk
Abstract The issue of random processes is becoming increasingly frequent in many scientific problems. The randomness factor makes it difficult to define how to select the input parameters of the system to obtain satisfying results. One of the solutions to this problem may be to use stochastic optimization methods. The following paper uses the Particle Swarm Optimization Algorithm to optimize the obtained sets of alternatives with different numbers of particles used, and the COMET method to assess the alternatives. The combination of these two methods gave satisfactory results, determined by the Spearman's correlation coefficient value between the input set of alternatives and the newly defined set of alternatives for both cases with different numbers of particles. The reason for conducting the research was to systematize knowledge on the effective selection of input parameters in stochastic optimization methods. The proposed solution indicates how to select the optimal number of particles in the Particle Swarm Optimization method and how to determine the grid size in an unknown problem.
1 Introduction

Random processes are a topic that frequently appears in today's problems in many fields. The results obtained by using different random methods often depend on the effectiveness of the random number generator or on the input parameters entered. The problem with using such methods is to select these parameters so as to maximize the effectiveness of the algorithms and avoid situations where the correct solutions are not found.
J. Wi˛eckowski (B) · B. Kizielewicz · J. Kołodziejczyk Research Team on Intelligent Decision Support Systems, Department of Artificial Intelligence and Applied Mathematics, Faculty of Computer Science and Information Technology, West ˙ Pomeranian University of Technology in Szczecin, ul. Zołnierska 49, 71-210 Szczecin, Poland e-mail: [email protected] © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2020 I. Czarnowski et al. (eds.), Intelligent Decision Technologies, Smart Innovation, Systems and Technologies 193, https://doi.org/10.1007/978-981-15-5925-9_30
Stochastic optimization methods are used to solve problems related to random processes. It is characteristic of this group of methods that different results are obtained each time the algorithm is run. Subsequent results of the algorithm steps are considered only in the given problem space. They aim to find the global optimum, which can be the maximum or the minimum of the specified global function [8, 9]. The following article presents one of the methods that belong to the group of stochastic methods, namely the Particle Swarm Optimization Algorithm.

The Particle Swarm Optimization Algorithm is a method based on the analogy of the behavior of animals living in groups. The basis for information exchange is communication between individual animals, each of them having its own parameters, which are its position and speed of movement. The Particle Swarm Optimization Algorithm belongs to the group of stochastic optimization methods, so it is based on probability characteristics. Moreover, its performance depends on the defined input parameters, especially the number of particles. The algorithm has a wide range of applications and is used for electrical power dispatch [1], global optimization in engineering [16], and numerical optimization [22].

The Characteristic Objects Method (COMET) belongs to a group of methods that solve Multi-Criteria Decision-Making (MCDM) problems by identifying the complete decisional model [6, 17, 18]. The fuzzy set theory and its extensions are very popular in this area [2, 3, 7, 12]. The performance of this method is based on expert knowledge, which is used to compare the generated Characteristic Objects (COs). This process determines specific preference values for selected alternatives, by means of which preference values can be calculated for the entered alternatives. COMET allows the model to be divided into sub-models, which reduces the number of necessary pairwise comparisons of characteristic objects and thus speeds up the model creation process [14].

In this study, we used the Particle Swarm Optimization Algorithm combined with the COMET method to solve the given problem under uncertainty. COMET was used to calculate the preference values of the selected alternatives, while the Particle Swarm Optimization Algorithm provided a global optimum result. The research was done with one set of alternatives, but two different numbers of particles were used to check their impact on the received results. The aim of the research is to systematize knowledge on the influence of the input parameters on the effectiveness of the obtained results.

The rest of the paper is organized as follows. In Sect. 2, we present the COMET method and its subsequent steps. Section 3 includes a description and presentation of the Particle Swarm Optimization Algorithm. In Sect. 4, the space of the problem is determined on two selected criteria and solved using both mentioned methods. In Sect. 5, we summarize the results and draw conclusions from the performed research.
2 The Characteristic Objects Method

The COMET method proved to be an appropriate technique to solve multi-criteria decision-making problems. Moreover, it is entirely free of the Rank Reversal phenomenon, which is rare in the MCDA area. This approach is based on the fuzzy set theory, which is an excellent tool for modeling linguistic data [10, 23, 24]. Many problems were solved successfully by using the COMET method, especially sustainability problems [19–21], and the COMET was extended to the uncertain environment [4, 5, 15]. The formal notation of the COMET is recalled according to [11, 13, 14].

Step 1. Define the Space of the Problem—the expert determines the dimensionality of the problem by selecting the number r of criteria, C_1, C_2, ..., C_r. Then, the set of fuzzy numbers for each criterion C_i is selected (1):

C_r = \{\tilde{C}_{r1}, \tilde{C}_{r2}, ..., \tilde{C}_{rc_r}\}    (1)

where c_1, c_2, ..., c_r are the numbers of fuzzy numbers for all criteria.

Step 2. Generate Characteristic Objects—the characteristic objects (CO) are obtained by using the Cartesian product of the fuzzy number cores for each criterion as follows (2):

CO = C(C_1) \times C(C_2) \times \cdots \times C(C_r)    (2)

Step 3. Rank the Characteristic Objects—the expert determines the Matrix of Expert Judgment (MEJ). It is a result of pairwise comparison of the COs by the problem expert. The MEJ matrix contains the results of comparing characteristic objects by the expert, where α_{ij} is the result of comparing CO_i and CO_j by the expert. The function f_{exp} denotes the mental function of the expert; it can be presented as (3). Next, the vertical vector of the Summed Judgments (SJ) is obtained as follows (4).

\alpha_{ij} = \begin{cases} 0.0, & f_{exp}(CO_i) < f_{exp}(CO_j) \\ 0.5, & f_{exp}(CO_i) = f_{exp}(CO_j) \\ 1.0, & f_{exp}(CO_i) > f_{exp}(CO_j) \end{cases}    (3)

SJ_i = \sum_{j=1}^{t} \alpha_{ij}    (4)

Finally, values of preference are approximated for each characteristic object. As a result, the vertical vector P is obtained, where the i-th row contains the approximate value of preference for CO_i.

Step 4. The Rule Base—each characteristic object and its value of preference is converted to a fuzzy rule as follows (5):

IF C(\tilde{C}_{1i}) AND C(\tilde{C}_{2i}) AND ... THEN P_i    (5)
In this way, the complete fuzzy rule base is obtained.

Step 5. Inference and Final Ranking—each alternative is presented as a set of crisp numbers (e.g., A_i = \{a_{1i}, a_{2i}, ..., a_{ri}\}). This set corresponds to the criteria C_1, C_2, ..., C_r. Mamdani's fuzzy inference method is used to compute the preference of the i-th alternative.
3 Particle Swarm Optimization Algorithm

The Particle Swarm Optimization Algorithm is one of the methods for solving optimization problems with stochastic techniques. Each animal in a swarm is represented as a single state with specific position and velocity parameters. Additionally, each state is affected by the position of the whole group, which in turn is influenced by the results achieved by the individual states. Each iteration of the algorithm is a change in the position of the particle and a change in the velocity vector. The main advantage of this method is the ability to solve problems with non-linear characteristics, which significantly enlarges its area of application. Moreover, this method has a short computational time and is capable of searching a very large space of solutions. On the other hand, the initial parameters are difficult to determine, and with complex issues, problems with being trapped in local minima can occur. The pseudocode of the discussed algorithm is shown in Fig. 1.
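A minimal global-best PSO loop consistent with this description is sketched below; the inertia weight, acceleration coefficients, swarm size, iteration budget, and toy objective are assumptions chosen for illustration, not the settings used in the study.

    import numpy as np

    def pso_minimize(objective, dim, n_particles=30, iterations=1000,
                     w=0.7, c1=1.5, c2=1.5, bounds=(0.0, 1.0)):
        """Global-best Particle Swarm Optimization over a box-constrained space."""
        low, high = bounds
        rng = np.random.default_rng()
        pos = rng.uniform(low, high, (n_particles, dim))      # particle positions
        vel = np.zeros((n_particles, dim))                    # particle velocities
        pbest = pos.copy()
        pbest_val = np.array([objective(p) for p in pos])
        gbest = pbest[pbest_val.argmin()].copy()
        gbest_val = pbest_val.min()

        for _ in range(iterations):
            r1, r2 = rng.random((2, n_particles, dim))
            vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
            pos = np.clip(pos + vel, low, high)
            vals = np.array([objective(p) for p in pos])
            improved = vals < pbest_val
            pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
            if pbest_val.min() < gbest_val:
                gbest_val = pbest_val.min()
                gbest = pbest[pbest_val.argmin()].copy()
        return gbest, gbest_val

    # Toy objective: distance to a known optimum in a 9-dimensional preference space
    target = np.linspace(0.0, 1.0, 9)
    best, best_val = pso_minimize(lambda p: np.abs(p - target).sum(), dim=9)
    print(best.round(3), round(best_val, 4))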
4 Study Case

This paper presents a model designed to evaluate a set of alternatives based on probability, with different numbers of particles used to compare how this affects the results. Considering the preference values of the characteristic objects, results are searched for that provide the smallest sum of errors, calculated as the sum of the absolute differences between the starting preference values and those received from the COMET method. The first step is to determine the space of the problem under consideration based on two criteria, C1 and C2. The defined decision-making function is presented in Fig. 2. The second step involves selecting the points of the characteristic objects from the existing problem space. They form a known vector of preference values of alternatives, which will be compared with the preference values of randomly chosen alternatives. Figure 2 also presents the specified starting alternatives of characteristic objects. Next, using the Particle Swarm Optimization Algorithm, preference values for the 9 characteristic objects were searched for. The first attempt was conducted with the number of particles equal to 20, while the second attempt used 30 particles. The rule base identified in the second, more accurate attempt is presented as (6). Figure 3 presents the behavior of the target function for both tests for the given problem.
Fig. 1 Pseudocode of the PSO algorithm
Fig. 2 Example of a determined decision-making function and randomly selected training set of alternatives
Fig. 3 Diagram of matching the goal function for both cases with different numbers of particles
The last step is to select the test points of characteristic objects, which will be assessed using the COMET method. Figure 4 shows the selected set of testing alternatives.

R1: IF C1 ~ 0.0 AND C2 ~ 0.0 THEN 0.9852
R2: IF C1 ~ 0.0 AND C2 ~ 0.5 THEN 0.9852
R3: IF C1 ~ 0.0 AND C2 ~ 1.0 THEN 0.4111
R4: IF C1 ~ 0.5 AND C2 ~ 0.0 THEN 0.9852
R5: IF C1 ~ 0.5 AND C2 ~ 0.5 THEN 0.7765
R6: IF C1 ~ 0.5 AND C2 ~ 1.0 THEN 0.2902
R7: IF C1 ~ 1.0 AND C2 ~ 0.0 THEN 0.7137
R8: IF C1 ~ 1.0 AND C2 ~ 0.5 THEN 0.5066
R9: IF C1 ~ 1.0 AND C2 ~ 1.0 THEN 0.1820
    (6)
The final preferences were calculated and then compared with the values obtained in the initial stage of the system performance. We used the Spearman's correlation coefficient to compare those sets, and the results equal 0.7098 and 0.9865 for the cases with 20 and 30 particles, respectively.
Table 1 Training set and preferences (P obtained; Pref reference) using 20 particles

Ai    C1       C2       Pref     P        Diff
1     0.4387   0.7513   0.5657   0.5205    0.0452
2     0.3816   0.2551   0.9067   0.8484    0.0583
3     0.7655   0.5060   0.6285   0.8362   −0.2077
4     0.7952   0.6991   0.4610   0.5390   −0.0780
5     0.1869   0.8909   0.4719   0.4659    0.0059
6     0.4898   0.9593   0.3179   0.2351    0.0828
7     0.4456   0.5472   0.7359   0.7725   −0.0366
8     0.6463   0.1386   0.8413   0.8473   −0.0060
9     0.7094   0.1493   0.8094   0.8034    0.0060
10    0.7547   0.2575   0.7576   0.7945   −0.0369
11    0.2760   0.8407   0.5139   0.4742    0.0397
12    0.6797   0.2543   0.7963   0.8296   −0.0333
13    0.6551   0.8143   0.4188   0.3897    0.0291
14    0.1626   0.2435   0.9522   0.7216    0.2306
15    0.1190   0.9293   0.4337   0.4684   −0.0347
16    0.4984   0.3500   0.8334   0.8842   −0.0508
17    0.9597   0.1966   0.6525   0.6595   −0.0069
18    0.3404   0.2511   0.9185   0.8253    0.0931
19    0.5853   0.6160   0.6335   0.6789   −0.0454
20    0.2238   0.4733   0.8369   0.8017    0.0351

Fig. 4 Identified decision area (left 20, right 30 particles)
Table 2 Testing set and preferences (P obtained; Pref reference) using 20 particles

Ai    C1       C2       Pref     P        Diff
1     0.8147   0.6557   0.4882   0.6043   −0.1161
2     0.9058   0.0357   0.7120   0.6078    0.1042
3     0.1270   0.8491   0.5257   0.5262   −0.0005
4     0.9134   0.9340   0.1410   0.1420   −0.0010
5     0.6324   0.6787   0.5606   0.5871   −0.0265
6     0.0975   0.7577   0.6235   0.6049    0.0186
7     0.2785   0.7431   0.6139   0.5705    0.0434
8     0.5469   0.3922   0.7953   0.8624   −0.0671
9     0.9575   0.6555   0.3998   0.5925   −0.1927
10    0.9649   0.1712   0.6551   0.6400    0.0151
11    0.1576   0.7060   0.6673   0.6310    0.0363
12    0.9706   0.0318   0.6696   0.5450    0.1246
13    0.9572   0.2769   0.6295   0.7113   −0.0818
14    0.4854   0.0462   0.9161   0.9677   −0.0516
15    0.8003   0.0971   0.7697   0.7244    0.0453
16    0.1419   0.8235   0.5522   0.5406    0.0116
17    0.4218   0.6948   0.6239   0.5932    0.0307
18    0.9157   0.3171   0.6412   0.7510   −0.1098
19    0.7922   0.9502   0.1935   0.1524    0.0410
20    0.9595   0.0344   0.6770   0.5571    0.1199
The difference in the obtained results is significant and indicates that for the given problem, the usage of more particles provides more satisfactory results. Tables 1 and 2 present the values obtained for the first attempt using fewer particles, and Tables 3 and 4 present the values obtained with the greater number of particles. All of the tables contain information about the assessment value from the COMET method, the starting preference value, the preference value obtained from the system, and the difference between these values, for the training set and the testing set, respectively. Figure 4 also presents the identified space for the determined problem for both cases.
Table 3 Training set and preferences (P obtained; Pref reference) using 30 particles

Ai    C1       C2       Pref     P        Diff
1     0.4387   0.7513   0.5657   0.5523    0.0135
2     0.3816   0.2551   0.9067   0.9039    0.0028
3     0.7655   0.5060   0.6285   0.6284    0.0001
4     0.7952   0.6991   0.4610   0.4615   −0.0005
5     0.1869   0.8909   0.4719   0.4840   −0.0121
6     0.4898   0.9593   0.3179   0.3324   −0.0145
7     0.4456   0.5472   0.7359   0.7524   −0.0165
8     0.6463   0.1386   0.8413   0.8480   −0.0067
9     0.7094   0.1493   0.8094   0.8094    0.0000
10    0.7547   0.2575   0.7576   0.7398    0.0177
11    0.2760   0.8407   0.5139   0.5118    0.0021
12    0.6797   0.2543   0.7963   0.7818    0.0145
13    0.6551   0.8143   0.4188   0.4186    0.0002
14    0.1626   0.2435   0.9522   0.9521    0.0001
15    0.1190   0.9293   0.4337   0.4605   −0.0268
17    0.9597   0.1966   0.6525   0.6541   −0.0016
18    0.3404   0.2511   0.9185   0.9138    0.0046
19    0.5853   0.6160   0.6335   0.6240    0.0094
20    0.2238   0.4733   0.8369   0.8968   −0.0599
5 Conclusions

Having a set of alternative preferences, we usually assume that it is correctly assessed. The problem arises when new alternatives appear. They should then be assessed in such a way that they give equally effective and correct assessments of the preference values. The solution to this problem is to use stochastic optimization methods. The Particle Swarm Optimization Algorithm combined with the COMET method can be used to find the preference values of the characteristic objects. The way to do this is to find the minimal sum of the absolute differences between the solutions obtained from the COMET method and those that were defined at the beginning of the system performance. We determined the space of the problem using two criteria, C1 and C2. After receiving the preference values of the selected alternatives, they need to be verified to check whether they were calculated correctly. In this study, we use Spearman's correlation coefficient, which equals 0.7098 for the first attempt using 20 particles and 0.9865 for the second attempt using 30 particles. This difference shows that the greater number of particles positively influenced the accuracy achieved for the determined problem.
Table 4 Testing set and preferences (P obtained; Pref reference) using 30 particles

Ai    C1       C2       Pref     P        Diff
1     0.8147   0.6557   0.4882   0.4869    0.0013
2     0.9058   0.0357   0.7120   0.7500   −0.0380
3     0.1270   0.8491   0.5257   0.5469   −0.0212
4     0.9134   0.9340   0.1410   0.2473   −0.1063
5     0.6324   0.6787   0.5606   0.5465    0.0141
6     0.0975   0.7577   0.6235   0.6574   −0.0339
7     0.2785   0.7431   0.6139   0.6136    0.0003
8     0.5469   0.3922   0.7953   0.7961   −0.0008
9     0.9575   0.6555   0.3998   0.4243   −0.0245
10    0.9649   0.1712   0.6551   0.6618   −0.0067
11    0.1576   0.7060   0.6673   0.6943   −0.0270
12    0.9706   0.0318   0.6696   0.7165   −0.0469
13    0.9572   0.2769   0.6295   0.6222    0.0073
14    0.4854   0.0462   0.9161   0.9665   −0.0503
15    0.8003   0.0971   0.7697   0.7818   −0.0121
16    0.1419   0.8235   0.5522   0.5706   −0.0185
17    0.4218   0.6948   0.6239   0.6143    0.0096
18    0.9157   0.3171   0.6412   0.6280    0.0132
19    0.7922   0.9502   0.1935   0.2660   −0.0725
20    0.9595   0.0344   0.6770   0.7214   −0.0444
As future directions, it is worth investigating how to indicate the right number of particles for the Particle Swarm Optimization method in order to maximize its results. Moreover, studying how to choose the grid in an unknown problem so as to achieve an optimal solution would also be valuable.
Acknowledgements The work was supported by the National Science Centre, Decision No. DEC2016/23/N/HS4/01931.
References 1. Agrawal, S., Panigrahi, B.K., Tiwari, M.K.: Multiobjective particle swarm algorithm with fuzzy clustering for electrical power dispatch. IEEE Trans. Evol. Comput. 12(5), 529–541 (2008) 2. Boender, C.G.E., De Graan, J.G., Lootsma, F.A.: Multi-criteria decision analysis with fuzzy pairwise comparisons. Fuzzy Sets Syst. 29(2), 133–143 (1989) 3. Deschrijver, G., Kerre, E.E.: On the relationship between some extensions of fuzzy set theory. Fuzzy sets and systems 133(2), 227–235 (2003)
4. Faizi, S., Rashid, T., Sałabun, W., Zafar, S., Watróbski, J.: Decision making with uncertainty using hesitant fuzzy sets. Int. J. Fuzzy Syst. 20(1), 93–103 (2018) 5. Faizi, S., Sałabun, W., Rashid, T., Watróbski, J., Zafar, S.: Group decision-making for hesitant fuzzy sets based on characteristic objects method. Symmetry 9(8), 136 (2017) 6. Guitouni, A., Martel, J.M.: Tentative guidelines to help choosing an appropriate MCDA method. Eur. J. Oper. Res. 109(2), 501–521 (1998) 7. Gupta, M.M., Qi, J.: Theory of T-norms and fuzzy inference methods. Fuzzy Sets Syst. 40(3), 431–450 (1991) 8. Łokietek, T., Jaszczak, S., Niko´nczuk, P.: Optimization of control system for modified configuration of a refrigeration unit. Procedia Comput. Sci. 159, 2522–2532 (2019) 9. Niko´nczuk, P.: Preliminary modeling of overspray particles sedimentation at heat recovery unit in spray booth. Eksploatacja i Niezawodno´sc´ 20, 387–393 (2018) 10. Piegat, A.: Fuzzy modeling and control (Studies in Fuzziness and Soft Computing). Physica 742 (2001) 11. Piegat, A., Sałabun, W.: Identification of a multicriteria decision-making model using the characteristic objects method. Appl. Comput. Intell. Soft Comput. (2014) 12. Roubens, M.: Fuzzy sets and decision analysis. Fuzzy Sets Syst. 90(2), 199–206 (1997) 13. Sałabun, W.: The characteristic objects method: a new distance-based approach to multicriteria decision-making problems. J. Multi-Criteria Decis. Anal. 22(1–2), 37–50 (2015) 14. Sałabun, W., Piegat, A.: Comparative analysis of MCDM methods for the assessment of mortality in patients with acute coronary syndrome. Artif. Intell. Rev. 48(4), 557–571 (2017) 15. Sałabun, W., Karczmarczyk, A., Watróbski, J., Jankowski, J.: Handling data uncertainty in decision making with COMET. In: 2018 IEEE Symposium Series on Computational Intelligence (SSCI), pp. 1478–1484. IEEE (2018) 16. Schutte, J.F., Reinbolt, J.A., Fregly, B.J., Haftka, R.T., George, A.D.: Parallel global optimization with the particle swarm algorithm. Int. J. Numer. Methods Eng. 61(13), 2296–2315 (2004) 17. Watróbski, J., & Jankowski, J. Knowledge management in MCDA domain. In: 2015 Federated Conference on Computer Science and Information Systems (FedCSIS), pp. 1445–1450. IEEE (2015) 18. Watróbski, J., Jankowski, J., Ziemba, P., Karczmarczyk, A., Zioło, M.: Generalised framework for multi-criteria method selection. Omega 86, 107–124 (2019) 19. Wa˛tróbski, J., Sałabun, W.: The characteristic objects method: a new intelligent decision support tool for sustainable manufacturing. In: International Conference on Sustainable Design and Manufacturing, pp. 349–359. Springer, Cham (2016) 20. Watróbski, J., Sałabun, W.: Green supplier selection framework based on multi-criteria decision-analysis approach. In: International Conference on Sustainable Design and Manufacturing, pp. 361–371. Springer, Cham (2016) 21. Watróbski, J., Sałabun, W., Karczmarczyk, A., Wolski, W.: Sustainable decision-making using the COMET method: an empirical study of the ammonium nitrate transport management. In: 2017 Federated Conference on Computer Science and Information Systems (FedCSIS), pp. 949–958. IEEE (2017) 22. Xinchao, Z.: A perturbed particle swarm algorithm for numerical optimization. Appl. Soft Comput. 10(1), 119–124 (2010) 23. Zadeh, L.A.: Fuzzy sets. Inf. Control 8(3), 338–353 (1965) 24. Zimmermann, H. J.: Fuzzy Set Theory and Its Applications. Springer Science & Business Media (2011)
Finding an Approximate Global Optimum of Characteristic Objects Preferences by Using Simulated Annealing Jakub Więckowski, Bartłomiej Kizielewicz, and Joanna Kołodziejczyk
Abstract Random processes are increasingly considered in many areas where decision-making is an important factor. The random factor makes it difficult to determine input parameters, and the selection of these parameters can be a key element in achieving correct results. Stochastic optimization methods can be used to solve this problem. In this article, the simulated annealing method was used to obtain an optimal solution, which, in combination with the COMET method, provided satisfactory results by determining the relationship between the preferences of the initial alternatives and newly identified alternatives. The purpose of this study was to systematize the knowledge of effective selection of input parameters for stochastic methods. The obtained solution indicates how to select a grid for an unknown problem and how to select a step size in the simulated annealing method to achieve more precise results.
1 Introduction Random processes are a class of problems that is frequently considered today. The result is based on a dependency described as the outcome of a random phenomenon. This randomness makes it difficult to determine how to effectively select algorithm parameters in order to obtain the desired outcomes. The results are influenced by the selection of the method for the problem at hand, the determination of initial values, and the quality of the random number generator. It is also important to determine the purpose towards which the subsequent solutions are to lead and the error bounds within which the designed algorithm can work.
J. Więckowski (B) · B. Kizielewicz · J. Kołodziejczyk Research Team on Intelligent Decision Support Systems, Faculty of Computer Science and Information Technology, Department of Artificial Intelligence and Applied Mathematics, West Pomeranian University of Technology in Szczecin, ul. Żołnierska 49, 71-210 Szczecin, Poland e-mail: [email protected] © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2020 I. Czarnowski et al. (eds.), Intelligent Decision Technologies, Smart Innovation, Systems and Technologies 193, https://doi.org/10.1007/978-981-15-5925-9_31
Stochastic optimization methods deal with the issue of random processes. Their operation can be presented using two main stages: the first involves the selection of a single initial state, and the second is the process of generating adjacent states. Unlike deterministic methods, a stochastic model gives different outputs while maintaining the same input values. Simulated annealing is a method belonging to the group of stochastic methods [12, 13]; its operation is presented in this article. The simulated annealing method is used mainly for discrete problems, and the algorithm itself simulates the process of cooling down metal crystals in metallurgy. In metallurgy, this involves heating and cooling the metal under controlled conditions in order to obtain specific characteristics. Regarding the functioning of the algorithm, each state of the problem is considered as a physical state of the system. The objective is to bring the system from the selected initial state to a state with minimal energy. The simulated annealing method is successfully used in many situations, such as quantitative studies [9, 10], building genetic algorithms [1, 3], and job shop scheduling [11]. Many studies focus on the right choice of decision support method [7, 21, 22]. Many Multi-Criteria Decision-Making (MCDM) problems can be solved successfully with the Characteristic Objects Method (COMET) [19, 23–25]. This approach allows a single decision-maker to create a model in which pairwise comparison of characteristic objects is used to create a ranking of alternatives across the individual criteria. Moreover, the pairwise comparison of characteristic objects is much easier than a comparison between decisional variants. Furthermore, the advantage of this method is that it is completely free of ranking reversal, as the final ranking is constructed based on fuzzy rules and characteristic objects. Hesitant fuzzy set extensions of the COMET method have been developed for solving problems in an uncertain environment [5, 6, 18]. In this study, we combined the COMET method with simulated annealing in order to obtain the desired model. The COMET method ensures solving the problem under the present uncertainty, while simulated annealing shows that, with precisely selected parameters, this solution can be effective and gives satisfactory results. The rest of the paper is organized as follows. Section 2 presents an introduction to fuzzy logic theory. Next, Sect. 3 is devoted to basic information about the COMET method and an explanation of how it works. In Sect. 4, a description and presentation of the simulated annealing method is provided. In Sect. 5, we present an illustrative example of the usage of simulated annealing within a model with two selected criteria. Finally, Sect. 6 contains the summary and conclusions of the conducted studies.
2 Fuzzy Logic—Preliminaries Fuzzy set theory and its extensions become more and more important in model creation in various scientific fields [8], where the considered issue involves solving multi-criteria decision problems [2, 4, 16]. The concepts of fuzzy set theory were introduced by Lotfi Zadeh [26], and some of them are described below: The Fuzzy Set and the Membership Function—the characteristic function μ_A of a crisp set A ⊆ X assigns a value of either 0 or 1 to each member of X; crisp sets allow only full membership (μ_A(x) = 1) or no membership at all (μ_A(x) = 0). This function can be generalized to a function μ_Ã so that the value assigned to an element of the universal set X falls within a specified range, i.e., μ_Ã : X → [0, 1]. The assigned value indicates the degree of membership of the element in the set A. The function μ_Ã is called a membership function, and the set Ã = {(x, μ_Ã(x))}, where x ∈ X, defined by μ_Ã(x) for each x ∈ X, is called a fuzzy set [27]. The Triangular Fuzzy Number (TFN)—a fuzzy set Ã, defined on the universal set of real numbers ℝ, is said to be a triangular fuzzy number Ã(a, m, b) if its membership function has the following form (1):

μ_Ã(x, a, m, b) =
  0,                x ≤ a
  (x − a)/(m − a),  a ≤ x ≤ m
  1,                x = m
  (b − x)/(b − m),  m ≤ x ≤ b
  0,                x ≥ b                       (1)
and the following characteristics (2), (3):

x₁, x₂ ∈ [a, m] ∧ x₂ > x₁ ⇒ μ_Ã(x₂) > μ_Ã(x₁)   (2)

x₁, x₂ ∈ [m, b] ∧ x₂ > x₁ ⇒ μ_Ã(x₂) < μ_Ã(x₁)   (3)
The Support of a TFN—the support of a TFN Ã is defined as a crisp subset of the Ã set in which all elements have a non-zero membership value in the Ã set (4):

S(Ã) = {x : μ_Ã(x) > 0} = [a, b]   (4)
The Core of a TFN—the core of a TFN Ã is a singleton (one-element fuzzy set) with the membership value equal to 1 (5):

C(Ã) = {x : μ_Ã(x) = 1} = m   (5)
The Fuzzy Rule—the single fuzzy rule can be based on the Modus Ponens tautology [14]. The reasoning process uses the IF–THEN, OR, and AND logical connectives.
3 The Characteristic Objects Method The main advantage of the COMET method is that it is completely free of the Rank Reversal phenomenon. Previous works have verified the accuracy of the COMET method. The formal notation of the COMET method should be shortly recalled [15, 17, 20]. Step 1. Define the Space of the Problem—the expert determines the dimensionality of the problem by selecting the number r of criteria, C₁, C₂, ..., C_r. Then, the set of fuzzy numbers for each criterion C_i is selected (6):

C_r = {C̃_r1, C̃_r2, ..., C̃_rcr}   (6)
where c₁, c₂, ..., c_r are the numbers of fuzzy numbers for all criteria. Step 2. Generate Characteristic Objects—the characteristic objects (CO) are obtained by using the Cartesian product of the fuzzy number cores for each criterion as follows (7):

CO = C(C₁) × C(C₂) × ⋯ × C(C_r)   (7)

Step 3. Rank the Characteristic Objects—the expert determines the Matrix of Expert Judgment (MEJ). It is a result of pairwise comparison of the COs by the problem expert. The MEJ matrix contains the results of comparing characteristic objects by the expert, where α_ij is the result of comparing CO_i and CO_j by the expert. The function f_exp denotes the mental function of the expert. It depends solely on the knowledge of the expert and can be presented as (8). Afterwards, the vertical vector of the Summed Judgments (SJ) is obtained as follows (9).

α_ij = 0.0 if f_exp(CO_i) < f_exp(CO_j); 0.5 if f_exp(CO_i) = f_exp(CO_j); 1.0 if f_exp(CO_i) > f_exp(CO_j)   (8)

SJ_i = Σ_{j=1}^{t} α_ij   (9)
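A minimal sketch of Steps 2–3 for two criteria is given below. It is an added illustration: the expert function f_exp used here (simply summing the core values) is a stand-in for real expert judgment, and the 3 × 3 grid of cores matches the example used later in the paper.

```python
from itertools import product

# Cores of the fuzzy numbers for two criteria (a 3x3 grid of characteristic objects)
cores_C1 = [0.0, 0.5, 1.0]
cores_C2 = [0.0, 0.5, 1.0]

# Step 2: characteristic objects as the Cartesian product of the cores, Eq. (7)
CO = list(product(cores_C1, cores_C2))

def f_exp(co):
    """Stand-in for the expert's mental function (illustrative assumption only)."""
    return sum(co)

def alpha(ci, cj):
    """Pairwise comparison result, Eq. (8)."""
    if f_exp(ci) < f_exp(cj):
        return 0.0
    if f_exp(ci) == f_exp(cj):
        return 0.5
    return 1.0

# Step 3: Matrix of Expert Judgment and Summed Judgments, Eqs. (8)-(9)
MEJ = [[alpha(ci, cj) for cj in CO] for ci in CO]
SJ = [sum(row) for row in MEJ]
print(SJ)  # a higher SJ value corresponds to a higher approximated preference
```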
Finally, values of preference are approximated for each characteristic object. As a result, the vertical vector P is obtained, where the i-th row contains the approximate value of preference for CO_i. Step 4. The Rule Base—each characteristic object and value of preference is converted to a fuzzy rule as follows (10):

IF C(C̃_1i) AND C(C̃_2i) AND ... THEN P_i   (10)

In this way, the complete fuzzy rule base is obtained.
Fig. 1 The simulated annealing algorithm (flowchart: get initial solution; generate a new solution; accept or reject it; update stored values; adjust the temperature; repeat until the stop criterion is satisfied)
Step 5. Inference and Final Ranking—each alternative is presented as a set of crisp numbers (e.g., A_i = {a_1i, a_2i, ..., a_ri}). This set corresponds to the criteria C₁, C₂, ..., C_r. Mamdani's fuzzy inference method is used to compute the preference of the i-th alternative. The rule base guarantees that the obtained results are unequivocal. The bijection makes COMET completely rank reversal free.
4 Simulated Annealing Method The simulated annealing method is one of the methods of stochastic optimization, which involves looking for the best solution in a given solution area [3]. It is mainly used for problems considered in discrete spaces. The method was inspired by the phenomenon of annealing in metallurgy, where cooling and heating are controlled during metal processing. This process is performed in order to increase the size of metal particles while reducing their defects. In the algorithm, cooling corresponds to a slow reduction of the temperature parameter, which gradually reduces the probability of accepting worse solutions. In each step, the simulated annealing algorithm selects a
state with characteristics similar to the current state, and the iteration process ends when the temperature decreases to zero. The main advantage of this method is that it avoids getting stuck in local optima, which is achieved by occasionally accepting inferior solutions [9]. The cost of obtaining an accurate solution is the relatively long duration of the computation. Additionally, the accuracy of the algorithm can be affected by small changes in its input parameters; the key parameters are the initial temperature and the specific problem area [10]. The flowchart of this algorithm is presented in Fig. 1.

Fig. 2 Example of a determined decision-making function and a randomly selected training set of alternatives
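A compact Python sketch of the loop in Fig. 1 is shown below. It is added for illustration only; the geometric cooling schedule, step size, and acceptance rule are standard textbook choices rather than the exact settings used in this study.

```python
import math
import random

def simulated_annealing(cost, x0, neighbour, t0=1.0, cooling=0.995, t_min=1e-4):
    """Minimise `cost` starting from x0; `neighbour` proposes a nearby state."""
    x, best = x0, x0
    t = t0
    while t > t_min:
        cand = neighbour(x)
        delta = cost(cand) - cost(x)
        # Always accept improvements; accept worse states with probability exp(-delta / t)
        if delta <= 0 or random.random() < math.exp(-delta / t):
            x = cand
            if cost(x) < cost(best):
                best = x
        t *= cooling  # adjust (lower) the temperature
    return best

# Toy usage: minimise a one-dimensional function on [0, 1]
f = lambda x: (x - 0.3) ** 2
step = lambda x: min(1.0, max(0.0, x + random.uniform(-0.1, 0.1)))
print(round(simulated_annealing(f, 0.9, step), 2))  # close to 0.3
```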
5 Study Case This paper presents a model that evaluates a set of alternatives based on probability. Given the preference values of the characteristic objects, we look for the values giving the smallest sum of errors, calculated as the sum of absolute differences between the determined input preferences and those calculated using the COMET method. The first stage is to define the space of the problem under consideration based on two criteria, C1 and C2. The obtained space represents the area of operation of the presented methods. It is shown in Fig. 2. The next step is to select the points of the characteristic objects and define the starting alternatives. They are presented as a vector of preferential values, which will then be compared with the drawn alternatives. Figure 2 shows the defined starting alternatives of the characteristic objects. Then, using the simulated annealing method, preference values for the characteristic objects were sought; a sketch of the minimized criterion is given below. The rule base identified as a result of this process is presented as (11). Figure 3 shows the operation of the target function for the given problem. The last stage is the selection of test points of characteristic objects to check the correctness of the preference values identified in the previous steps of system operation.
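The criterion being minimised can be sketched as follows. This is an added illustration: the function name `comet_preference`, its signature, and the tuple layout of the training set are assumptions introduced here to show the idea, not the authors' code.

```python
def total_error(co_preferences, training_set, comet_preference):
    """Sum of absolute differences between reference preferences and COMET outputs.

    co_preferences   - candidate preference values of the characteristic objects
    training_set     - iterable of (c1, c2, reference_preference) tuples (cf. Table 1)
    comet_preference - function evaluating an alternative for given CO preferences
    """
    return sum(abs(p_ref - comet_preference(c1, c2, co_preferences))
               for c1, c2, p_ref in training_set)

# Simulated annealing would then be run with a neighbour function that perturbs one
# randomly chosen characteristic-object preference by a small step within [0, 1].
```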
Table 1 Training set of alternatives and their preferences (P obtained; P_ref reference)

Ai   C1      C2      P_ref   P       Diff
1    0.6557  0.4387  0.2518  0.2448   0.0070
2    0.0357  0.3816  0.1095  0.1054   0.0041
3    0.8491  0.7655  0.6197  0.6324  −0.0127
4    0.9340  0.7952  0.6923  0.6923   0.0000
5    0.6787  0.1869  0.1414  0.1413   0.0001
6    0.7577  0.4898  0.3235  0.3145   0.0090
7    0.7431  0.4456  0.2870  0.2870   0.0000
8    0.3922  0.6463  0.3517  0.3517   0.0000
9    0.6555  0.7094  0.4849  0.5013  −0.0165
10   0.1712  0.7547  0.4345  0.4345   0.0000
11   0.7060  0.2760  0.1817  0.1924  −0.0106
12   0.0318  0.6797  0.3467  0.3292   0.0176
13   0.2769  0.6551  0.3410  0.3429  −0.0019
14   0.0462  0.1626  0.0204  0.0454  −0.0250
15   0.0971  0.1190  0.0130  0.0348  −0.0219
16   0.8235  0.4984  0.3558  0.3490   0.0068
17   0.6948  0.9597  0.8115  0.7866   0.0248
18   0.3171  0.3404  0.1120  0.1196  −0.0076
19   0.9502  0.5853  0.4827  0.4921  −0.0094
20   0.0344  0.2238  0.0379  0.0618  −0.0239
The COMET method was used to assess preferences of the selected alternatives, and the selected points of characteristic objects are presented in Fig. 4.

R1: IF C1 ∼ 0.0 AND C2 ∼ 0.0 THEN 0.0000
R2: IF C1 ∼ 0.0 AND C2 ∼ 0.5 THEN 0.1334
R3: IF C1 ∼ 0.0 AND C2 ∼ 1.0 THEN 0.6633
R4: IF C1 ∼ 0.5 AND C2 ∼ 0.0 THEN 0.0001
R5: IF C1 ∼ 0.5 AND C2 ∼ 0.5 THEN 0.2000
R6: IF C1 ∼ 0.5 AND C2 ∼ 1.0 THEN 0.7778
R7: IF C1 ∼ 1.0 AND C2 ∼ 0.0 THEN 0.1588
R8: IF C1 ∼ 1.0 AND C2 ∼ 0.5 THEN 0.4315
R9: IF C1 ∼ 1.0 AND C2 ∼ 1.0 THEN 0.9122
(11)
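As an added illustration of how the rule base (11) evaluates an alternative: for triangular fuzzy numbers whose membership degrees sum to one on each criterion, the inference reduces to a membership-weighted sum of the rule outputs. The sketch below uses exactly the preference values of Eq. (11); reproducing 0.2448 for the first training alternative suggests it is equivalent in effect to the inference used in the paper, but it remains an illustrative reconstruction.

```python
def memberships(x, cores=(0.0, 0.5, 1.0)):
    """Membership degrees of x in the triangular fuzzy numbers centred at the grid cores."""
    mu = [0.0] * len(cores)
    if x <= cores[0]:
        mu[0] = 1.0
    elif x >= cores[-1]:
        mu[-1] = 1.0
    else:
        for i in range(len(cores) - 1):
            lo, hi = cores[i], cores[i + 1]
            if lo <= x <= hi:
                t = (x - lo) / (hi - lo)
                mu[i], mu[i + 1] = 1.0 - t, t
                break
    return mu

# Preference values of the characteristic objects from Eq. (11): rows index C1, columns C2
P = [[0.0000, 0.1334, 0.6633],
     [0.0001, 0.2000, 0.7778],
     [0.1588, 0.4315, 0.9122]]

def comet_preference(a1, a2):
    """Preference of an alternative (a1, a2) as a membership-weighted sum of rule outputs."""
    mu1, mu2 = memberships(a1), memberships(a2)
    return sum(mu1[i] * mu2[j] * P[i][j] for i in range(3) for j in range(3))

print(round(comet_preference(0.6557, 0.4387), 4))  # 0.2448, cf. row 1 of Table 1
```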
The final preferences were calculated and compared with those obtained in the initial stage of the system performance using Spearman's correlation coefficient. For the obtained results, this coefficient equals 0.9940. In addition, the target function for simulated annealing is shown in Fig. 3. The optimal solution was found before reaching 2000 iterations; subsequent iterations did not improve the preference values of the alternatives. The set of training alternatives is shown in Table 1, and the set of testing alternatives is shown in Table 2.
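For completeness, Spearman's rank correlation used for this comparison can be computed as in the snippet below (an added illustration, valid for samples without tied values; in practice scipy.stats.spearmanr can be called instead).

```python
def spearman_rho(x, y):
    """Spearman's rank correlation coefficient for samples without ties."""
    n = len(x)
    rank = lambda values: {v: i + 1 for i, v in enumerate(sorted(values))}
    rx, ry = rank(x), rank(y)
    d2 = sum((rx[a] - ry[b]) ** 2 for a, b in zip(x, y))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Usage: pass the reference and obtained preference columns of Table 1 or Table 2.
```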
Table 2 Testing set of alternatives and their preferences (P obtained; P_ref reference)

Ai   C1      C2      P_ref   P       Diff
1    0.7513  0.3517  0.2339  0.2462  −0.0123
2    0.2551  0.8308  0.5339  0.5341  −0.0002
3    0.5060  0.5853  0.3209  0.3012   0.0198
4    0.6991  0.5497  0.3488  0.3458   0.0030
5    0.8909  0.9172  0.8294  0.7998   0.0296
6    0.9593  0.2858  0.2913  0.2984  −0.0070
7    0.5472  0.7572  0.5049  0.5144  −0.0095
8    0.1386  0.7537  0.4309  0.4275   0.0034
9    0.1493  0.3804  0.1141  0.1166  −0.0025
10   0.2575  0.5678  0.2584  0.2429   0.0155
11   0.8407  0.0759  0.1810  0.1461   0.0349
12   0.2543  0.0540  0.0184  0.0181   0.0002
13   0.8143  0.5308  0.3771  0.3774  −0.0003
14   0.2435  0.7792  0.4702  0.4748  −0.0046
15   0.9293  0.9340  0.8702  0.8279   0.0422
16   0.3500  0.1299  0.0433  0.0468  −0.0035
17   0.1966  0.5688  0.2523  0.2351   0.0172
18   0.2511  0.4694  0.1810  0.1566   0.0244
19   0.6160  0.0119  0.0950  0.0421   0.0529
20   0.4733  0.3371  0.1412  0.1325   0.0088
Fig. 3 Diagram of matching the goal function
The tables contain the assessment value from the COMET method, the starting preferential value, the preferential value obtained from the system, and the difference between these values. As a result of applying the system, we obtained the identified space, which is presented in Fig. 4.
Fig. 4 Example of the identified decision-making function and a randomly selected testing set of alternatives
6 Conclusions Given data on existing alternatives and evaluations of their preference values, it can be assumed that these are set correctly. However, new alternatives may cause problems in defining their preference values in an effective and efficient way. Stochastic optimization methods can be used to solve this problem. The simulated annealing method in combination with the COMET method made it possible to find optimal preference values for the characteristic objects. The criterion for finding the optimal solution was to obtain the lowest sum of absolute differences between the obtained preferences and those defined at the beginning. The problem space was defined on the basis of two criteria, C1 and C2. By obtaining preference values for the testing alternatives and using Spearman's correlation coefficient, we verified whether they were determined correctly. The obtained coefficient was 0.9940, which confirms the correctness of the results. For future directions, it is worth considering the choice of a grid in an unknown problem, as this is an important part of getting the correct solution. An inappropriate choice of grid can cause the algorithm to get stuck in local optima. In addition, the selection of the correct step size for simulated annealing should also be taken into account to achieve more accurate results. Acknowledgements The work was supported by the National Science Centre, Decision No. DEC-2016/23/N/HS4/01931.
References 1. Adler, D.: Genetic algorithms and simulated annealing: a marriage proposal. In: IEEE International Conference on Neural Networks, pp. 1104–1109. IEEE (1993) 2. Boender, C.G.E., De Graan, J.G., Lootsma, F.A.: Multi-criteria decision analysis with fuzzy pairwise comparisons. Fuzzy Sets Syst. 29(2), 133–143 (1989) 3. Davis, L.: Genetic algorithms and simulated annealing (1987) 4. Deschrijver, G., Kerre, E.E.: On the relationship between some extensions of fuzzy set theory. Fuzzy Sets Syst. 133(2), 227–235 (2003) 5. Faizi, S., Sałabun, W., Rashid, T., Wątróbski, J., Zafar, S.: Group decision-making for hesitant fuzzy sets based on characteristic objects method. Symmetry 9(8), 136 (2017) 6. Faizi, S., Rashid, T., Sałabun, W., Zafar, S., Wątróbski, J.: Decision making with uncertainty using hesitant fuzzy sets. Int. J. Fuzzy Syst. 20(1), 93–103 (2018) 7. Guitouni, A., Martel, J.M.: Tentative guidelines to help choosing an appropriate MCDA method. Eur. J. Oper. Res. 109(2), 501–521 (1998) 8. Gupta, M.M., Qi, J.: Theory of T-norms and fuzzy inference methods. Fuzzy Sets Syst. 40(3), 431–450 (1991) 9. Kirkpatrick, S., Gelatt, C.D., Vecchi, M.P.: Optimization by simulated annealing. Science 220(4598), 671–680 (1983) 10. Kirkpatrick, S.: Optimization by simulated annealing: quantitative studies. J. Stat. Phys. 34(5–6), 975–986 (1984) 11. Van Laarhoven, P.J., Aarts, E.H., Lenstra, J.K.: Job shop scheduling by simulated annealing. Oper. Res. 40(1), 113–125 (1992) 12. Łokietek, T., Jaszczak, S., Nikończuk, P.: Optimization of control system for modified configuration of a refrigeration unit. Procedia Comput. Sci. 159, 2522–2532 (2019) 13. Nikończuk, P.: Preliminary modeling of overspray particles sedimentation at heat recovery unit in spray booth. Eksploatacja i Niezawodność 20, 387–393 (2018) 14. Piegat, A.: Fuzzy Modeling and Control (Studies in Fuzziness and Soft Computing). Physica 742 (2001) 15. Piegat, A., Sałabun, W.: Nonlinearity of human multi-criteria in decision-making. J. Theor. Appl. Comput. Sci. 6(3), 36–49 (2012) 16. Roubens, M.: Fuzzy sets and decision analysis. Fuzzy Sets Syst. 90(2), 199–206 (1997) 17. Sałabun, W.: The characteristic objects method: a new distance-based approach to multicriteria decision-making problems. J. Multi-Criteria Decis. Anal. 22(1–2), 37–50 (2015) 18. Sałabun, W., Karczmarczyk, A., Wątróbski, J.: Decision-making using the hesitant fuzzy sets COMET method: an empirical study of the electric city buses selection. In: 2018 IEEE Symposium Series on Computational Intelligence (SSCI), pp. 1485–1492. IEEE (2018) 19. Sałabun, W., Palczewski, K., Wątróbski, J.: Multicriteria approach to sustainable transport evaluation under incomplete knowledge: electric bikes case study. Sustainability 11(12), 3314 (2019) 20. Sałabun, W., Piegat, A.: Comparative analysis of MCDM methods for the assessment of mortality in patients with acute coronary syndrome. Artif. Intell. Rev. 48(4), 557–571 (2017) 21. Wątróbski, J., Jankowski, J.: Knowledge management in MCDA domain. In: 2015 Federated Conference on Computer Science and Information Systems (FedCSIS), pp. 1445–1450. IEEE (2015) 22. Wątróbski, J., Jankowski, J., Ziemba, P., Karczmarczyk, A., Zioło, M.: Generalised framework for multi-criteria method selection. Omega 86, 107–124 (2019) 23. Wątróbski, J., Sałabun, W.: The characteristic objects method: a new intelligent decision support tool for sustainable manufacturing. In: International Conference on Sustainable Design and Manufacturing, pp. 349–359. Springer, Cham (2016) 24. Wątróbski, J., Sałabun, W.: Green supplier selection framework based on multi-criteria decision-analysis approach. In: International Conference on Sustainable Design and Manufacturing, pp. 361–371. Springer, Cham (2016) 25. Wątróbski, J., Sałabun, W., Karczmarczyk, A., Wolski, W.: Sustainable decision-making using the COMET method: an empirical study of the ammonium nitrate transport management. In: 2017 Federated Conference on Computer Science and Information Systems (FedCSIS), pp. 949–958. IEEE (2017) 26. Zadeh, L.A.: Fuzzy sets. Inf. Control 8(3), 338–353 (1965) 27. Zimmermann, H.J.: Fuzzy Set Theory and Its Applications. Springer Science & Business Media (2011)
Large-Scale Systems for Intelligent Decision-Making and Knowledge Engineering
Digital Systems for eCTD Creation on the Pharmaceutical Market of the Eurasian Economic Union Konstantin Koshechkin, Georgy Lebedev, and Sergey Zykov
Abstract Pharmaceutical market integration is highly dependent on digital technologies in general and the Internet in particular. The aim of this work is to study the available software for creating electronic common technical documents (eCTD) and the possibility of its application on the Eurasian Economic Union market. To obtain this information, employees of pharmaceutical companies were surveyed to identify the software they are using or plan to use. Based on the obtained results, a list of preferred software was compiled; it contains 9 software products. As part of the second phase of the study, a survey of the developers of the designated software solutions was conducted. The results showed that, for pharmaceutical companies operating in the Eurasian Economic Union region, the issue of software readiness for working with the requirements of domestic regulators is of particular importance. Most foreign software products can be localized only after significant modifications. Domestic software solutions are just beginning to appear and in some cases are highly specialized. For example, programs for planning and meeting regulatory deadlines in the market are represented by single products.
K. Koshechkin (B) · G. Lebedev I.M. Sechenov First Moscow State Medical University, 2-4 Bolshaya Pirogovskaya street, 119991 Moscow, Russia e-mail: [email protected] K. Koshechkin Scientific Centre for Expert Evaluation of Medicinal Products of the Ministry of Health of the Russian Federation, Petrovsky boulevard 8, bld., Moscow, Russia G. Lebedev Federal Research Institute for Health Organization and Informatics, 11, Dobrolubova street, 127254 Moscow, Russia S. Zykov National Research University Higher School of Economics, 20 Myasnitskaya Ulitsa, Moscow 101000, Russia © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2020 I. Czarnowski et al. (eds.), Intelligent Decision Technologies, Smart Innovation, Systems and Technologies 193, https://doi.org/10.1007/978-981-15-5925-9_32
1 Introduction In the modern economy, the most valuable commodities are scientific technology and high-tech products. Modern drugs show significantly higher efficacy relative to drugs admitted to the market 20 or more years ago. At the same time, medicines that are presented on the markets of the countries of the Eurasian Economic Union (EAEU) are in some cases already excluded from circulation in Western countries. This is largely due to the difference in the level of social well-being and the ability to provide consumers with only the most modern drugs with maximum effectiveness and minimal side effects. The presence in circulation of less effective but substantially cheaper drugs creates significant differences in the cost of treating patients in developing countries and countries with developed economies. The other side of this issue is that, in the countries of Western Europe and the USA, new drugs that are currently not available on the EAEU pharmaceutical market are registered and applied. In some cases, the treatment of a disease is impossible without a foreign drug, which leads to the need for individual purchase of a drug that is not officially available on the market. This procedure is possible only in situations threatening the life and health of the patient and is carried out on the basis of a decision of the medical commission. Preserving the productive potential for supplying the country's population with drugs within its own territory is also of social and political importance. Given the experience of the twentieth century in the creation and use of biological weapons, it is impossible to exclude the possibility of this kind of aggression of one state against another. Measures to protect against this type of threat are also being carried out abroad [1]. If the drug supply were fully assigned to a state that is more efficient in the production of medicines, the situation could become catastrophic if biological aggression or a natural epidemic coincided with an embargo on the supply of the drugs necessary for protection or treatment. Thus, the integration of the pharmaceutical market of the Eurasian Economic Union has several goals that must be fulfilled in parallel without harming each other. First of all, this is an audit of the entire list of medicines registered in the EAEU countries. The procedure for bringing the dossiers of medicines into compliance with the requirements of the EAEU requires the provision of evidence of the effectiveness, safety, and quality of medicines in accordance with requirements relevant to the current level of scientific knowledge. Secondly, it is the harmonization of dossier requirements with the International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use (ICH), recognized by the European Medicines Agency (EMA, Europe) and the Food and Drug Administration (FDA, USA). As a result, it will be much easier for manufacturers of new highly effective drugs to register them on the EAEU market. Also, the attractiveness of this market will increase due to the significantly larger pool of drug consumers within the framework of the union relative to each of its members individually. The third goal is to maintain the ability to develop the maximum number of industries in the territories of the EAEU member states.
Pharmaceutical market integration is highly dependent on digital technologies in general and the Internet in particular. Due to the territorial remoteness of the regulatory authorities of the EAEU member countries, information exchange issues are of paramount importance. Business processes of the EAEU are built on the principles of electronic information exchange. The integrated information system of the Eurasian Economic Commission (EEC) describes and standardizes the rules for the interaction of participants in an integrated market. Unlike previous periods of human development, the digital era for information exchange does not require physical movement of objects. Digital information can be transmitted via the Internet instantly in comparison to other methods of data transfer. In order to exchange documents in electronic form without the participation of a person in the recognition of the text of these documents, it is necessary to compile all documents in a formalized form, i.e., in the format that the software of all participants in the data exchange can use. Preparation of a dossier for submission to the market authorization procedure consists in the formation of a package of required documents in electronic form in electronic common technical document (eCTD) format or other standards used by the regulatory authorities of the country of submission. The effective management and validation of relevant requirements of electronic dossiers is a complex process requiring the use of specialized software. The aim of the work is to study the software for creating eCTD and its possibilities for application on the EAEU market.
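To illustrate what a "formalized form" means in practice, the sketch below creates a minimal eCTD-like sequence folder with the five CTD modules and a placeholder XML backbone. It is only an illustration: the exact EAEU regional requirements (module 1 content, envelope metadata, checksums, DTD validation) differ and are not reproduced here.

```python
from pathlib import Path

# The five CTD modules; module 1 is region-specific administrative information
MODULES = ("m1", "m2", "m3", "m4", "m5")

def create_ectd_skeleton(root: str, sequence: str = "0000") -> Path:
    """Create a simplified eCTD-like sequence folder with an index placeholder."""
    seq_dir = Path(root) / sequence
    for module in MODULES:
        (seq_dir / module).mkdir(parents=True, exist_ok=True)
    # A real submission uses an XML backbone (index.xml) validated against the eCTD DTD;
    # here only a placeholder is written to show where machine-readable metadata lives.
    (seq_dir / "index.xml").write_text("<ectd><!-- leaf references go here --></ectd>")
    return seq_dir

if __name__ == "__main__":
    print(create_ectd_skeleton("dossier_example"))
```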
2 Methods A preliminary analysis of the publications showed the lack of public access to scientific publications investigating the applicability of information systems for the formation of eCTD in accordance with the requirements of the EAEU. To obtain the missing information, an analysis was carried out in 2 directions. At first employees of pharmaceutical companies were interviewed to identify the software they are using or plan to use. 1179 foreign pharmaceutical companies and 859 companies that are located on the territory of the EAEU were identified whose medicines are registered on the territory of the EAEU. Respondents were selected from pharmaceutical companies as part of the Education Center of the Federal State Budgetary Institution “Scientific Center for Expert Evaluation of Medicinal Products” of the Ministry of Health of the Russian Federation. In total, 93 (5%) representatives of the companies took part in the study. Of these, 36 (39%) are representatives of foreign pharmaceutical companies and 57 (61%) are representatives of the companies located on the territory of the EAEU. All respondents are specialists in drug market authorization and have a higher medical, pharmaceutical, or chemical education. The study examined the issues of what software the interviewed employees use or plan to use to create dossiers in eCTD format, what software features are used by them, how their organizations implement eCTD storage, what budget is allocated in their organization
for the implementation of the software suitable to create eCTD, how many drugs are authorized for market by their organizations in the EAEU countries, and how many employees in their companies are involved in drug market authorization process. Based on the obtained results, a list of preferred software was compiled. It contains 9 software products. As part of the second phase of the study, a survey of companies developing designated software solutions was conducted. The study examined the questions of whether their software product is ready to form eCTD in accordance with the requirements of the EAEU, what method of data storage is used in the system, and how many dossiers were accepted by regulatory authorities in the eCTD format.
3 Results Of the 93 respondents, 47 (50.5%) did not choose which software they would use to create eCTD, 43 companies (46.3%) already use software, and 3 (3.2%) answered that they did not need software to create eCTD. Regardless of whether the software is already in use or just planned to be used, the results of the study were collected. The results are presented in Table 1. Regarding software features, multiple choices were given. 144 results have been received. The most demanded functional capabilities were the formation of dossiers in eCTD format, which was reported by 56 respondents (39%), the conversion of dossier documents into PDF format was reported by 45 respondents (31%). Also, in 20% of cases (29 respondents), the importance of combining software for forming dossiers and the functions of regulatory information management systems, for example, planning regulatory actions, was noted. The least interest at the moment was in the possibility of forming documents by templates (only 14 responses—10% received). Regarding data storage, the majority of respondents are inclined to use servers operated by their own company—60 answers (66%) or even local workplaces—24 companies (26%). The greatest distrust is in the use of cloud technology. None of the respondents planned to create a dedicated private cloud. And only 7 companies (8%) are ready to entrust their data with cloud storage. Often, one of the most sensitive issues in implementing information technology solutions in developing countries is the allocation of a financial budget. In our study, we got the result that in 36 companies (41% of the respondents) the budget was not yet allocated. Of those where the budget was already allocated, most often it is relatively high, more than 50 thousand US dollars per year. This was noted by 19 respondents (22% of the total number of respondents). Average budgets in the range of 1.5–8 thousand US dollars per year and from 8 to 50 thousand US dollars per year were identified equally in 13 respondents in each case (15%). A low budget of less than 1.5 thousand US dollars per year is indicated by 6 companies (7%). A correlation was also found between the number of employees and the number of registered drugs. The more the dossiers in the company are required for handling in
Table 1 The results of a survey of pharmaceutical companies' employees on the use of eCTD creation software

What software features do you use?
  Formation of dossiers by sections           39%    56
  Regulatory planning                         20%    29
  Creating documents by template              10%    14
  Conversion of editable documents to PDF     31%    45
  Total                                      100%   144

How does your organization implement eCTD storage?
  Local workplaces                            26%    24
  Organization's own server                   66%    60
  Private cloud                                0%     0
  Cloud storage                                8%     7
  Total                                      100%    91

What budget is allocated in your organization for the formation of eCTD software (thousand US dollars per year)?
  Not allocated                               41%    36
  Less than 1.5                                7%     6
  From 1.5 to 8                               15%    13
  From 8 to 50                                15%    13
  More than 50                                22%    19
  Total                                      100%    87

How many drug market authorizations are owned by your organization in the EAEU countries?
  1–5                                         17%    15
  5–20                                        49%    44
  20–100                                      27%    24
  More than 100                                8%     7
  Total                                      100%    90

How many employees are involved in the drug market authorization process in the EAEU?
  1–2                                         34%    30
  2–5                                         33%    29
  5–10                                        20%    18
  More than 10                                13%    12
  Total                                      100%    89
eCTD format and the more the employees are involved in this work, the higher the allocated budget. Regarding the list of software used or planned to be used on the EAEU market, 9 software products were identified. The list of the software is presented in Table 2.
Table 2 List of answers about software products used or planned for use in the EAEU countries for the creation of eCTD

Software used by foreign companies
  Not using yet                              42%    15
  EXTEDO eCTDmanager, Germany                19%     7
  LEKSOFT-ONLINE: REGISTRATION, Russia       11%     4
  ARS Dossier Composer, Russia                8%     3
  eCTD Office, Croatia                        6%     2
  LORENZ docuBridge, Germany                  6%     2
  Amplexor, Luxembourg                        3%     1
  IPHARMA iDossier, Russia                    3%     1
  Own application                             3%     1
  Total                                     100%    36

Software used by EAEU domestic companies
  Not using yet                              56%    32
  LEKSOFT-ONLINE: REGISTRATION, Russia       16%     9
  EXTEDO eCTDmanager, Germany                 7%     4
  ARS Dossier Composer, Russia                5%     3
  Vialek: CTD, Russia                         5%     3
  Not required                                5%     3
  IPHARMA iDossier, Russia                    2%     1
  LORENZ docuBridge, Germany                  2%     1
  Statandocs, Russia                          2%     1
  Total                                     100%    57
From the list, it can be seen that there are four foreign-developed products and five products developed in the countries of the EAEU region. Among the companies that have already decided which software they will use to create an eCTD dossier, there is a pattern linking the territorial location of the pharmaceutical company and that of the software developer. The most common system among foreign pharmaceutical companies is the EXTEDO eCTDmanager system (Germany), indicated by 7 respondents (19%). Among the pharmaceutical companies in the EAEU region, the LEKSOFT-ONLINE: REGISTRATION (Russia) system is most prevalent, indicated by 9 respondents (16%). The most important feature that the market demands from software developers is the ability to create a dossier in the eCTD format. In order to assess the suitability of a software product for the EAEU market, confirmation should be obtained as a result of the validation of the generated dossier. Since the procedure for integrating the pharmaceutical markets of the EAEU countries has just begun, only 4 software development companies out of 9 respondents were able to answer in the affirmative. A total of 45 files were validated in the EAEU for eCTD format compliance, which
were mainly created in the LEKSOFT-ONLINE: REGISTRATION system (30 files, 66%). In addition, from 1 to 3 dossiers passed validation after being created in the software products by Vialek, Statandocs, and ARS. It should be noted that all these systems are developed in the EAEU region.
4 Discussion About 14,000 drugs are registered in the EAEU. To be able to circulate on the EAEU common market after December 31 of 2025, they must be aligned with the Union Rules [2]. According to a rough estimate, it will be 1,500–2,000 dossiers per year. Until December 31 of 2020, market authorization is possible according to the national procedure. On average, 800 drugs are market authorized annually according to the national procedure. An increase in the number of market authorizations according to the national procedure is expected due to the uncertainty in the technical approach in receiving documents from regulatory authorities in compliance with EAEU rules. Until December 31 of 2025, changes in previously received dossier will be made by the national procedure as well as confirmation of market authorization of registered drugs will be carried out. Thus, the load on regulatory mechanisms in the field of drug circulation will increase significantly. The national procedure will take place simultaneously with the EAEU procedure. The only solution to successfully overcome this challenge is the introduction of automated information processing systems for all participants in regulatory procedures and their integration among themselves. The use of digital systems allows pharmaceutical companies to reduce the cost of maintaining information in the actual state. A single source of reliable information appears to be the one, from which authorized users can prepare samples in the required form to solve their problems. Using a ready-made solution allows companies not to create internal competence in the company for the application of Identification of Medicinal Products (IDMP) standards. When describing drugs, dictionaries of terms that are approved by regulatory authorities should be used. Software products provide information and maintain the relevance of these directories. An unlimited number of users located in different parts of the world can work simultaneously in the system, and an unlimited number of regulatory and pre-regulatory projects can be conducted. Systems allow validation and quality control of information based on built-in algorithms. The systems themselves are validated by the developer and do not require validation by users, or it is minimized [3]. Digital systems make it possible to maintain dossier compliance with regulatory requirements, considering their constant changes. The most sophisticated systems that work with multinational markets support compliance depending on the region where the dossier is submitted. In this case, the speed of updating requirements implemented in the system is very important. For users, the most valuable is the
opportunity to validate and submit dossiers that take into account the different regulatory requirements of different regulatory systems while using one and the same set of documents. A unified information storage system increases the integrity and reliability of data storage. The systems of this group can also be used to prepare paper versions of dossiers for regulators requiring paper-based documents to be provided. In this case, the system works as a catalog of printed documents, allowing users to simplify their preview, selection, and layout. Electronic copies are also created that can be evaluated during the dossier validation process. Like systems for identifying and describing medicines, systems for working with dossiers based on cloud technologies allow several employees to work with documents in parallel, despite their location in different divisions of the company or even in different countries. Web-based software-as-a-service solutions are characterized by low hardware requirements, do not require setting up and installing workplaces, and their validation is maintained by the solution providers. The procedure for launching pharmaceutical products on the market simultaneously in several countries creates the requirement to maintain several subsidiary dossiers that correspond to the main one and at the same time satisfy the requirements of national regulators. The creation of separate dossiers "from scratch" for each region leads to unjustified additional costs, since most of the documents in the dossier are universal. Differences in the life cycle of the drug in different countries create additional complexity, and the frequency of replacement, updating, and editing of dossier documents may vary significantly. Also, many companies keep versions of dossiers for internal work, which differ from those submitted to regulatory procedures. Digital systems for the creation of electronic dossiers allow users to create and effectively manage an unlimited number of child dossiers made on the basis of the main set of documents (see the sketch below). They allow group updates of documents across the sets of dossiers if the update is not region-specific and affects not only the main dossier but also the child ones. They also safely store both the open and closed parts of documents, guaranteeing their reliable separation. Due to the automation of work, digital systems reduce the number of errors associated with the human factor and, as a result, reduce costs and accelerate the process of controlling regulatory actions. Tools for generating structured reports on preclinical and clinical trials can be used both independently and as a component of the eCTD dossier formation system. This software requires the ability to process both paper and electronic source documents. The use of such a system reduces the costs of manual processing of documents and improves the quality of research design. The implementation of risk control over the use of drugs at all stages of their life cycle necessitates more effective software solutions in the framework of pharmacovigilance to protect patients and manufacturers and to ensure compliance with legal requirements. Management of clinical trials involves evaluating the feasibility of conducting a clinical trial, integration with electronic systems for collecting patient data and research results, automatic generation of documents, and monitoring of each research subject, which leads to an increase in the efficiency of their conduct.
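The reuse of one master document set across several regional ("child") dossiers described above can be pictured with a minimal data model such as the following. This is an illustrative sketch only; real systems additionally handle versioning, separation of open and closed document parts, access control, and audit trails.

```python
from dataclasses import dataclass, field

@dataclass
class MasterDossier:
    """Shared set of documents, keyed by a document identifier."""
    documents: dict = field(default_factory=dict)   # e.g. {"m3-quality": "v7"}

@dataclass
class ChildDossier:
    """Regional dossier: reuses the master set and overrides only region-specific documents."""
    region: str
    master: MasterDossier
    overrides: dict = field(default_factory=dict)

    def effective_documents(self) -> dict:
        docs = dict(self.master.documents)
        docs.update(self.overrides)   # regional documents replace shared ones
        return docs

# A group update of the master set propagates to all child dossiers automatically,
# because they reference the same master object.
master = MasterDossier({"m2-summary": "v5", "m3-quality": "v7"})
eaeu = ChildDossier("EAEU", master, {"m1-regional": "v2"})
print(eaeu.effective_documents())
```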
Maintaining pharmacovigilance data is a mandatory, time-consuming, and costly process, while not generating direct revenue for the pharmaceutical company. Thus, most participants in the field of drug circulation feel the need to minimize the cost of pharmacovigilance, but at the same time comply with the requirements of the law to provide safety for patients. In common, the standard software product from the MS Excel office software suite (USA) is mainly used for this purpose. However, several specialized programs allow users to perform this range of work at a professional level. They provide the ability to classify, create, view, send, and maintain pharmacovigilance data and adverse event reports in one application. This software tracks the timing of the submission of information, supports multiuser work and compliance with the requirements of regulatory authorities of different countries. A single database allows for the long-term safety of information. Using a cloud service increases the level of reliability of data storage in comparison with local files on the hard drive of an individual employee. This software can integrate with other systems and components to obtain information describing drugs. Tools for planning regulatory actions and tracking the status of the submissions to regulatory bodies are solutions that analyze user-downloaded data that makes it possible to manage the portfolio of drugs. Of particular importance are these solutions for companies managing a large portfolio of projects which are represented in the markets of different regions. It is necessary not only to ensure compliance with standards and rules but also to coordinate complex actions with many stakeholders to ensure the correct evaluation of drugs in terms of quality, effectiveness, and safety. From this point of view, regulatory departments are responsible for the fate of the assets of the pharmaceutical companies. The key factors for the timely launch of the medicinal product on the market are tracking the timing of regulatory procedures, planning regulatory actions, and monitoring related projects. The profit of a pharmaceutical company directly depends on the quality of the implementation of these actions. Every day, the lack of market authorization can mean millions of losses to the company. In many companies, these processes are based on intensive manual efforts supported by MS Excel and other non-specialized IT solutions, such as regulatory product databases and enterprise management systems. The use of non-specialized solutions not only poses risks to data security and reliability in general but, more importantly, leads to a disconnection between operational and regulatory information. This greatly complicates the coordination of market authorization and production processes and increases the likelihood of errors in the documentation. The feasibility of introducing digital systems in the field of drug circulation has been repeatedly shown by other authors; for example, when introducing eCTD format in China [4, 5], European Union [6–10], USA [11], as well as in the ASEAN region, Australia, Croatia, India, Saudi Arabia, South Africa, Turkey [12], and Russian Federation [13]. Specialized digital solutions allow for the merging of regulatory and technological processes, information becomes available to all interested parties. The use of centralized solutions with active support from solution providers allows timely monitoring
of changes in regulatory legislation and taking the required measures. Planning of regulatory activities is noticeably simplified and is based on control points, while compliance with the regulatory system and its requirements is ensured by an integrated warning system. Project management functionality is implemented in these systems, which facilitates the distribution of responsibilities among responsible persons, time control, and reporting. The systems provide access to current and previous project data from a single interface to enable change tracking. Most often, this functionality is integrated with systems for drug identification, electronic dossier creation, research management, and pharmacovigilance.
5 Conclusions For pharmaceutical companies operating in the EAEU region, the issue of software readiness for working with the requirements of domestic regulators is of particular importance. Most foreign programs can be localized only after significant modifications. In this regard, the most technologically active companies in the Russian Federation prefer to develop their own software solutions, which are possibly inferior to Western ones in functionality but ensure full compliance with the EAEU requirements. Of course, the global software market for automating the formation of electronic dossiers and accompanying the market authorization of medicines is much wider than the presented list of programs, but the general trends can be identified. The vast majority of foreign-made software products do not provide a complete set of the required functionalities and are not localized for the EAEU market. Their application requires significant costs for localization, and the prospects for updating in connection with the periodic update of regulatory requirements are questionable. Domestic software solutions are just beginning to appear and in some cases are highly specialized. For example, programs for planning and meeting regulatory deadlines in the market are represented by single products. Based on the information received, it can be concluded that there is no universal solution suitable for all pharmaceutical companies. The feasibility of choosing digital solutions requires the use of specialized techniques that consider a set of factors, such as the region where the company is located, the number of projects, the number of employees, corporate culture, and willingness to make capital investments. The result of the company's work, the admission of the drug to medical use, depends on the correct choice of solution, and so, therefore, do the prospects for the company as a whole. In this regard, the heads of pharmaceutical enterprises should pay considerable attention to this issue.
References 1. FDA approves the first drug with an indication for treatment of smallpox. FDA [Electronic resource]. https://www.fda.gov/news-events/press-announcements/fda-approves-first-drug-indication-treatment-smallpox. Accessed 27 Dec 2019 2. Pravila registratsii i ekspertizy lekarstvennykh sredstv dlya meditsinskogo primeneniya [Rules for the registration and expert evaluation of medicinal products for medical use] [Electronic resource]. https://docs.eaeunion.org/ria/ru-ru/011597/ria_01072015_att.pdf 3. 5 Reasons to choose the EXTEDO suite for Regulatory Information Management [Electronic resource]. https://3.imimg.com/data3/LN/TC/MY-16226817/extedosuite-for-global-eregulatory-compliance-management.pdf 4. Fan, Y.: Opportunities, challenges and countermeasures for the implementation of eCTD. Chin. J. New Drugs 28(16), 1997–2003 (2019) 5. Xia, L., Zeng, S.: A review of the common technical document (CTD) regulatory dossier for generic drugs in China. Chin. Pharm. J. 51(4), 329–334 (2016) 6. Ayling, C.: The common European submission platform (CESP): a first-time perspective. Regul. Rapp. 10(12), 31–32 (2013) 7. Menges, K.: Elektronische Einreichung von Zulassungsdossiers bei den Behörden: Pro und Contra für die beiden Formate eCTD und non-eCTD e-submission (NeeS). Pharmazeutische Industrie 72(7), 1148–1158 (2010) 8. Nordfjeld, K., Strasberger, V.: Creating eCTD applications. J. Generic Med. 3(2), 140–146 (2006) 9. Kainz, A., Harmsen, S.: Einsatz eines Dossier-Management-Systems in der Arzneimittelindustrie: Auswahl, Einführung und praktische Erfahrungen. Pharmazeutische Industrie 65(5A), 511–519 (2003) 10. Franken, A.: Zukunft des regulatorischen E-Managements bei der Arzneimittelzulassung. Pharmazeutische Industrie 65(5A), 491–497 (2003) 11. Bowers, A., Shea, M.E.: Transatlantic planning: using eCTD format US INDs as a planning and preparation tool for EU MAAs. Regul. Rapp. 11(2), 28–31 (2014) 12. Casselberry, B.: Recent changes to dossier format requirements in some emerging markets. Regul. Rapp. 9(4), 18–19 (2012) 13. Koshechkin, K., Lebedev, G., Tikhonova, J.: Regulatory information management systems, as a means for ensuring the pharmaceutical data continuity and risk management. In: Smart Innovation, Systems and Technologies, vol. 142, pp. 265–274. Springer Science and Business Media Deutschland GmbH (2019)
Scientific Approaches to the Digitalization of Drugs Assortment Monitoring Using Artificial Neural Networks Tsyndymeyev Arsalan, Konstantin Koshechkin, and Georgy Lebedev
Abstract The periodic lack of essential medicines on the market is a global public health problem. The possibility of using machine learning technology and artificial neural networks to monitor drug shortages has not been previously investigated. The aim of this work is to assess the appropriateness of using digital systems to collect primary data and machine learning systems for monitoring drug deficiency. The analysis of the problem area of the shortage of drugs and the capabilities of machine learning technologies and neural systems was performed. As a result of the obtained information, a scientific synthesis of promising possibilities for combining these areas was carried out. A neural network model has been proposed. The deep integration of existing information databases using a normative reference system, the application of scientific approaches to the monitoring of drug supply, including drug shortages, using machine learning and neural networks will improve the quality and continuity of the drug supply process for the population and, accordingly, medical care in general. In the context of piloting models for implementing insurance principles for reimbursing the population for the purchase of medicines, these decisions are most relevant.
T. Arsalan Ministry of Health of the Russian Federation, Moscow, Russia K. Koshechkin (B) · G. Lebedev I.M. Sechenov First Moscow State Medical University, 2-4 Bolshaya Pirogovskaya street, 119991 Moscow, Russia e-mail: [email protected] K. Koshechkin Federal State Budgetary Institution Scientific Centre for Expert Evaluation of Medicinal Products of the Ministry of Health of the Russian Federation, Moscow, Russia G. Lebedev Federal Research Institute for Health Organization and Informatics, 11, Dobrolubova street, 127254 Moscow, Russia © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2020 I. Czarnowski et al. (eds.), Intelligent Decision Technologies, Smart Innovation, Systems and Technologies 193, https://doi.org/10.1007/978-981-15-5925-9_33
1 Introduction One of the ways for analysis and forecasting automation is the use of artificial intelligence. It works with big data as a more effective replacement for human analytics. Computing systems are now able to solve more and more problems for which people were previously responsible. Artificial intelligence solutions provide better results and, in many cases, are more cost-effective. In the development of artificial intelligence, there is a vast area—machine learning. This direction is about studying methods for constructing algorithms that can learn on their own. This is necessary if there is no clear solution to a problem. In this case, it is easier not to look for the right solution, but to create a mechanism that will come up with a method for finding it. A neural network simulates the work of the human nervous system, and one of its features is the ability to self-study, considering previous experience. Thus, each time the system makes fewer errors. Machine learning technology may be relevant to the pharmaceutical industry. It allows companies to reduce the risk when making decisions while developing new drugs. In its turn, it will allow avoiding failures in the late stages of development. This technology is a part of analytics and computer modeling-based decisions. Also, machine learning may be used to analyze big data for risk assessment of increased resistance of infectious agents on the effects of antibacterial drugs. In the pharmaceutical and medical fields, stakeholders began to create databases that can be useful for training neural networks. It is assumed that the accumulation of relevant information about the health and treatment of diseases will give impetus to the active use of artificial intelligence in pharmaceuticals. The periodic lack of essential medicines on the market is a global public health problem. Lack of drugs can adversely affect the quality and effectiveness of patient care, as well as increase the cost of treatment and the burden of medical workers. Scientific evidence on methods to monitor and prevent drug shortages are limited, but this issue has already been studied in several publications [1–4]. For example, since 2016, a task force set up by the European Medicines Agency (EMA) and the Heads of Medicines Agencies (HMA) looks at availability issues, including medicines that are authorized but not marketed and supply chain disruptions, to improve continuity of supply of human and veterinary medicines across Europe. This builds on the network’s efforts since 2012 to improve processes for handling shortages caused by good manufacturing practice (GMP) noncompliance [5]. According to the Food and Drug Administration (FDA), drug shortages can occur for many reasons, including manufacturing and quality problems, delays, and discontinuations. Manufacturers provide FDA drug shortage information, and the agency works closely with them to prevent or reduce the impact of shortages. In October 2019, FDA issued a report, “Drug Shortages: Root Causes and Potential Solutions” [6] that attempts to identify root causes and offer recommendations to help prevent and mitigate drug shortages. The recommendations were based on insights gleaned from public and private stakeholders at a November 2018 Drug Shortages Task Force Public Meeting, from
comments, and listening sessions with stakeholders, as well as FDA data analysis and published research [7].

One of the most advanced approaches to monitoring drug shortages is the introduction of digital technology for collecting primary data. This allows the further use of machine learning systems based on artificial neural networks for their analysis. The possibility of using machine learning technology and artificial neural networks to monitor drug shortages has not been previously investigated.

Machine learning is a technology that allows artificial intelligence to learn from experience automatically, without explicit programming [8, 9]. It is applied in almost all areas of human life. Its operation requires a large database organized into clusters that have causal relationships with each other. Using the experience and knowledge of the scientific community about the methods of interaction and the dependencies between these clusters brings the results of deep learning closer to being exact. Good results have been shown by artificial neural network models for the diagnosis of mental disorders [10], Parkinson's disease [11], and Huntington's disease [12]. Multilayer perceptron models are used to predict the risk of osteoporosis [13], and inference and generalized regression models are used to diagnose hepatitis B [14]. A neural network is a series of algorithms whose purpose is to recognize the main connections in a data set through a process that mimics the functioning of the brain.

The aim of this work is to assess the appropriateness of using digital systems for primary data collection and machine learning systems for monitoring drug shortages.
2 Methods

An analysis of the problem area of drug shortages and of the capabilities of machine learning technologies and neural systems was performed. On the basis of the information obtained, a scientific synthesis of promising possibilities for combining these areas was carried out, and a neural network model was proposed. For the design of the neural network, the Neural Network Toolbox package from MATLAB 8.6 (R2015b) was used. The package provides a set of functions and data structures describing activation functions, learning algorithms, the setting of synaptic weights, etc.

A literature review was conducted using the electronic article databases PubMed and Scopus. For this article, publications from the last 7 years (2012 to 2019) on the problem area of drug shortages and on the capabilities of machine learning technologies and neural systems were used. The writing of this article was motivated by the introduction of machine learning technology to the market and the need to automate the monitoring of drug shortages. We analyzed existing systems containing data on drugs. The following keywords were used to search for articles: "Deep Learning", "Forecast of Machine Learning", "Forecast of Deep Learning", "Shortage of Drugs", and "Monitoring of a Shortage of Drugs".
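The search itself was performed through the PubMed and Scopus web interfaces; no retrieval code is described in the paper. Purely as an illustration, the sketch below (Python, assumed tooling rather than anything used by the authors) shows how the same keyword queries and date range could be run against PubMed programmatically through the public NCBI E-utilities REST API; the result limit is an arbitrary choice.

```python
import requests

# Public NCBI E-utilities endpoint for PubMed searches.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Keywords listed in the Methods section.
KEYWORDS = [
    "Deep Learning",
    "Forecast of Machine Learning",
    "Forecast of Deep Learning",
    "Shortage of Drugs",
    "Monitoring of a Shortage of Drugs",
]

def search_pubmed(term, mindate="2012", maxdate="2019", retmax=100):
    """Return a list of PubMed IDs matching `term` within the given publication dates."""
    params = {
        "db": "pubmed",
        "term": term,
        "datetype": "pdat",   # filter by publication date
        "mindate": mindate,
        "maxdate": maxdate,
        "retmax": retmax,
        "retmode": "json",
    }
    response = requests.get(ESEARCH_URL, params=params, timeout=30)
    response.raise_for_status()
    return response.json()["esearchresult"]["idlist"]

if __name__ == "__main__":
    for keyword in KEYWORDS:
        ids = search_pubmed(keyword)
        print(f"{keyword!r}: {len(ids)} PubMed IDs retrieved")
```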
3 Results

Priority features for a drug shortage monitoring system based on machine learning technologies and neural systems were defined:

(1) Predictability. Scientifically based determination of the population's needs for the provision of Vital and Essential Drugs.
(2) Prevention. The earliest possible identification of an increased risk of the supply of medicines becoming insufficient relative to demand.
(3) Interoperability and versatility. Drug shortage monitoring should use the widest possible list of available information resources as input, unified rules for information exchange, and a unified system of classifiers, primarily for pharmaceuticals.
(4) Autonomy. The use of modern innovative digital technologies and mathematical methods, including those based on machine learning technologies and artificial intelligence, which will reduce the influence of the human factor on the efficiency of the system.
(5) Globalization. Combining the maximum number of information systems and other sources containing information that can be interpreted to assess the supply of and demand for drugs.

As part of drug shortage monitoring, a risk assessment should be carried out based on primary risk categories:

(1) physical (drug production has not been completed),
(2) logistic (procurement and delivery procedures have not been carried out),
(3) financial (no funding for the purchase),
(4) prognostic (production volume below the forecast of consumption),
(5) regulatory (GMP problems, no valid registration certificate, price regulation issues of the manufacturer, etc.),
(6) others.

For each source of risk, the emerging factors that influence it should be identified. Various information systems can serve as sources of data on drug shortages:

(1) The list of current and revoked permits for drug market circulation.
(2) The list of registered prices for medicines (if regulated).
(3) The list of series and the number of packages in a series, taking into account the dates of entry into civil circulation.
(4) Data on the retail distribution of medicines according to the cash registers of pharmacies.
(5) Data from the track and trace system on packages of medicines entering circulation and on drug usage by patients in hospitals.
(6) Data on signed and failed tenders, and terminated contracts, within the framework of public procurement.
(7) Data on electronic prescriptions of doctors.
(8) Electronic prescription (recipe) data.
(9) Data from the registers of pharmacies on preferential drug provision, containing information about recipients of drugs.
(10) Information from regional medical information systems about the actual consumption of drugs for the past period and the forecast of needs for the planning period.
(11) Information from national medical centers on the actual consumption of drugs for the past period and the forecast of needs for the planning period.
(12) Data from the system for collecting medical statistics on morbidity, the number of patients in the context of nosologies, and clinical and social groups.
(13) Data from healthcare standards and clinical guidelines.
Based on the listed priority areas, risk factors, and sources of information, a data matrix can be built. The data matrix information can be used to train an artificial neural network. Moreover, for each drug, periods of sufficient quantity on the market and periods of deficiency can be determined.

In this work, we used a multilayer perceptron model (a feed-forward neural network) trained with the backpropagation algorithm [15]. A logistic activation function was used: F(Y) = 1 / (1 + exp(−αY)), where α is the logistic slope parameter. The multilayer perceptron has a high degree of connectivity realized through synaptic connections; changing the level of network connectivity requires changing many synaptic connections or their weight coefficients. The combination of these properties, along with the ability to learn from experience, provides the processing power of the multilayer perceptron.

The artificial neural network contains an input layer, one hidden layer, and an output layer. The input layer of the neural network has 21 neurons (Table 1); the output layer has two neurons (whether a shortage of the drug is probable or not). Parameters are considered individually for each drug, and the data are studied over a certain period.

The backpropagation algorithm involves calculating the error of the output layer and of each neuron of the network being trained, and correcting the weights of the neurons in accordance with their current values. At the first step of the algorithm, the weights of all interneuron connections are initialized with small random values (from 0 to 1). After the weights are initialized, the following steps are performed during training of the neural network: forward signal propagation, calculation of the error of the neurons of the last layer, and backpropagation of the error. Forward signal propagation is carried out layer by layer, starting from the input layer: the sum of the input signals for each neuron is calculated and, using the activation function, the response of the neuron is generated, which propagates to the next layer taking into account the weight of the interneuron connection. As a result of this stage, we obtain a vector of output values of the neural network. The next stage of training is the calculation of the neural network error as the difference between the expected and actual output values. The obtained error values propagate from the last, output layer of the neural network to the first. In this case, the correction values of the neuron weights are calculated depending on the current value of the connection weight, the learning rate, and the error introduced by this neuron.
Table 1 Parameters of the input layer of the neural network

No. | Parameter | Data type, Unit
1 | There is a registration certificate in the country | Logical (yes/no)
2 | Price regulation in place | Logical (yes/no)
3 | Price | Number (currency)
4 | The number of packages introduced into civilian circulation | Number (packages)
5 | The number of packages removed from civilian circulation due to GMP issues | Number (packages)
6 | Number of packages sold at retail | Number (packages)
7 | The number of packages issued according to preferential recipes | Number (packages)
8 | The number of packages used in the hospital segment | Number (packages)
9 | The number of packages in stock | Number (packages)
10 | Number of packages in signed contracts | Number (packages)
11 | Number of packages in terminated contracts | Number (packages)
12 | Number of packages in failed contracts | Number (packages)
13 | The number of packages in doctor's prescriptions | Number (packages)
14 | The number of packages in electronic recipes | Number (packages)
15 | The number of patients who require a drug according to the patient registries | Number (patients)
15 | The number of patients who require a drug according to regional information systems | Number (patients)
16 | Consumption forecast according to regional information systems | Number (packages)
17 | The fact of consumption according to regional information systems | Number (packages)
18 | Number of patients requiring a drug according to national medical centers | Number (patients)
19 | Consumption forecast according to national medical centers | Number (packages)
20 | The fact of consumption according to national medical centers | Number (packages)
21 | Data collection system of medical statistics on morbidity, the number of patients by nosology, and taking into account treatment standards | Number (patients)
After completing this step, the steps of the described algorithm are repeated until the error of the output layer reaches the desired value.
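The network itself was built with the MATLAB Neural Network Toolbox, and no source code is given in the paper. The following NumPy sketch merely illustrates the scheme described above: 21 input neurons (the Table 1 parameters), one hidden layer, two output neurons, the logistic activation F(Y) = 1/(1 + exp(−αY)), weights initialized with small random values, and backpropagation of the output error. The hidden-layer size, learning rate, value of α, target encoding, and the synthetic training example are assumptions introduced only for this illustration.

```python
import numpy as np

ALPHA = 1.0          # slope of the logistic activation (the paper's alpha; value assumed)
LEARNING_RATE = 0.1  # assumed; not specified in the paper
N_INPUT, N_HIDDEN, N_OUTPUT = 21, 10, 2   # 21 inputs (Table 1); hidden size assumed

rng = np.random.default_rng(0)
# Weights initialized with small random values in (0, 1), as described above.
w_hidden = rng.random((N_INPUT, N_HIDDEN))
w_output = rng.random((N_HIDDEN, N_OUTPUT))

def logistic(y):
    """F(Y) = 1 / (1 + exp(-alpha * Y))."""
    return 1.0 / (1.0 + np.exp(-ALPHA * y))

def forward(x):
    """Forward signal propagation through the two weight layers."""
    hidden = logistic(x @ w_hidden)
    output = logistic(hidden @ w_output)
    return hidden, output

def train_step(x, target):
    """One backpropagation update for a single training example."""
    global w_hidden, w_output
    hidden, output = forward(x)

    # Error of the output layer (expected minus actual) and its local gradient.
    output_error = target - output
    output_delta = output_error * ALPHA * output * (1.0 - output)

    # Error propagated back to the hidden layer.
    hidden_error = output_delta @ w_output.T
    hidden_delta = hidden_error * ALPHA * hidden * (1.0 - hidden)

    # Weight corrections depend on the learning rate, activations, and deltas.
    w_output += LEARNING_RATE * np.outer(hidden, output_delta)
    w_hidden += LEARNING_RATE * np.outer(x, hidden_delta)
    return float(np.sum(output_error ** 2))

# Toy usage: one hypothetical feature vector built from the Table 1 parameters,
# with target [1, 0] meaning "shortage likely" and [0, 1] "no shortage".
x_example = rng.random(N_INPUT)
target_example = np.array([1.0, 0.0])
for epoch in range(1000):
    err = train_step(x_example, target_example)
print(f"final squared error: {err:.4f}")
```

In practice, the feature vectors would be rows of the data matrix assembled from the sources listed in Sect. 3, with the shortage/no-shortage label determined from the observed periods of deficiency for each drug.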
4 Discussion

Monitoring the lack of drugs is a complex integrated process that requires organizational and technical measures, primarily aimed at the early detection of an increased risk of insufficiency in the drug assortment, especially in the list of essential drugs. At the present stage of development of drug supply and informatization, its implementation requires the globalization of the digital space in the field of drug circulation.
To take preventive measures, it is advisable to use a modern predictive apparatus of multivariate mathematical analysis based on innovative digital technologies, including machine learning and artificial intelligence. Because the coverage of drug shortage monitoring needs to be expanded, a large number of currently available data sources containing pharmaceutical information must be integrated. Integration of information sources, application of risk assessments, and neural network technology will help to increase information support: comprehensive information will be available about the entire assortment of safe, effective, and high-quality medicines, especially those on the list of essential drugs, and about their accessibility for the population. Drug shortage monitoring will be focused on the priority healthcare needs for essential medicines for the prevention and treatment of diseases, including those prevailing in the structure of the region's morbidity.

The introduction of machine learning technology and artificial neural networks involves both organizational measures and the improvement of technical tools. Given the large number of information resources, a unified information space in the healthcare sector should be implemented. In systemic terms, monitoring the shortage of drugs based on machine learning technology and artificial neural networks will help to improve the quality of medical care; as a result, a reduction in morbidity and mortality rates will be achieved, first of all by increasing the availability of drugs for the population.

Monitoring the lack of drugs based on machine learning technology and artificial neural networks should be implemented as a set of organizational and technical measures aimed, first of all, at the early detection of an increased probability of a shortage of Vital and Essential Drugs. Also important are the timely identification of an existing shortage of drugs and other tasks that increase the information and analytical support of stakeholders. The technical implementation of drug shortage monitoring based on machine learning technology and artificial neural networks can take the form of a state information system, or of a commercial information system if access to the necessary data is provided. It should be noted that the necessary information is predominantly managed by government agencies and agents.

Drug shortage monitoring based on machine learning technology and artificial neural networks will require the creation of a centralized data processing algorithm. The basic data source is information about drug market authorization and price regulation. Data on entry into civil circulation should also be considered. The next level of data is information from track and trace systems about the movement of labeled drugs from the manufacturer to the final consumer. At the stage of drug provision, information about signed, terminated, and failed contracts for the purchase of medicines for the needs of the population should also be considered. After the drugs are delivered to distribution networks, the data on electronic prescriptions become important. Data on the delivery of drugs to consumers can be obtained from retail sales receipts through fiscal data operators.
Assessment of the future need for pharmaceutical products should be based on the federal registries of patients and the register of citizens entitled to receive medicines, specialized medical nutrition products, and medical products at the expense of budgetary appropriations of the federal and regional budgets. In parallel, similar information can be obtained at the regional level and used as a signal for a centralized information system. Regional medical information systems contain information on consumption and drug requirements; similarly, information from national medical centers and medical statistics collection systems can be used. In the target state, drug shortage monitoring based on machine learning technology and artificial neural networks expands the possibilities in the preferential segment of drug provision by identifying and predicting shortfalls in the hospital segment of drug provision and by taking into account retail sales of drugs.

The variety of sources of risk of drug assortment insufficiency requires a high degree of flexibility and adaptability. Risks can be conditionally divided into physical (drug production has not been carried out), logistic (procurement and delivery procedures have not been carried out), financial (no financing for the purchase), prognostic (production volume below the consumption forecast), regulatory (no valid registration certificate or registered price, or problems with GMP compliance), and others. Particular attention should be paid to drugs of the Vital and Essential Drug List, for which actions related to government guarantees are being carried out. Based on the data obtained, measures should be taken at the level of state or regional regulation of the circulation of drugs. The organization of drug provision should be accompanied by an appropriate analysis of the reasons for deregistration and of the risks of deficiency for the corresponding assortment position in the list of vital drugs. The national vaccination calendar for epidemiological indications and general information on the epidemiological situation in the region should also be taken into account. It is advisable to provide for the possibility of connecting international information resources, if available, containing information on the recall of drugs from the market and other factors that may become a source of drug shortage.

To form an evidence base for the need for drugs, it is necessary to develop uniform rules for the formation of digital data, including rules for the formation of state registers and regional registers of patients with socially significant diseases, and uniform requirements for the formation of a single register of electronic prescriptions. These factors should be evaluated periodically for each drug, together with an assessment of the probability of their occurrence and of the impact of the risk on assortment availability. Drug shortage monitoring based on machine learning technology and artificial neural networks should be able to take into account the measures taken to eliminate or reduce the likelihood of the onset and impact of risk, followed by a reassessment of risk to account for the effect of the measures taken.

In systemic terms, for the development of digitalization in the field of pharmaceutical supply, it is necessary to develop and implement a regulatory reference
system, primarily a multipurpose catalog of medicines containing primary information about medicines that are allowed for circulation (Master data). This resource should meet the basic needs of all participants in the circulation of medicines and be based on the standards of the ISO "Identification of medicines" group. Master data on drugs should form the basis of the information systems that serve as data sources for drug shortage monitoring. The use of unified Master data in all integrated data sources will make it possible to implement the principle of interoperability and to ensure the reuse of information resources.
5 Conclusions

An artificial neural network is a convenient tool for assessing and predicting a shortage of drugs. Artificial neural networks are powerful and, at the same time, flexible methods for simulating processes and phenomena. Neural network technologies are designed to solve poorly formalized tasks, to which, in particular, many problems of medicine are reduced. This is because the researcher is often provided with a large amount of heterogeneous factual material for which a mathematical model has not yet been created. Modern artificial neural networks are software solutions for creating specialized analytical models and allow a wide range of problems to be solved.

A distinctive feature of neural networks is their ability to learn from experimental data of the subject area. In medical applications, experimental data are presented as a set of initial values or parameters of the object and the result obtained on their basis. Training a neural network is an iterative process in which the neural network finds hidden nonlinear relationships between the initial parameters and the final result, as well as the optimal combination of the weights of the neurons connecting adjacent layers, at which the determination error tends to a minimum. The advantages of neural networks include their relative simplicity, nonlinearity, readiness to work with fuzzy information, tolerance to the quality of the source data, and the ability to learn from specific examples. In the learning process, a sequence of initial parameters is supplied to the input of the neural network along with the result data that characterize these parameters. To train a neural network, it is necessary to have a sufficient number of examples for tuning an adaptive system with a given degree of reliability. If the examples relate to different sources, then the artificial neural network trained in this way makes it possible to subsequently determine and differentiate any new case represented by a set of indicators similar to those on which the neural network was trained. The undoubted advantage of the neural model is that, when it is created, it is not necessary to represent the whole set of complex patterns describing the phenomenon under study.

However, several difficulties are associated with the use of neural networks in practical problems. One of the main problems with the use of neural network technologies is the initially unknown degree of complexity required for the designed neural network to produce a reliable forecast. This complexity may
turn out to be unacceptably high, which will require the elaboration of the network architecture. The simplest single-layer neural networks are capable of solving only linearly separable tasks; this limitation can be overcome with the use of multilayer neural networks. The use of a multilayer neural network for monitoring drug shortages will make it possible to transfer the prognostic apparatus from the level of empirical observations to a methodology of scientific forecasting based on modern digital technologies, and to enable analysis, forecasting, and modeling of the modern drug supply system in the target state of the healthcare system. To test this technology, the accumulation of primary data for analysis and observation of the drug market is required in order to create a training data set. The deep integration of existing information databases using a normative reference system and the application of scientific approaches to the monitoring of drug supply, including drug shortages, using machine learning and neural networks will improve the quality and continuity of the drug supply process for the population and, accordingly, of medical care in general. In the context of piloting models that implement insurance principles for reimbursing the population for the purchase of medicines, these solutions are especially relevant.
References

1. Bochenek, T., et al.: Systemic measures and legislative and organizational frameworks aimed at preventing or mitigating drug shortages in 28 European and Western Asian countries. Front. Pharmacol. 8 (2018)
2. Videau, M., Lebel, D., Bussières, J.F.: Drug shortages in Canada: data for 2016–2017 and perspectives on the problem. Ann. Pharm. Fr. 77(3), 205–211 (2019)
3. Medicines shortages [Electronic resource], WHO Drug Information, vol. 30, no. 2 (2016). https://www.who.int/medicines/publications/druginformation/WHO_DI_30-2_Medicines.pdf?ua=1. Accessed 09 Jan 2020
4. FDA: Report on Drug Shortages for Calendar Year 2016. Section 1002 of the Food and Drug Administration Safety and Innovation Act. Food and Drug Administration, Department of Health and Human Services (2016)
5. Availability of medicines. European Medicines Agency [Electronic resource]. https://www.ema.europa.eu/en/human-regulatory/post-authorisation/availability-medicines. Accessed 09 Jan 2020
6. Report. Drug Shortages: Root Causes and Potential Solutions. FDA [Electronic resource]. https://www.fda.gov/drugs/drug-shortages/report-drug-shortages-root-causes-and-potential-solutions. Accessed 09 Jan 2020
7. Drug Shortages. FDA [Electronic resource]. https://www.fda.gov/drugs/drug-safety-and-availability/drug-shortages. Accessed 09 Jan 2020
8. What is Machine Learning? A definition. Expert System [Electronic resource]. https://expertsystem.com/machine-learning-definition/. Accessed 09 Jan 2020
9. Obermeyer, Z., Emanuel, E.J.: Predicting the future - big data, machine learning, and clinical medicine. New Engl. J. Med. 375(13), 1216–1219 (2016)
10. Berebin, M.A.P.S.V.: The experience of using artificial neural networks for the differential diagnosis and prognosis of mental adaptation. Vestnik Yuzhno-Ural'skogo Gosudarstvennyy University, no. 14, pp. 41–45 (2006)
11. Gil, D.J.M.: Diagnosing Parkinson by using artificial neural networks and support vector machines. Glob. J. Comput. Sci. Technol. 4(9), 63–71 (2009)
12. Singh, M., Singh, M.S.P.: Artificial neural network based classification of neuro-degenerative diseases using gait features. Int. J. Inf. Technol. Knowl. Manag. 7(1), 27–30 (2013)
13. Basit, A., Sarim, M., Raffat, K., et al.: Artificial neural network: a tool for diagnosing osteoporosis. Res. J. Recent Sci. 3(2), 87–91 (2014)
14. Mahesh, C., Suresh, V.G.B.M.: Diagnosing hepatitis B using artificial neural network based expert system. Int. J. Eng. Innov. Technol. 3(6), 139–144 (2013)
15. Mustafaev, A.G.: The use of artificial neural networks for early diagnosis of diabetes mellitus. Cybern. Program. (2), 1–7 (2016)
The Geographic Information System of the Russian Ministry of Health Georgy Lebedev, Alexander Polikarpov, Nikita Golubev, Elena Tyurina, Alexsey Serikov, Dmitriy Selivanov, and Yuriy Orlov
Abstract We consider the aspects of the geographic information system application in the healthcare area. We describe the main functions and features of its usage at the medical organizations level, at the level of the executive authority of a constituent entity of the Russian Federation in the field of health care and at the state federal level. The breakthrough in medical-geographical mapping and modeling took place due to the development of modern information technologies. World science expands the field of application of geoinformation approaches in medicine and health care. The geographic information systems have been used in healthcare practice allowing visualization of geographical objects, processes and phenomena, as well as analysis, planning and modeling. A geographic information system is designed to consolidate and graphically display information on healthcare resources, including medical organizations and their structural units involved in the implementation of territorial state guarantee programs for free medical care to citizens. Access to the system is carried out using the Unified Identification and Authentication System. We present the options for working with a geographic information system and the application examples. Legal aspects of use and regulations are discussed, as well as healthcare application fields.
G. Lebedev · A. Polikarpov · N. Golubev · A. Serikov · D. Selivanov (B) · Y. Orlov I.M. Sechenov First Moscow State Medical University, 2-4 Bolshaya Pirogovskaya street, 119991 Moscow, Russia e-mail: [email protected] N. Golubev e-mail: [email protected] G. Lebedev · N. Golubev · E. Tyurina Federal Research Institute for Health Organization and Informatics, 11, Dobrolubova street, 127254 Moscow, Russia G. Lebedev V.I. Kulakov National Medical Research Center for Obstetrics, Gynecology and Perinatology, 4 Oparin street, 117997 Moscow, Russia © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2020 I. Czarnowski et al. (eds.), Intelligent Decision Technologies, Smart Innovation, Systems and Technologies 193, https://doi.org/10.1007/978-981-15-5925-9_34
1 Introduction

Geoinformation technologies became widespread in the second half of the last century. The mid-1990s marked the beginning of a new era of geoinformatics, the era of spatial data infrastructures (SDI) [1]. The main functions of geoportals are searching, visualizing, downloading and converting data, and calling other (remote) services [2]. Such technologies have been used most actively to reflect the demographic situation [3, 4]. Visualization of statistical data on population density is presented, for example, in the Regional Statistical Atlas of Germany, available on the website of the German Statistical Service (http://www.destatis.de/onlineatlas) [1]. Much attention is paid to the geographical aspects of medical care delivery [5] and patient transport [6], including for the elderly [7, 8].

The development of medical-geographical atlases in Russia has a strong academic background. The founder of this trend in the USSR was B. B. Prokhorov, who in the 1960s–1970s participated in the development of medical and geographical sections of complex regional atlases (Atlas of the Altai Territory, Atlas of Transbaikalia, Atlas of the Sakhalin Region) [9–11]. At the beginning of the XXI century, several regional medical-geographical and medical-demographic atlases were published [12–14]. Thanks to the advent of modern information technologies, there has been a breakthrough in medical-geographical mapping and modeling [15–18], both in technology and in application areas, ranging from models of diseases related to skin cancer [17] to optimizing drug delivery [18, 19]. Terms such as "drug desert" (pharmacy deserts) have appeared [19, 20]. Geoinformation systems have begun to be used in practice, making it possible to visualize geographical objects, processes and phenomena, as well as to perform analysis, planning and modeling [14]. Similar applications of geographic information systems operate around the world, both in large cities [21] and in rural areas [22], and in different countries such as the USA [23, 24], Brazil [25] and Australia [26].
2 Materials and Methods The paper describes the functioning of the Geographic Information System, which is a subsystem of a unified state information system in the field of health care in the Russian Federation. When performing the work, descriptive, analytical and geolocation methods were used.
3 Results

In 2015, the Ministry of Health of the Russian Federation developed the Geoinformation System (hereinafter referred to as the System), designed to consolidate and graphically display information on healthcare resources, including medical organizations and their structural units involved in the implementation of territorial programs of state guarantees of free medical care to citizens, and the settlements in whose territory they are located. The System continues to be used. It allows visualization of the data of reporting forms on the provision of settlements of the Russian Federation with primary and emergency medical care, as well as on the availability of medical care to the population.

Access to the System is carried out using the Unified Identification and Authentication System (ESIA), an information system in the Russian Federation that provides authorized access for participants in information interaction (applicants, citizens and officials of executive authorities) to the information contained in state information systems and other information systems. After authorization, access to the software functions depends on the user's role.

At the level of the medical organization, information is entered on the location and infrastructure of the institution and on the settlements located in the service area, indicating the population. In addition, the System contains data on medical care profiles, the bed capacity of medical organizations and key structural units. At the level of the executive authority of the constituent entity of the Russian Federation, the correctness of the entered information is controlled and the current situation is analyzed. One of the key tasks of the System is to reflect information about the real network of medical organizations. The executive health authority of the constituent entity of the Russian Federation has the most complete information about subordinate medical organizations. The System allows each region and the heads of regional health care to have an accurate picture of the medical organizations operating in their territories, down to paramedic-obstetric posts (FAPs) and outpatient clinics, regardless of the form of ownership and departmental subordination, and to make informed and justified management decisions. At the level of the Ministry of Health of Russia, the summary information presented in the System by the executive authorities of the constituent entities of the Russian Federation is analyzed.

The information presented in the System allows a route to a medical organization to be built based on the existing road coverage map. When the start of the route and the medical organization are indicated, the distance between them and the time required for arrival are estimated depending on the method of movement. Using the System, it is possible to assess the walking and transport accessibility of medical organizations by regions, districts and settlements, that is, the compliance of the medical infrastructure with the criteria for access to medical care.
Fig. 1 Settlements, medical organizations and their structural units on the map
In addition, it is possible to assess the availability of medical care in specialized departments and centers (primary vascular departments and regional vascular centers), as well as in the context of medical care profiles.

The basis of the System is a cartographic display of information. Information on settlements is submitted in accordance with the Register of administrative-territorial units of the constituent entities of the Russian Federation. All medical organizations, their branches, and structural and separate divisions whose data were entered into the System are visualized on the map (Fig. 1). Figure 1 shows the location of medical organizations on a map of the region. Information about medical organizations is displayed in the form of labels of various colors. An important option of the System is the ability to scale the map. The number displayed in dark gray inside a label indicates the number of medical organizations, their branches, and structural and separate units located at this place on the map. When zooming in, the number of labels increases, while the number of organizations, branches and divisions grouped in each label decreases.

In addition to viewing the location of medical organizations on a map, the System allows the user to estimate the distance from a settlement to the nearest medical organization providing a particular type of medical care, with a simultaneous assessment of transport and walking distance (on public roads). At the same time, in addition to geographical coordinates, the System visualizes the infrastructure of the medical organization, including such parameters as the presence of an attached population, planned capacity, bed capacity in the context of medical care profiles, and the presence in the structure of the institution of a primary vascular department, regional vascular center, trauma center, maternity hospital and clinical diagnostic laboratory. In addition, information is provided on the subordination of the medical organization.
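The System performs these estimates over the real road network, and its routing logic is not described here in implementable detail. As a simplified illustration only, the sketch below classifies a settlement's accessibility using the straight-line (great-circle) distance to the nearest facility; the distance thresholds, field names and coordinates are hypothetical placeholders rather than the System's actual criteria.

```python
from math import asin, cos, radians, sin, sqrt

# Placeholder accessibility thresholds in kilometres; the real System uses
# road-network routing and regulatory accessibility criteria, not a fixed radius.
WALKING_KM = 6.0
TRANSPORT_KM = 40.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

def nearest_facility(settlement, facilities):
    """Return the closest facility, its distance and an accessibility class."""
    best = min(
        facilities,
        key=lambda f: haversine_km(settlement["lat"], settlement["lon"], f["lat"], f["lon"]),
    )
    d = haversine_km(settlement["lat"], settlement["lon"], best["lat"], best["lon"])
    if d <= WALKING_KM:
        status = "walking accessibility"
    elif d <= TRANSPORT_KM:
        status = "transport accessibility"
    else:
        status = "outside accessibility zone"
    return best["name"], round(d, 1), status

# Hypothetical data: one settlement and two facilities with WGS84 coordinates.
settlement = {"name": "Settlement A", "lat": 55.75, "lon": 37.62}
facilities = [
    {"name": "District hospital", "lat": 55.80, "lon": 37.70},
    {"name": "Rural paramedic post", "lat": 55.76, "lon": 37.60},
]
print(nearest_facility(settlement, facilities))
```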
Fig. 2 Settlements, medical organizations and their structural units on the map
When one of the medical organizations is selected in the right part of the window, a route to it is plotted (Fig. 2). The labels of organizations located within walking distance are colored green, and those within transport accessibility are colored yellow. Figure 2 shows an example of building a route from a given point to a medical organization on an interactive map.

For the convenience of users, a number of reporting forms can be downloaded from the System, allowing summary information on a constituent entity of the Russian Federation to be viewed; at the level of the Ministry of Health of the Russian Federation, downloads are available for all constituent entities. It is possible to export reporting tables to Microsoft Excel format, which allows the information presented in the System to be analyzed in various sections. In accordance with the recommendations of the Ministry of Health of the Russian Federation, the information in the System is updated quarterly. In order to increase the completeness and reliability of the data in the System, recommendations were sent to the Governors and heads of the executive health authorities of the constituent entities of the Russian Federation to take personal control of the implementation of measures to form a network of medical organizations providing primary health care and to update the information presented in the System.

To verify the information provided in the System, as part of the collection and processing of federal and industry-specific statistical surveillance forms for 2017, we reconciled the data of federal statistical surveillance forms No. 30 "Information
about the medical organization” and No. 47 “Information about the network and activities of medical organizations” and information provided in the System. The identified inaccuracies were brought into compliance, and the results of the verification were provided to the heads of the statistics service and representatives of the constituent entities of the Russian Federation. A thorough verification of the information was carried out, which made it possible to increase the reliability of the presented statistical data. To increase the objectivity of the information presented in the System, the data are reconciled in real time, inaccuracies are corrected when filling in the fields and the geographical coordinates of settlements and medical organizations are validated. Constant methodological support is provided for specialists working with the System.
4 Discussion

We note the capabilities of geographic information systems both for organizing medical care and for studying natural focal diseases [27, 28], as well as for analyzing environmental factors and public health in newly developed territories and in the Arctic [29].

The legal status of the Geographic Information System as a subsystem of the Unified System, as well as its functions, principles of data generation and issues of interaction with other subsystems of the Unified System, are enshrined in the Decree of the Government of the Russian Federation dated May 5, 2018, No. 555 "On the Unified State Information System in the Field of Health" (hereinafter referred to as the Regulation on the Unified System). According to the Regulation on the Unified System, the Geographic Information System provides automatic collection of information from the Federal Register of Medical Organizations and the Federal Register of Medical Workers (clause 26 of the Regulation on the Unified System). At the same time, in order to ensure the correct and timely entry of information, the Regulation on the Unified System establishes the obligation of medical organizations of the state, municipal and private health systems and of the authorized state authorities of the constituent entities of the Russian Federation to post information in the Federal Register of Medical Organizations and the Federal Register of Medical Workers on time. According to paragraphs 1–16 of Appendix No. 1 to the Regulation on the Unified System, information is submitted to the Federal Register of Medical Workers within 3 business days from the date of receipt of the updated data, and to the Federal Register of Medical Organizations within 5 business days from the date of receipt of the updated data from medical organizations.

Prior to the entry into force of the Regulation on the Unified System, the Geoinformation System was filled out on the basis of methodological recommendations for maintaining the Geoinformation System information resource and was updated
at the request of the Russian Ministry of Health to the state authorities of the constituent entities of the Russian Federation. The adoption of the above regulatory legal acts made it possible to ensure the timely entry of information and to increase the relevance of the data contained in the Geographic Information System.

One of the important functional features of the System is the possibility of cartographic visualization not only of the location of the healthcare facility as a whole, but also of detailed data on medical care profiles and key departments. This allows the availability of medical care to be analyzed, taking into account the territorial distribution of the units of medical organizations and the types and profiles of the medical care they provide. According to paragraph 27 of Appendix No. 1 to the Regulation on the Unified System, the Geoinformation System provides automatic data collection from various subsystems of the Unified System in order to display information on health resources on the geoinformation map, as well as the dynamics of commissioning of stationary healthcare facilities. Integration of the System with other information systems, including the Federal Register of Medical Organizations and the Federal Register of Medical Workers to visualize the information contained in them, will make it possible to analyze the equipment of medical organizations and to optimize the use of available resources.

In December 2018, the presidium of the Presidential Council for Strategic Development and National Projects approved the passports of national projects, including the passport of the National Healthcare project. One of the most significant planned results of the implementation of the National Healthcare project is the construction of new paramedic and paramedic-obstetric centers and medical outpatient clinics. The System was used in the development of measures of the National Healthcare project in terms of ensuring optimal accessibility of medical organizations providing primary health care for the population, including residents of settlements located in remote areas. Using the System, the number of settlements located outside the accessibility zone of a medical organization or its structural unit providing primary health care is estimated. Based on the results of the accessibility analysis, measures were developed for the Federal project "Development of the primary healthcare system" of the National Healthcare project, and the progress of its implementation is monitored.
5 Conclusion

We may recap the following:

– The geographic information system allows you to visualize information on available healthcare resources, both at the regional and federal levels;
– For the implementation of territorial planning and the adjustment of the principles of providing medical care for various profiles, the executive health authorities of the constituent entities can use the visualization tools;
– The mapping of health facilities by individual profiles of medical care allows the chief non-staff (freelance) specialists of the constituent entities of the Russian Federation and of the Russian Ministry of Health to analyze routing when providing medical care for the relevant services;
– The Ministry of Health of the Russian Federation has a tool for monitoring the availability of various types and profiles of medical care in real time;
– The implementation of spatiotemporal analysis for the planning, forecasting and modeling of the healthcare system by profiles and types of medical care is impossible without analysis of the information presented in the Geographic Information System.

In conclusion, we note the continuity of medical-geographical mapping technologies [30] and the potential for the use of such systems in the Arctic [29], in the face of climate change and anthropogenic environmental impacts.
References 1. Koshkarev, A.V.: Geo-portals and maps in the SDI’s ERA. In: Proceedings of the International Conference Intercarto/InterGIS 15 Sustainable Development of Territories, Perm (Russia)Ghent (Belgium), pp. 242–246 (2009). (In Russian) 2. Koshkarev, A.V., Tikunov, V.S., Timonin, S.A.: Cartographic Web services of geoportals: technological solutions and implementation experience. Spat. Data 3, 6–12 (2009). (In Russian) 3. Mullner, R.M., Chung, K., Croke, K.G., Mensah, E.K.: Geographic information systems in public health and medicine. J. Med. Syst. 28(3), 215–21 (2004) 4. Jamedinova, U.S., Shaltynov, A.T., Konabekov, B.E., Abiltayev, A.M., Mysae, V.A.O.: The use of geographic information systems in health: a literature review. Sci. Health 6, 39–47 (2018) 5. Hardt, N.S., Muhamed, S., Das, R., Estrella, R., Roth, J.: Neighborhood-level hot spot maps to inform delivery of primary care and allocation of social resources. Perm. J. 17(1), 4–96 (2013). https://doi.org/10.7812/tpp/12-090 6. Shaw, J.J., Psoinos, C.M., Santry, H.P.: It’s all about location, location, location: a new perspective on trauma transport. Ann. Surg. 263(2), 413–8 (2016) 7. O’Mahony, E., Ni Shi, E., Bailey, J., Mannan, H., McAuliffe, E., Ryan, J., Cronin, J., Cooney, M.T.: Using geographic information systems to map older people’s emergency department attendance for future health planning. Emerg. Med. J. 36(12), 748–753 (2019) 8. Franchi, C., Cartabia, M., Santalucia, P., Baviera, M., Mannucci, P.M., Fortino, I., Bortolotti, A., Merlino, L., Monzani, V., Clavenna, A., Roncaglioni, M.C., Nobili, A.: Emergency department visits in older people: pattern of use, contributing factors, geographical differences and outcomes. Aging Clin. Exp. Res. 29(2), 319–326 (2017) 9. Protsyuk, I.S., et al. (eds.): Atlas of the Altai Territory, vol. I, 222 p., GUGK, Barnaul (1978). (In Russian) 10. Sochava, V.B. (ed.): Atlas of Transbaikalia (Buryat Autonomous Soviet Socialist Republic and the Chita Region), 176 p. GUGK, Irkutsk (1967). (In Russian) 11. Komsomolsky, G.V., Sirak, I.M. (eds.): Atlas of the Sakhalin Region, 135 p. GUGK, Moscow (1967). (In Russian) 12. Malkhazova, C.M. (ed.): Medical and demographic atlas of the Kaliningrad region, 85 p. LUKOIL-Kaliningradmorneft, Kaliningrad (2007). (In Russian)
13. Malkhazova, C.M., Gurov, A.N. (eds.): Medical and demographic atlas of the Moscow region, 110 p. Geography Department of Moscow State University, Moscow (2007). (In Russian) 14. Somov, E.V., Timonin, S.A.: The use of geoinformation methods in solving the problems of optimizing medical services for the population of Moscow. Doctor Inform. Technol. 2, 30–41 (2012). (In Russian) 15. Barinova, G.M.: A New Word In Atlas Mapping S. Malkhazova, A. Prasolova I. Medicalgeographical atlas of the Kaliningrad region. M. Kaliningrad, 2007. 85 pp. Bulletin of the Baltic Federal University named after I. Kant. Series: Natural and Medical Sciences, vol. 1, pp. 129–130 (2009) 16. Varghese, J., Fujarski, M., Dugas, M.: StudyPortal-geovisualization of study research networks. J. Med. Syst. 44(1), 22 (2019) 17. Marocho, A.Y., Vavrinchuk, A.S., Kosyh, N.E., Pryanishnikov, E.V.: Climate and malign skin tumors (research with geographic information system in Khabarovsk Krai). Russ. Open Med. J. 3, 0106 (2014) 18. Taraskina, A.S., Kulikov, A.S., Soloninina, A.V., Faizrakhmanov, R.A.: Geographic information systems as a tool of optimization of medicinal maintenance of the population and medical institutions analgesic drugs. Eur. J. Nat. History 6, 66–68 (2016) 19. Pednekar, P., Peterson, A.: Mapping pharmacy deserts and determining accessibility to community pharmacy services for elderly enrolled in a State Pharmaceutical Assistance Program. PLoS One 13(6), e0198173 (2018) 20. Amstislavski, P., Matthews, A., Sheffield, S., Maroko, A.R., Weedon, J.: Medication deserts: survey of neighborhood disparities in availability of prescription medications. Int J Health Geogr. 11, 48 (2012) 21. Iroh Tam, P.Y., Krzyzanowski, B., Oakes, J.M., Kne, L., Manson, S.: Spatial variation of pneumonia hospitalization risk in Twin Cities metro area. Minnesota. Epidemiol Infect. 145(15), 3274–3283 (2017) 22. Chisholm-Burns, M.A., Spivey, C.A., Gatwood, J., Wiss, A., Hohmeier, K., Erickson, S.R.: Evaluation of racial and socioeconomic disparities in medication pricing and pharmacy access and services. Am. J. Health Syst. Pharm. 74(10), 653–668 (2017) 23. Min, E., Gruen, D., Banerjee, D., Echeverria, T., Freelander, L., Schmeltz, M., Sagani, E., Piazza, M., Galaviz, V.E., Yost, M., Seto, E.Y.W.: The Washington state environmental health disparities map: development of a community-responsive cumulative impacts assessment tool. Int. J. Environ. Res. Public Health 16(22), pii: E4470 (2019) 24. Bazemore, A., Phillips, R.L., Miyoshi, T.: Harnessing geographic information systems (GIS) to enable community-oriented primary care. J. Am. Board Fam. Med. 23(1), 22–31 (2010) 25. de Moura, E.N., Procopiuck, M.: GIS-based spatial analysis: basic sanitation services in Parana State, Southern Brazil. Environ. Monit. Assess. 192(2), 96 (2020) 26. Burkett, E., Martin-Khan, M.G., Scott, J., Samanta, M., Gray, L.C.: Trends and predicted trends in presentations of older people to Australian emergency departments: effects of demand growth, population aging and climate change. Aust. Health Rev. 41(3), 246–253 (2017) 27. Malkhazova, S.M., Kotova, T.V., Mironova, V.A., Shartova, N.V., Ryabova N.V.: Medicalgeographical atlas of Russia “Natural focal diseases: concept and first results”. Series 5. Geography 4, 16–23 (2011). (In Russian) 28. Kurepina, N.Y., Vinokurov, Y.I., Obert, A S., Rybkina, I.D., Tsilikina, S.V., Cherkashina E.N.: A comprehensive cartographic analysis of tick-borne zoonoses in the medical and geographical Atlas of Altai Territory. 
In: Proceedings of the Altai Branch of the Russian Geographical Society, no. 2 (53), pp. 14–26 (2019) 29. Gorbanev, S.A., Fridman, K.B., Fedorov, V.N.: Geoinformation portal “Sanitary and epidemiological welfare of the population in the Arctic zone of the Russian Federation” as a promising tool for a comprehensive assessment of the state of the environment and health factors of the population of the Russian Arctic. Russian Arctic 6, pp. 8–13 (2019). (In Russian) 30. Chistobaev, A.I., Semenova, Z.A.: Medical-geographical mapping in the former USSR and modern Russia. Earth Sci. 4, 109–118 (2013). (In Russian)
Creation of a Medical Decision Support System Using Evidence-Based Medicine Georgy Lebedev, Eduard Fartushniy, Igor Shaderkin, Herman Klimenko, Pavel Kozhin, Konstantin Koshechkin, Ilya Ryabkov, Vadim Tarasov, Evgeniy Morozov, Irina Fomina, and Gennadiy Sukhikh
Abstract This article presents a new study related to the creation of a medical decision support system with an intellectual analysis of scientific data (texts of medical care standards, clinical guidelines, instructions for the use of medicines, scientific publications of evidence-based medicine). Such a system is designed to provide the possibility of making medical decisions in pharmacotherapy, taking into account personalized medical data due to the optimal prescription of medicines and the use of medical technologies, reducing the frequency of undesirable reactions while using two or more drugs for different indications. The technical goal of the study is to create an intelligent automated information system to support the adoption of medical decisions and its implementation in clinical practice. This work was supported by a grant from the Ministry of Education and Science of the Russian Federation, a unique project identifier RFMEFI60819X0278.
G. Lebedev (B) · E. Fartushniy · I. Shaderkin · H. Klimenko · P. Kozhin · K. Koshechkin · I. Ryabkov · V. Tarasov · E. Morozov I.M. Sechenov First Moscow State Medical University, 2-4 Bolshaya Pirogovskaya St, 119991 Moscow, Russia e-mail: [email protected] G. Lebedev · I. Shaderkin · I. Fomina Federal Research Institute for Health Organization and Informatics, 11, Dobrolubova Street, 127254 Moscow, Russia e-mail: [email protected] G. Lebedev · G. Sukhikh V.I.Kulakov National Medical Research, Center for Obstetrics, Gynecology and Perinatology, 4 Oparin St., 117997 Moscow, Russia e-mail: [email protected] © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2020 I. Czarnowski et al. (eds.), Intelligent Decision Technologies, Smart Innovation, Systems and Technologies 193, https://doi.org/10.1007/978-981-15-5925-9_35
1 Introduction

The most important problem of the health system within the framework of the implementation of the health development strategy until 2025 is to ensure the availability, effectiveness and safety of the medical care provided to the population of the Russian Federation. A further problem is that organizations and institutions involved in the circulation of medicines and the use of medical technologies often do not use a methodology for evaluating the effectiveness of their use. In the strategy of scientific and technological development of Russia until 2035, approved in December 2016, the President of the Russian Federation V. V. Putin stated: "in the next 10–15 years, the priorities of scientific and technological development of the Russian Federation should be considered those areas that will allow to obtain scientific and scientific-technical results and create technologies that are the basis of innovative development of the domestic market of products and services, Russia's stable position in the foreign market, and will provide: (a) the transition to advanced digital, intelligent production technologies, robotic systems, new materials and methods of design, the creation of systems for processing large amounts of data, machine learning and artificial intelligence; (b) the transition to personalized medicine, high-tech health care and health-saving technologies, including through the rational use of medicines."

According to the State program of the Russian Federation "Development of Health Care", priority projects include the development and implementation of innovative methods of diagnosis, prevention and treatment, the basics of personalized medicine, and the improvement of the organization of medical care based on the introduction of information technologies. As part of this program, the Federal project "Creating a digital health circuit based on the state information system in the health sector of the Russian Federation" has been launched. Creating a digital circuit involves, among other things, the introduction of intelligent systems to support medical decision-making.

The rapid development of information technologies and the digitalization of large volumes of databases require fundamentally new approaches and continuous improvement of the methods for their analysis. Constantly updated volumes of biomedical information and the exponentially growing number of new publications require the development of effective and high-quality methods for the thematic categorization of documents and for the extraction of facts and knowledge. The priority fundamental direction of the methods of intellectual analysis of scientific publications is their use in systems of informed medical decisions, including preventive and personalized medicine. Knowledge of the results of the latest research on the clinical use of drugs and evaluation of the diagnostic accuracy of high-tech medical methods can significantly improve the efficiency of medical care and save significant material resources. Intelligent methods of searching and extracting knowledge from large databases will allow medical specialists to conduct highly effective treatment and monitor its results from the perspective of personalized medicine.

As of 2019, there are a large number of medical databases developed on the principles of evidence-based medicine: more than 30 million links are indexed by
PubMed, the largest database of biomedical literature, developed and maintained by the National Center for Biotechnology Information (NCBI). The NCBI Entrez search engine, integrated with PubMed, provides access to a diverse set of 38 databases [1]. PubMed currently indexes publications from 5,254 journals in biology and medicine, dating back to 1948. At the present stage, PubMed serves as the main tool for searching the biomedical literature. Every day, the system processes several million queries generated by users who want to keep abreast of the latest developments and to prioritize research in their fields. Although PubMed provides effective search interfaces, it is becoming increasingly difficult for its users to find information that meets their individual needs: detailed user queries generate search results containing thousands of relevant documents.

A popular database of reliable medical information is the Cochrane Library, a knowledge base formed by an independent network of researchers consisting of more than 37 thousand scientists in 130 countries. Research results in the form of systematic reviews and meta-analyses of randomized trials are published in the Cochrane Library database [2]. A large number of Russian-language medical articles and clinical research results are held in the Central Scientific Medical Library (CSML) of the I.M. Sechenov First Moscow State Medical University (Sechenov University), which is the Russian-language equivalent of PubMed. The CSML is the main medical library of Russia.

According to Clarivate Analytics, another citation database of scientific publications, the Web of Science Core Collection, contains more than 1.4 billion links from more than 20 thousand publication sources (https://clarivate.com/products/web-of-science/web-science-form/web-science-core-collection/; February 2019). The Microsoft Academic Search database (https://academic.microsoft.com; February 2019) has more than 26 million publications under the section "Medicine" and more than 17 million under the section "Biology". At the same time, the most rapid growth in volume is observed in the section "Biochemistry", with more than 5 million publications, of which more than a million belong to the category "Genetics". Google Scholar does not report the number of links that can be identified using its search engine; however, for the query "personalized medicine" (https://scholar.google.ru/scholar?hl=ru&assdt=0%2C5&q=personalized+medicine&btnG; February 2019), the search returned 1 million 210 thousand documents. The approximate number of new publications on medical topics in these indexed databases alone already exceeds 15 thousand per day.

These resources serve as an information base for creating modern systems for supporting clinical decisions based on evidence-based summaries of information, such as DynaMed (USA), UpToDate (USA) and EBMG, Evidence-Based Medicine Guidelines (Finland). It is obvious that processing such large amounts of information without special methods for the analysis of biomedical content is simply impossible. In the last decade, text mining methods (in English-language transcription referred to as
have been widely used to solve problems of extracting information from arrays and collections of text documents [4, 5]. Currently, there is no basic collection of descriptions of the principles of information analysis that could be considered a methodological guide to the intelligent analysis of biomedical text. A number of works describe fundamental methods of natural language processing [1, 6].

The Roadmap of the National Technical Initiative "Health Net" indicates the need to create decision support systems (DSS) in the field of preventive medicine using technologies of evolutionary modeling, a digital model of knowledge about human health and the properties of correction tools, the processing of large amounts of data, individual monitoring of functional status, and telemedicine consultations for the population. Decision support systems are a class of specialized medical information systems that are registered in accordance with the established procedure for medical use and form part of medical information systems for local use in medical organizations.

Modern methods of intelligent analysis of biomedical texts are aimed at solving individual problems from the list below:

• Information retrieval (a minimal PubMed retrieval sketch follows this list);
• Named entity recognition and named entity identification;
• Association extraction [7];
• Event extraction [8];
• Pathway extraction.
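As an illustration of the information retrieval task against PubMed, which the introduction names as the primary source of biomedical literature, the following sketch queries the NCBI E-utilities service for article identifiers matching a topic. It is only an illustrative example and not part of the system described in this paper; the query term and result limit are arbitrary choices.

```python
# Minimal sketch: retrieving PubMed identifiers (PMIDs) for a topic via the
# public NCBI E-utilities esearch endpoint. Query term and retmax are
# illustrative, not parameters of the proposed system.
import json
import urllib.parse
import urllib.request

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def search_pubmed(term, retmax=20):
    """Return a list of PubMed IDs matching the query term."""
    params = urllib.parse.urlencode({
        "db": "pubmed",
        "term": term,
        "retmax": retmax,
        "retmode": "json",
    })
    with urllib.request.urlopen(f"{EUTILS}?{params}") as response:
        payload = json.load(response)
    return payload["esearchresult"]["idlist"]

if __name__ == "__main__":
    print(search_pubmed("personalized medicine", retmax=10))
```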
Currently, information search is limited to presenting collections of documents from unstructured data sets that match an information request. It does not solve the problem of analyzing the information and identifying hidden semantic patterns, which is the main purpose of text mining. Information search addresses the well-known problem of finding relevant documents in response to a specific information need. The traditional approach of contextual search over pre-indexed documents does not provide search results of the desired quality and requires the user to spend additional time studying the returned documents.

The study of approaches to dynamic topic modeling is an active area of research [9–14]. However, we are not aware of practical applications of the method to identifying the subject matter of medical documents. Apparently, this is due to the specifics of the linguistic support needed for the analysis of biomedical content. Despite the existence of supranational thesauri and ontologies (ICD-10, MeSH, SNOMED CT, LOINC) that support the identification of named entities (diseases, genes, proteins, drugs, clinical analyses), tokenization of the biomedical literature remains a serious problem, owing to inconsistent naming of known entities such as symptoms and drugs, non-standard abbreviations, and the specifics of different languages. Identifying named entities makes it possible to associate objects of interest with information that is not detailed in the publication. Normalization of named entities across the various languages of information representation is the most important condition for building effective predictive topic models for personalized medicine.
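Because normalization of named entities is treated here as a precondition for multilingual topic modeling, a minimal dictionary-based normalization sketch is shown below. The synonym table is a toy example constructed for illustration only; a real implementation would load term variants and identifiers from the MeSH thesaurus for each language.

```python
# Toy sketch of dictionary-based entity normalization: surface forms in several
# languages are mapped to one canonical identifier. The identifiers and synonyms
# below are illustrative placeholders, not real thesaurus records.
SYNONYMS = {
    "myocardial infarction": "D009203",
    "heart attack": "D009203",
    "инфаркт миокарда": "D009203",   # Russian surface form
    "aspirin": "D001241",
    "acetylsalicylic acid": "D001241",
}

def normalize(term):
    """Return the canonical concept identifier for a surface form, if known."""
    return SYNONYMS.get(term.strip().lower())

assert normalize("Heart attack") == normalize("myocardial infarction")
```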
In addition, we believe that, in order to identify priority areas for the development of preventive and personalized medicine, the repository of processed documents should not be limited to scientific publications alone. In practical bibliometrics, the term "gray" literature describes documents not issued by scientific publishers; such documents can form a vital component of evidence reviews, such as systematic reviews and systematic maps, rapid evidence assessments [15] and brief reviews [16]. "Gray" literature in the most general case includes:

• reports on research and development activities;
• PhD and doctoral dissertations;
• conference proceedings;
• patents;
• final qualifying works of students of prestigious foreign educational institutions specializing in the subject;
• reviews, critiques and expert examination materials;
• documents, materials and publications of specialized websites;
• reports and presentations, working materials of seminars and round tables, preprints;
• tender and procurement documents, contracts and agreements;
• analytical forecasts, educational materials, posters, slides, illustrations, overviews and other materials.
In our opinion, "gray" literature can be very useful in the search for, and synthesis of, new directions and topics of research, despite the fact that it is not published officially in the same way as traditional academic literature. In general, considerable effort is required to search for "gray" literature in an attempt to include data held by practitioners, as well as to account for possible publication bias. Publication bias is the tendency for significant positive studies to be more likely to be published than non-significant or negative studies, which increases the likelihood of overstating the effect in meta-analyses and other syntheses [17]. The inclusion of "gray" literature in the collection of analyzed documents is an additional condition for studying and identifying trends in the development of personalized medicine. By including all available documented data in the study sample, more accurate forecasting is achieved and the risk of bias is reduced.

In the study we have initiated, we propose to develop a method for identifying promising areas of development of personalized medicine by building a multilingual dynamic probabilistic topic model over the collection of documents under study. The model will be multilingual in nature, will take into account an n-language dictionary (where n is the number of languages represented in the selected collection of documents) and will use ontological relationships between documents of the comparable collections.
2 Materials and Methods

At the moment, there is no effective information technology for the analysis of large amounts of data in the tasks of finding evidence in preventive and personalized medicine. The development of intelligent methods for searching and extracting knowledge from unstructured text repositories will allow their use in informed medical decision-making systems, in choosing highly effective treatment and in monitoring its results from the perspective of evidence-based and personalized medicine.

The search, retrieval and systematization of facts and knowledge of evidence-based medicine are to be carried out by constructing a multilingual dynamic probabilistic topic model on the collection of studied documents. The model will have a multilingual character, take into account an n-language dictionary (where n is the number of languages represented in the selected collection of documents) and use ontological relationships between documents of the comparable collection. The normalization of terms in the collection of multilingual documents is to be carried out using the MeSH thesauri of medical terms for each language in which documents are presented. Since the current version of MeSH synchronizes the terms used across more than 48 languages, this choice can be considered well founded for covering the entire multilingual collection of documents. Additional classification features for the thematic rubrication of evidence-based medicine documents are to be obtained from open sources of biomedical ontologies: BioPortal (http://bioportal.bioontology.org/), genomic databases linking genes with diseases (http://www.disgenet.org) and databases containing protein information (www.pdbe.org).

To combine the different types of classification features, additive regularization of the created topic model is to be used. The mathematical apparatus of additive regularization of topic models of text document collections was first proposed in [18]. Its effectiveness has been proven in processing collections of lexically traditional documents, but it has not yet been applied to the processing of biomedical texts with regularization by biomedical ontologies and thesauri; in addition, in our case, model development will be carried out on a multilingual collection of medical texts. We intend to study the influence of various regularization features (author, place of study, company, information source, type of study and quartile of the publisher) on the quality of the model.

Preprocessing of the documents in the collections is one of the key components of the developed method and involves the consecutive solution of the tasks of content tokenization, filtering and removal of stop words, lemmatization of the cleaned samples and stemming. For effective text mining, we propose to include the many knowledge resources available to us in the tokenization pipeline. In the field of evidence-based and personalized medicine, unlike the general field of text mining, well-tuned ontologies and knowledge bases exist. Biomedical ontologies provide an unambiguous characterization of this field of knowledge. The quality of the models is likely to increase if existing
ontologies (UMLS, BioPortal, DrugBase) are used as sources of terms for tokenization, to clarify how a concept relates to a named entity from the reference directory, and as a way to normalize alternative names and map them to a single identifier.

Filtering is to be used to remove words that carry no semantic content from the documents. In addition to the traditional removal of stop words (prepositions, conjunctions, etc.), documents are to be cleared of terms unrelated to evidence-based and personalized medicine. For this purpose, a frequency-statistical analysis of word usage will be conducted on the corpus of documents available at the Sechenov University Institute of Personalized Medicine and used for subsequent filtering. Words that occur very frequently in the texts carry little information for separating documents, while very rare words contribute little meaning; both are removed from the documents.
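The preprocessing and modeling pipeline described above can be sketched as follows. The example uses the gensim library and plain LDA as a stand-in for the additively regularized topic model proposed in this paper; the stop-word list, filtering thresholds and topic count are illustrative assumptions rather than values fixed by the study.

```python
# Sketch of the preprocessing pipeline (tokenization, stop-word and frequency
# filtering) followed by a topic model. Plain LDA stands in for the additively
# regularized model described in the paper; thresholds are illustrative.
import re
from gensim import corpora, models

STOP_WORDS = {"the", "of", "and", "in", "to", "a", "is", "for", "with"}  # toy list

def tokenize(text):
    """Lowercase, keep Latin/Cyrillic word tokens, drop stop words and short tokens."""
    tokens = re.findall(r"[a-zа-яё]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS and len(t) > 2]

def build_topic_model(raw_documents, num_topics=20):
    texts = [tokenize(doc) for doc in raw_documents]
    dictionary = corpora.Dictionary(texts)
    # Drop terms that occur in fewer than 5 documents or in more than half of them.
    dictionary.filter_extremes(no_below=5, no_above=0.5)
    corpus = [dictionary.doc2bow(text) for text in texts]
    lda = models.LdaModel(corpus, id2word=dictionary, num_topics=num_topics)
    return lda, dictionary
```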
3 Relevance

A systematic analysis of domestic and foreign practice in diagnostics and pharmacotherapy allowed us to identify the following main factors that determine the effectiveness and safety of medical treatment.

The first factor is the practice of prescribing drugs with a low level of evidence of clinical effectiveness. According to expert estimates, the effectiveness of pharmacotherapy in Russia is no more than 60%. According to the health development strategy until 2025, the goal is to reduce the mortality rate in the working-age population to 380 per 100 thousand people. Currently, the death rate due to medical errors reaches 100 thousand deaths per year in the Russian Federation [http://svpressa.ru/society/article/63391/]. Even drug therapy based on standards of medical care can be ineffective in 40% of cases due to polypharmacy and the individual characteristics of patients, and even treatment according to the British formulary, the most widely used in the world, is ineffective in 20% of patients. In this regard, the possibility of using clinical recommendations along with treatment standards requires the creation and implementation of expert decision support systems based on evidence-based and personalized medicine.

The second factor is associated with a significant number of adverse reactions and deaths caused by polypharmacy. Potentially dangerous combinations of drugs are a serious clinical, social and economic problem for the health system and the state as a whole. At the moment, 17–23% of drug combinations prescribed by doctors are potentially dangerous. Although only 6–8% of patients receiving such combinations develop an adverse reaction, according to expert estimates up to half a million patients worldwide die annually from adverse reactions, and for a third of them the cause of death is drug interactions associated with the use of potentially dangerous combinations. In addition, the cost of treating adverse reactions resulting from potentially dangerous combinations amounts to half of the cost of treating all drug complications. Existing domestic and foreign evaluation systems can predict the interaction of only two drugs, whereas in practical medicine, especially in geriatrics, the
number of drugs used simultaneously is on average close to ten. At the same time, the risk of developing side effects from undesirable drug interactions increases in proportion to the number of drugs used simultaneously.

The above two factors give rise to a third factor: suboptimal circulation of medicines in medical organizations. When drugs with low effectiveness are procured without taking into account their interactions under simultaneous use, this, on the one hand, increases the funds spent on the purchase of medicines and increases the number of unclaimed drugs in stock that expire before they can be used, and, on the other hand, degrades the quality of patient care and reduces life expectancy. These three factors increase the cost of drug provision and impair the replenishment of human capital.

The scientific and technological solutions developed within the project will make it possible to reduce or even exclude the influence of these factors: to create a technological solution for evaluating medical technologies based on evidence-based medicine, to form approaches to scientifically grounded, safe prescription of medicines, especially under conditions of polypharmacy, and to ensure rational circulation of medicines in medical organizations. Thus, increasing the effectiveness and safety of pharmacotherapy can be achieved, along with administrative, regulatory and other measures, by implementing the proposed scientific and technological solutions based on evidence-based approaches and personalized medicine using artificial intelligence methods, which is an urgent task. The development of solutions for the evaluation of medical technologies, of an expert system based on the latest achievements of personalized medicine that predicts dangerous interactions of three or more drugs, and of automation for the rational circulation of medicines is an important social, clinical and economic task of health care.
4 Scientific Novelty

Currently, the intelligent analysis of biomedical texts and documents is the most important area of research in personalized and evidence-based medicine. Effective and robust methods for finding precedents and evidence in highly specialized areas of medical practice can lead to the formation of a new scientific direction, fundamentally change the quality of medical care, and change the existing scientific and methodological practice of building clinical systems to support medical decision-making. As of 2019, we are not aware of solutions that work on multilingual content.

We propose the creation of a national system for the intelligent analysis of scientific publications of evidence-based medicine, intended to support informed medical decisions and to monitor priority areas of development of preventive and personalized medicine. Such a system will have significant advantages over the foreign systems UpToDate and DynaMed Plus in a number of respects, including the use of the latest evidence-based medicine data, its functionality, and its coverage of various medical specialties. Another significant
advantage of the domestic system will be that it will be fully adapted to Russian conditions and will meet all the needs of domestic clinical practice. For this purpose, the system will include domestic clinical recommendations and the laws and bylaws of the Ministry of Health of the Russian Federation, which will be available to users when necessary.

At the first stage, it is planned to create a functional system for the analysis of scientific publications for the development and writing of original content. The content of the created system will be created taking into account the following capabilities:

• Analyze genetic characteristics and biomarker results to detect predisposition to the development of diseases;
• Apply personalized methods of treating diseases and correcting conditions, including the personalized use of drugs and BCP, including targeted (target-specific) ones, based on the analysis of genetic characteristics and other biomarkers;
• Search information databases for content on patients included in the subpopulation of differentiated diagnosis and apply treatment regimens of proven effectiveness to them, which will significantly reduce the time needed to develop treatment regimens;
• Use biomarkers to monitor treatment effectiveness;
• Carry out machine monitoring of the latest information in medical databases against the search queries of system users for further clinical research;
• Conduct pharmacokinetic studies to predict the patient's response to targeted therapy and reduce side effects, especially in cases of cancer and socially significant disabling diseases;
• Use built-in integrated medical image processing systems for diagnostic searches through databases containing the results of studies conducted using visual diagnostic methods.

The development of such special computer programs will make it possible to operate with large volumes of electronic data, take into account the complex relationships between various factors and suggest optimal individual solutions to the doctor.
5 Research Problem

Within the framework of the stated research issues, the following group of tasks is expected to be solved:

• Development of a methodology and formation of a multilingual register of information sources for forecasting trends and prospects of the development of personalized medicine;
• Development of a technology for parsing information sources and forming data sets of analyzed documents;
• Development of methods for preprocessing document collections and for data normalization using biomedical thesauri and ontologies, taking into account the specifics of personalized medicine;
• Development of methods for dynamic multilingual topic modeling of collections of personalized medicine documents;
• Machine learning of the developed multilingual dynamic topic model on a collection of personalized medicine documents and evaluation of the quality of learning with well-known metrics;
• Development of methods for the visual representation of the results of topic modeling, taking into account the specifics of personalized medicine;
• Development of the corresponding software and its testing in the University clinical hospitals of Sechenov University (the national clinical decision support system based on the principles of evidence-based medicine, SechenovDataMed—www.datamed.pro);
• Popularization of the research results and bringing them to a wide range of potential users, including health organizers, managers of medical organizations and practitioners.
6 Expected Results

In the first stage, in terms of software solutions, a set of methods and their algorithmic implementation in the model will be developed, machine/computational experiments will be conducted, and quality assessments confirming their effectiveness and efficiency for the stated task will be carried out (the structure of the developed system is shown in Fig. 1).
Fig. 1 Structure of the reference system. The figure shows a database (repository and single data warehouse) with its DBMS and the software components of the System; the services and mechanisms of the System (machine learning for finding similar documents, multimodal mathematical models for search, intelligent tools for semantic search, natural language query processing, a voice search engine, interfaces for operating the system from mobile devices and gadgets, cross-language support, and operation with international ontologies and thesauri); and the content sources (Sechenov Evidence-Based DataMed Guides, summaries of evidence from Sechenov Clinics, primary and specialized health care standards, national clinical guidelines, and criteria for assessing the quality of care). Thematic information corresponding to the user's request is returned to the user.
At the end of the first stage of research, we will demonstrate the effectiveness of the developed methods in the processing of biomedical content, taking into account all of its linguistic specifics. The method of determining sources will be based on multi-criteria methods of bibliographic search, taking into account the significance and ranking of sources. Not only indexed scientific publication sources will be used, but also the so-called "gray" yet highly reliable areas of knowledge, such as patent databases for the relevant sections. The first stage will thus include:

• Development of a methodology and formation of a multilingual register of information sources for forecasting trends and prospects of the development of personalized medicine;
• Development of a technology for parsing information sources and forming data sets of analyzed documents;
• Development of methods for preprocessing document collections and for data normalization using biomedical thesauri and ontologies, taking into account the specifics of personalized medicine;
• Development of methods for dynamic multilingual topic modeling of collections of personalized medicine documents;
• Machine learning of the developed multilingual dynamic topic model on a collection of personalized medicine documents and evaluation of the quality of learning with well-known metrics;
• Development of methods for the visual representation of the results of topic modeling, taking into account the specifics of personalized medicine.

As a result of the software implementation of these methods, it is planned to obtain the following results, by groups of functions.

Search organization:
• manage search sessions;
• work with search history;
• work with a personal file of documents and a specialized workbook of user requests.

Creating a search query:
• selection of search arrays and search fields;
• editing the query in two modes: expert (individually tailored) and query-by-example;
• inserting values from the search index;
• using the search expression creation wizard;
• clarification or extension of the query terms;
• translation of query terms into foreign languages.
Query execution:
• control of spelling and syntax errors;
• monitoring of the integrity of the request based on the classification of biomedical document indexes;
• ability to search for documents in a personal file library;
• multilingual search in six languages: Russian, English, Spanish, German, French and Italian;
• search for lexically and semantically similar documents;
• search for documents using a natural language query.

Working with search results in real time:
• analysis of search results using frequency distributions of field values in the found documents;
• study of the thematic structure of search results using cluster analysis of the results, with thematic association of documents based on the similarity of the document text and other significant fields (abstract, formula, etc.);
• building a semantic map of the links between key topics of the search results, for semantic navigation over topic links in order to find the most relevant documents and to drill down into documents containing these topics and links; the use of biomedical thesauri will allow meaningful analysis of connections and topics generalized to dominant synsets;
• work with the final selection of documents;
• search for documents that are similar in content to a selected document;
• grouping, exporting and printing of search results;
• management of settings for displaying search results.

Document viewer:
• navigation through documents;
• navigation through highlighted terms relevant to the query;
• work with graphic objects of documents;
• translation of documents;
• search based on the value of a bibliographic field;
• review of genetic sequence codes.

The planned functional results are summarized in Table 1.
7 Conclusion

Thus, the project will create the following resources that make up the knowledge base (a semantic network of drug use):
Table 1 Functional results of the study

Function: Natural language search
Definition: Search for documents on a request in natural language
Description: Extracting lexical terms (LT) from the query using the system's morphological and syntactic analyzers; calculating the weights of the lexical terms (key topics in canonical form); fuzzy search is performed in the search engine taking into account the weights of the extracted LT

Function: Search for similar biomedical documents
Definition: Search for documents whose terminological vectors (TV) are close to the TV of the specified document
Description: For each document, at the indexing stage, its semantic representation (a semantic network) is built, on the basis of which a list of key topics and other analytical information is also formed, taking into account the features of the documents; similar documents are found based on a similarity measure

Function: Cluster analysis of search results
Definition: Grouping documents into thematic groups (clusters)
Description: Clustering of a collection of documents is based on a measure of similarity

Function: Semantic network
Definition: Identification of key topics and their relationships within the document, and formation of a binary semantic network
Description: For each document, at the indexing stage, the system components extract lexical terms and their relationships; lexical terms and the links between them are stored in the database

Function: Semantic map
Definition: Dynamic integration of the semantic networks of the search result documents, and analysis of cause-and-effect relationships between document topics
Description: Visualization in the form of a semantic map is used for semantic navigation in a collection of documents
• A database containing formalized normative data defining the order of application of pharmacotherapeutic methods, such as standards of medical care and clinical recommendations;
• A database containing formalized instructions for the use of medicines, including the correspondence of medicines to diagnoses and to the gender and age of patients, contraindications to use, information about incompatibilities between medicines, and information about the combined use of several drugs;
• A database containing formalized data from scientific publications of evidence-based medicine, with information about the evaluation of medicines and formalized algorithms for calculating their effectiveness;
• A computer program (an information system supporting medical decision-making in pharmacotherapy) integrated with the medical information system.

The social significance of the project lies in the implementation of the developed solutions in the information systems of medical organizations handling the circulation of medicines, in order to reduce patients' labor losses, increase life expectancy, reduce the number of cases of unintentional and premature death, and reduce the number of cases in which diseases detected in time nevertheless become chronic.

Commercialization of the project will be achieved:
• through the sale of the information-analytical decision support system to medical organizations and pharmacies (1000–1500 medical organizations, 6000 pharmacies);
• through the sale of mobile applications of the information-analytical decision support system to individual doctors and to citizens of the Russian Federation who monitor their health (30,000–50,000 applications);
• through subscription access to the knowledge base from other applications for developers of medical information systems (30,000–50,000 licenses).

One of the important directions of healthcare development in the world is its digitalization, and an analysis of publications in this area, for example on telemedicine problems as in [19, 20], will be of great interest to us.
References

1. Jurafsky, D., Martin, J.H.: Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, 2nd edn. Pearson, Upper Saddle River (2009)
2. Cochrane Library. https://www.cochranelibrary.com/. Last accessed 19 Jan 2020
3. Federal Electronic Medical Library, Ministry of Health of the Russian Federation. http://www.femb.ru/. Last accessed 19 Jan 2020
4. Simmons, M., Singhal, A., Lu, Z.: Text mining for precision medicine: bringing structure to EHRs and biomedical literature to understand genes and health. In: Shen, B., Tang, H., Jiang, X. (eds.) Translational Biomedical Informatics. Advances in Experimental Medicine and Biology, vol. 939. Springer, Singapore (2019). https://doi.org/10.1007/978-981-10-1503-8_7
5. Ruch, P.: Text mining to support gene ontology curation and vice versa. In: Dessimoz, C., Škunca, N. (eds.) The Gene Ontology Handbook. Methods in Molecular Biology, vol. 1446. Humana Press, New York, NY (2017). https://doi.org/10.1007/978-1-4939-3743-1_6
6. Sayers, E.W., Barrett, T., Benson, D.A., et al.: Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 38, D5–D16 (2010)
7. Rebholz-Schuhmann, D., Oellrich, A., Hoehndorf, R.: Text-mining solutions for biomedical research: enabling integrative biology. Nat. Rev. Genet. 13, 829–839 (2012)
8. Manning, C.D., Raghavan, P., Schütze, H.: An Introduction to Information Retrieval, 496 p. Cambridge University Press, New York, NY (2008)
9. Fayyad, U., Piatetsky-Shapiro, G., Smyth, P.: From data mining to knowledge discovery: an overview. In: Advances in Knowledge Discovery and Data Mining, pp. 1–34. AAAI/MIT Press (1996)
10. Feinerer, I., Hornik, K., Meyer, D.: Text mining infrastructure in R. J. Stat. Softw. 25(5), 1–54 (2008)
11. Han, J., Kamber, M.: Data Mining: Concepts and Techniques, 2nd edn., 800 p. Morgan Kaufmann (2006)
12. Bow: a toolkit for statistical language modeling, text retrieval, classification and clustering. https://www.cs.cmu.edu/~mccallum/bow/. Last accessed 19 Jan 2020
13. DiMeX: a text mining system for mutation-disease association extraction. PLoS One 11(4) (2016)
14. Nguyen, T.H., Cho, K., Grishman, R.: Joint event extraction via recurrent neural networks. In: Proceedings of NAACL-HLT 2016 (2016). https://doi.org/10.18653/v1/n16-1034
15. Börner, K., Chen, C., Boyack, K.W.: Visualizing knowledge domains. Annu. Rev. Inf. Sci. Technol. 37(1), 179–255 (2003). https://doi.org/10.1002/aris.1440370106
16. Arora, S.K., Porter, A.L., Youtie, J., et al.: Capturing new developments in an emerging technology: an updated search strategy for identifying nanotechnology research outputs. Scientometrics 95, 351 (2013). https://doi.org/10.1007/s11192-012-0903-6
17. Waltman, L., van Raan, A.F.J., Smart, S.: Exploring the relationship between the engineering and physical sciences and the health and life sciences by advanced bibliometric methods. PLoS One 9(10), e111530 (2014). https://doi.org/10.1371/journal.pone.0111530
18. Vorontsov, K.V.: Additive regularization of topic models of collections of text documents. Dokl. Akad. Nauk 456(3), 268–271 (2014). https://doi.org/10.7868/s0869565214090096
19. Stankova, M., Mihova, P.: Attitudes to telemedicine, and willingness to use in young people. In: Czarnowski, I., Howlett, R., Jain, L. (eds.) Intelligent Decision Technologies 2019. Smart Innovation, Systems and Technologies, vol. 143. Springer, Singapore (2019). https://doi.org/10.1007/978-981-13-8303-8_30
20. Stankova, M., Ivanova, V., Kamenski, T.: Use of educational computer games in the initial assessment and therapy of children with special educational needs in Bulgaria. TEM J. 7(3), 488–494 (2018). https://doi.org/10.18421/tem73-03
Improve Statistical Reporting Forms in Research Institutions of the Health Ministry of Russia Georgy Lebedev, Oleg Krylov, Andrey Yuriy, Yuriy Mironov, Valeriy Tkachenko, Eduard Fartushniy, and Sergey Zykov
Abstract The effectiveness of implementing research results is one of the main indicators that must be taken into account in the allocation of budget funds for research. The requirement that budget funds allocated for research be spent efficiently leads to the need for continuous improvement of the methodological apparatus for decision support on the allocation of funds, including consideration of how effectively each particular research institution has implemented the results of its previous research. With the development of information technology, it is important to improve the quality of information support for decision-making on the organization of scientific research in the Ministry of Health, including by improving the system of indicators and criteria for assessing the potential of scientific research institutions, reflecting their ability to achieve the desired results in the implementation of government contracts.
1 Introduction

In Russia, the issue of increasing the efficiency of Russian science and of developing clear criteria for assessing its activities, up to a comprehensive reform of the entire system, including financing mechanisms, management techniques and the structure of industrial relations, has recently become more acute [1–3]. All this applies equally to both basic and applied science.
The objective current need to increase the transparency and efficiency of the research process has coincided with an important change in the scientific environment: a sufficiently high degree of transfer of scientists' activities to the electronic online environment has been achieved [4–6]. The collection, processing and interpretation of the traces of this activity open previously inaccessible opportunities for forming a statistical base on scientists' activities [7] and, in particular, on the use of the results of their research.

The effectiveness of scientific research (R&D) [8], including the implementation of its results in practice, is one of the main indicators that must be taken into account in the allocation of budget funds for research. The effectiveness of research can be defined as the value of the scientific results obtained in the performance of the research in relation to the amount of resources (material, financial, etc.) spent on obtaining them [9, 10]. The volume of these resources can be considered as the cost of the work, namely the amount of funding within the framework of state assignments or state contracts [1–3] to Scientific Research Institutions (SRI), which belong to the Federal State Budget Institutions and Federal State Budget Educational Institutions of the Ministry of Health of Russia.

Most of the existing methods for assessing the effectiveness of research, and the value of the research results obtained, essentially boil down to a one-time assessment [11]. As a rule, the task of minimizing the cost of the work is solved, provided that it is unconditionally performed and achieves the planned goal of the research, without taking into account the further implementation of the research results in practice. Thus, the task of developing a method of integrated dynamic assessment of the effectiveness of research is relevant, as the assessed value of the results may vary during the implementation of the results in practice and the development of medical science. It is proposed to assess the effectiveness of R&D over a number of years after its completion and, on the basis of this assessment, to form an integral indicator (rating) of the SRI, reflecting the effectiveness of R&D in these SRI.
2 Materials and Methods

The paper analyzes the indicators of SRI potentials collected in the framework of the existing forms of statistical reporting and their sufficiency for assessing SRI potentials, i.e., for assessing the ability of an SRI to obtain in-demand scientific results. For this purpose, the method of resource-dynamic modeling of the scientific and technical potentials (STP) of Ministry of Health institutions was used [8]. The methodology presented in the article suggests that the structural model of the STP of a medical institution can be presented in the form

F^n(t) = ψ(R̄, L^n, t),    (1)
where R̄ = (R_f^n, R_int^n, R_inf^n, R_mt^n, R_em^n, R_md^n, R_sc^n, R_prt^n) denotes the financial, intellectual, information, material, educational-methodical, medical, social, and software-technical resources of the n-th SRI; L^n is the level of resource management; and t is the time module. F^n(t) then generalizes all types of resources and factors in the time period t.
3 Outcomes

Recommendations on additions to the statistical reporting forms of the SRI of the Ministry of Health of Russia were developed. A new form of statistical reporting has been developed that includes indicators characterizing the effectiveness of the implementation of the research results obtained by SRI within the framework of state contracts. Using the above method for solving multi-criteria optimization problems, we can:

• calculate the indicators of the STP level of scientific institutions;
• generalize all kinds of resources and factors of the structural STP model of an institution over the time range;
• calculate a generalized model of budgetary financing of the institution(s) of the Ministry of Health of Russia by scientific platforms (SP);
• calculate the basic assessment level of institutions' STP for each SP of the Ministry of Health of Russia, including in the context of educational and scientific institutions, and for all SP of the Ministry of Health of Russia;
• form a matrix of expert assessments by criteria;
• analyze the activity of SRI on the basis of the formed matrix of expert assessments by years and periods;
• assess the effectiveness of research in the SRI of the Ministry of Health of Russia.

The method of calculating the integral evaluation of R&D efficiency (a simplified computational sketch is given after this list) takes into account:

• criteria for cost-effectiveness;
• calculated scientometric indicators;
• the dynamics of indicators over a number of years;
• expert evaluation of the value of the results;
• the expected/confirmed economic impact.
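The paper does not fix a concrete aggregation formula for the integral evaluation, so the following sketch should be read only as one possible interpretation: normalized factor scores are combined with arbitrary weights, and older years are discounted to reflect the dynamics of implementation. All weights, the discount factor and the example data are illustrative assumptions.

```python
# Simplified sketch of an integral R&D efficiency rating. A weighted sum with a
# year-based discount is assumed purely for illustration; the weights, discount
# factor and example values are not taken from the paper.
YEAR_DISCOUNT = 0.8  # weight applied per year elapsed before the latest year

FACTOR_WEIGHTS = {
    "cost_effectiveness": 0.25,
    "scientometric": 0.25,
    "expert_value": 0.3,
    "economic_impact": 0.2,
}

def yearly_score(factors):
    """Combine normalized factor scores (0..1) for a single reporting year."""
    return sum(FACTOR_WEIGHTS[name] * factors.get(name, 0.0)
               for name in FACTOR_WEIGHTS)

def integral_rating(yearly_factors):
    """Aggregate yearly scores, discounting older years to capture dynamics."""
    latest = max(yearly_factors)
    return sum((YEAR_DISCOUNT ** (latest - year)) * yearly_score(factors)
               for year, factors in yearly_factors.items())

example = {
    2017: {"cost_effectiveness": 0.6, "scientometric": 0.4,
           "expert_value": 0.5, "economic_impact": 0.2},
    2018: {"cost_effectiveness": 0.7, "scientometric": 0.6,
           "expert_value": 0.6, "economic_impact": 0.3},
}
print(round(integral_rating(example), 3))
```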
4 Discussion

Currently, the Ministry of Health has a reporting system according to which all subordinate institutions performing research (SRI) annually provide information on the implementation of the state task for research, as well as information on the
implementation of the action plan to improve the efficiency of the scientific and practical activities of institutions. All reporting documentation is divided into two groups.

The first group contains six forms of reports on the progress of the implementation of the state task for scientific research; it allows the effectiveness of each specific scientific work to be assessed and includes the following performance indicators:

• number of patent applications;
• number of patents received for which the SRI is the patent holder;
• number of articles, of them:
  – in rated domestic journals with an impact factor of at least 0.3;
  – total impact factor of these articles;
  – in foreign journals with an impact factor of at least 0.3;
  – total impact factor of these articles;
  – miscellaneous;
• number of monographs, chapters in monographs, manuals, etc.;
• number of drugs and medical devices developed;
• number of test systems developed;
• number of scientific and practical events organized and held;
• number of defended dissertations, of them:
  – doctoral;
  – candidate.
A separate reporting form collects more detailed data on all published articles, with reference to the research within which they were written. The second group contains six forms of reports on the implementation of the action plan to improve efficiency. This group includes indicators that characterize researchers, scientific units and scientific institutions as a whole and allow their scientific and technical potential to be assessed, which is also necessary for the integrated assessment of research.

Thus, the currently available statistical indicators allow the results of the work of SRI to be assessed only from the scientific side and partly from the practical one; aspects such as the financial-economic, organizational, social and educational-methodical ones are not represented at all. This leads both to one-sidedness of the assessment itself and to underestimation of individual R&D aimed at obtaining results in these areas. It should also be noted that the existing forms do not allow assessing the dynamics of the implementation of research results after their completion (i.e., their use after they have been reported on). Under the current system, the missing indicators can be obtained only by studying the full research reports, which is very labor-intensive and time-consuming. It is advisable to reflect all the indicators for assessing the effectiveness of research in a separate report form, in tabular form. Additional indicators, if necessary, can be formulated by the expert community.
By formalizing the presentation of performance indicators, it is possible not only to significantly simplify the work of the experts evaluating the effectiveness of research, but also to partially automate the evaluation process. From the scientific and practical point of view, the set of indicators for the most important component of applied research is not complete. In addition, the indicators are presented only in a generalized numerical form, not detailed by the practical results obtained. The authors propose to collect data on scientific and practical results in a separate form, compiled by analogy with the reporting form for scientific articles. Such an approach will allow both the efficiency of implementation and the financial effect received from implementation to be tracked within a 3–5-year cycle after the completion of the research and development. The proposed report forms will allow the Ministry of Health of Russia to implement the methodology of integral assessment of the effectiveness of scientific research in its scientific institutions. The summary indicators of the implementation of research results are shown in Table 1.

Table 1 Summary table of indicators of implementation of research results
The implementation results tracked in this form include:

• medical technology;
• drug;
• medical device;
• biomedical materials;
• diagnostic system;
• methods;
• social change;
• organizational and structural changes;
• other "practical products" from whose sale or implementation an economic profit can be obtained.
As a result of the accumulation of this information, it is possible to further assess the economic performance of applied research in a particular institution by year or to compare the performance of individual institutions for a particular year.
5 Summary

The paper has analyzed the indicators of SRI potentials collected in the framework of the existing forms of statistical reporting and their sufficiency for assessing SRI potentials. It is shown that these statistical indicators are not sufficient for a complete and reliable assessment of SRI potentials reflecting the ability to obtain in-demand scientific results. Recommendations on additions to the statistical reporting forms of the SRI of the Ministry of Health of Russia were developed. The indicators included in the proposed new forms of statistical reporting of the SRI of the Ministry of Health of Russia will make it possible to significantly improve the quality of decisions on the distribution of budgetary funds for the execution of state contracts, by taking into account the implementation of the results of previously performed research, tracked over several years.
References

1. Selby, J.V., Lipstein, S.H.: PCORI at 3 years—progress, lessons, and plans. N. Engl. J. Med. 370(7), 592–595 (2014)
2. Sully, B.G., Julious, S.A., Nicholl, J.: A reinvestigation of recruitment to randomized, controlled, multicenter trials: a review of trials funded by two UK funding agencies. Trials 14, 166 (2013)
3. Arnold, E., Brown, N., Eriksson, A., Jansson, T., Muscio, A., Nählinder, J., et al.: The Role of Industrial Research Institutes in the National Innovation System. VINNOVA, Stockholm (2007)
4. Sveiby, K-E.: What is Knowledge Management? (2001). https://www.sveiby.com/Articles?page_articles=2. Accessed 31 Jan 2019
5. Anderson, M.L., Califf, R.M., Sugarman, J.: Participants in the NIH Health Care Systems Research Collaboratory Cluster Randomized Trial Workshop. Ethical and regulatory issues
of pragmatic cluster randomized trials in contemporary health systems. Clin. Trials 12(3), 276–286 (2015). https://doi.org/10.1177/1740774515571140
6. Frewer, L.J., Coles, D., van der Lans, I.A., Schroeder, D., Champion, K., Apperley, J.F.: Impact of the European clinical trials directive on prospective academic clinical trials associated with BMT. Bone Marrow Transplant. 46(3), 443–447 (2011)
7. Warlow, C.: A new NHS research strategy. Lancet 367(9504), 12–13 (2006)
8. Lebedev, G., Krylov, O., Lelyakov, A., Mironov, Y., Tkachenko, V., Zykov, S.: Scientific research in scientific institutions of Russian Ministry of Health. Smart Innovation, Systems and Technologies (2019)
9. VanLare, J.M., Conway, P.H., Sox, H.C.: Five next steps for a new national program for comparative-effectiveness research. N. Engl. J. Med. 362(11), 970–973 (2010)
10. Higher Education Funding Council for England: 2017 Research Excellence Framework [updated 17 Mar 2017]. Assessment criteria and level definitions [updated 12 Dec 2014]. https://www.ref.ac.uk/2014/panels/assessmentcriteriaandleveldefinitions/. Accessed 31 Jan 2019
11. Califf, R.M.: The patient-centered outcomes research network: a national infrastructure for comparative effectiveness research. N. C. Med. J. 75(3), 204–210 (2014)
Chat-Based Approach Applied to Automatic Livestream Highlight Generation Pavel Drankou and Sergey Zykov
Abstract Highlight generation and the subsequent video production process are expensive when done by humans. This paper shows how the process can be automated. It defines the highlight generation problem and suggests and discusses five different approaches to solving it. The statistics-based approach is discussed separately in detail, along with elements of the algorithm's implementation. The requirements and architecture of a web-based highlight generation and video production service are identified and discussed.
1 Introduction

During the last decade, live broadcasting services such as Twitch.tv have gained immense popularity. According to the Alexa Internet rating [1], Twitch.tv is in 35th place among the most popular sites worldwide. This fact signals a massive change in how people spend their time and in human entertainment [2] overall. But what are the reasons behind people watching livestreams? One of them is the ability to watch a highlight: the most exciting and interesting part of a livestream.

Another aspect is the highlight industry and video production. There are people whose job is to track streamer channels for highlight occurrences and produce compilation videos composed of a few best moments. Videos of this kind are commonly viewed hundreds of thousands of times on YouTube. Yet, despite the popularity of highlight videos, the cost of the resources required to produce a compilation video frequently outweighs the possible benefits: a producer has to spend time watching a stream, then upload the recorded highlights and, finally, mount them together and publish the result; it is expensive.
The question is how the production process can be made more affordable for individuals by utilizing modern tools, approaches and best practices. The general answer is through automation. The highlight generation, compilation and video publication processes can all be automated; the first is the most nontrivial among the trio. This paper defines possible approaches to solving the automatic highlight generation problem and concentrates on the statistics-based one.
2 Background

Computer gamers can be divided into two groups: those who are good at gaming, and those who can get better by learning from them. Justin.tv, a general-interest streaming platform launched in 2007, became a perfect medium between the two groups. Four years later, in 2011, the company decided to focus more on gaming and to spin off Twitch.tv as a gamer-oriented product [3]. The impressive popularity of the games-only product eclipsed Justin.tv, and in August 2014 the company decided to close the general-interest platform. In the same month, Twitch.tv was acquired by Amazon [4]. In 2020, Twitch.tv still holds the title of the number one livestream platform on Earth, with 15 million unique daily viewers [5].

The highlight industry existed even before livestream platforms and full-time broadcasters. With the launch of ESPN in 1979, video highlights began to supplant sports magazines and radio [6]; it became easy to follow sports and their stars. And in 2005, when YouTube was launched, hosting highlight videos on the platform became mainstream. Even now, YouTube holds first place in the field. YouTube channels such as Mrbundesteamchannel [7] (football highlights) with 1.45 million subscribers, Trolden [8] (Hearthstone highlights) with 0.9 million subscribers and others attract millions of views every day (Fig. 1).
Fig. 1 mr bundesteam YouTube channel
The mentioned YouTube channels are owned by individuals whose primary activity is the production process itself. Most full-time Twitch streamers cannot afford to spend their time on it; instead, they outsource the production process for their YouTube channels to qualified specialists in the field. Channels such as Kibler [9] with 431 K subscribers, Kripparrian [10] with 958 K subscribers and Amaz [11] with 457 K subscribers are examples of such outsourced channels.
3 Related Work

In 1978, Manfred Kochen predicted [12] the rise of computer-mediated communication (CMC) as a "new linguistic entity with its own vocabulary, syntax, and pragmatics." In 2004, Zhou et al. [13] applied automated linguistics-based analysis to text-based forms of communication and achieved a positive result: "some newly discovered linguistic constructs and their component LBC were helpful in differentiating deception from truth". Ford et al. [14] discovered the transformation of communication patterns in massive large-scale interactions: "Due to the large scale and fast pace of massive chats, communication patterns no longer follow models developed in previous studies of computer-mediated communication". They introduced two groups of metrics for their study: primary (scroll rate, message length, chat content, voices) and secondary (word count per message, unique word count per 50-message segment, participant count per segment).
4 The Problem and Approaches in General

Understanding the problem requires a deep understanding of its context. A diagram and description of the main components of the Twitch.tv architecture [15] contribute significantly to that understanding, which is exactly why the article continues with them.
4.1 Twitch.tv Architecture

In general, Twitch consists of five essential parts (Fig. 2):

1. Streamers,
2. Video broadcast acceptor servers,
3. Video broadcast distributor servers,
4. IRC servers and
5. Clients.
Fig. 2 Twitch architecture diagram: streamers send video streams to an acceptor server; the stream is replicated to distribution servers, which deliver the video stream to clients; clients send and receive chat messages through the IRC server
4.1.1 Streamer

A streamer is the individual organizer of a broadcast. The streamer uses broadcasting software, such as OBS [16], to deliver an RTMP-encoded [17] livestreamed video directly to a Twitch acceptor server.
4.1.2 Video Broadcast Acceptor Servers

These servers are responsible for accepting the video content received from a streamer and replicating it to the distribution servers. Since the RTMP [17] format of the incoming stream is not supported by browsers, the acceptor server transcodes it into the HLS [18] format. The transcoded stream is sent to the distribution servers.
4.1.3 Video Broadcast Distributor Servers

These servers are the key elements in the content delivery chain. Their goal is to be scalable and powerful enough to deliver video stream content to millions of clients. Client requests for the video stream are answered with HLS responses transcoded from RTMP.
4.1.4 IRC Servers

IRC servers are key elements of the Twitch.tv architecture. Every livestreamer on the platform has a dedicated chat for viewers' communication. These chats are implemented on top of the IRC protocol [19].
4.1.5 Clients

Clients are the humans who are looking for live content to watch. They participate in chat discussions by reacting to what is happening on a livestream.
4.2 The Problem Definition

The problem statement is as follows: given a video content stream and the crowd-based reaction to this content (presented as text), identify the stream fragments that interest the audience most.
4.3 Suggested Approaches

This section suggests five different approaches to the highlight generation problem. The implementation elements of the first one are discussed later in the paper.
4.3.1 Text Chat Statistics-Based Analysis

One of the most evident approaches to the problem defined above is a statistical analysis of the users' reaction in the text chat; in fact, this approach is among the closest to the highlight problem by nature. The idea is to keep a subscription to the IRC chat server, continuously calculate statistical metrics such as the frequency of messages, and react appropriately to metric changes. Comparing this approach to the others in terms of wickedness, the algorithm implementation for this particular approach is the closest to a tamed [20] problem.
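A minimal sketch of such an IRC chat subscription is given below. The irc.chat.twitch.tv endpoint and the anonymous "justinfan" nickname convention follow commonly documented Twitch IRC behaviour and should be verified against the current Twitch developer documentation; the channel name is an arbitrary example.

```python
# Minimal sketch: subscribing to a Twitch channel chat over IRC and yielding
# timestamped messages. Endpoint and anonymous-login convention are taken from
# commonly documented Twitch IRC behaviour; the channel is only an example.
import socket
import time

HOST, PORT = "irc.chat.twitch.tv", 6667
CHANNEL = "#sodapoppin"  # example channel

def read_chat(duration_s=60.0):
    """Yield (timestamp, message_text) tuples observed within duration_s."""
    sock = socket.create_connection((HOST, PORT))
    sock.sendall(b"NICK justinfan12345\r\n")           # anonymous, read-only login
    sock.sendall(f"JOIN {CHANNEL}\r\n".encode())
    deadline = time.time() + duration_s
    buffer = ""
    while time.time() < deadline:
        buffer += sock.recv(4096).decode("utf-8", errors="ignore")
        lines = buffer.split("\r\n")
        buffer = lines.pop()                            # keep any partial line
        for line in lines:
            if line.startswith("PING"):                 # keep the connection alive
                sock.sendall(b"PONG :tmi.twitch.tv\r\n")
            elif "PRIVMSG" in line:
                yield time.time(), line.split(" :", 1)[-1]
    sock.close()
```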
4.3.2 Video Content Analysis Approach

The exact opposite of the statistics-based approach is video content analysis. This approach utilizes powerful computer vision techniques to achieve object recognition and classification. A composition of a few specific objects in a given position can be predefined as a highlight situation.
This approach might sound like a decent one, but it has a few significant drawbacks:

• High implementation complexity;
• Composition definitions are required for each broadcast;
• Serious competency requirements;
• Deep context knowledge is required.

These drawbacks might cause significant inaccuracies in the highlight generation results, which is why this approach should be considered a supplementary one.
4.3.3 Semantic Text Chat Analysis Approach

With text message classification, it is entirely possible to determine the emotional shade of a message. The only problem is that a message has to be long enough, otherwise the classification algorithm will not produce accurate results. Considering that the average message length in a Twitch chat is close to 20 symbols, it is not reasonable to attempt this approach without also relying on the ones mentioned above.
4.3.4 Domain-Specific Information Analysis Approach

Having additional sources of domain-specific information about what is happening on a stream is another way to solve the highlight problem. Consider a live broadcast of a fight show. If we had the coordinates of the fighters' hands during a boxing match, we could easily determine whether a highlight is about to happen. The only disadvantage is non-universality: the broadcast to which this approach is applied must provide a good enough source of domain-specific data, which is also unique in every specific case.
4.3.5 A Combination of the Mentioned Approaches

As can be noticed, the approaches listed above can be combined with an eye to creating a high-accuracy algorithm. The suggestion is to implement them independently and make highlight decisions based on the composed algorithm results.
5 Frequency Statistics-Based Approach

This section is dedicated to an attempt to implement an algorithm based on the "Text chat statistics-based analysis" approach mentioned in the previous section. The algorithm uses three statistical metrics, inspired by the work of Ford et al. [14]:
Fig. 3 Messages per second in Twitch.tv/sodapoppin
Fig. 4 Amount of unique words in a time frame. Time frames are 5, 10, 15, 20, 25 and 30 seconds, respectively
• Frequency of messages per second;
• Average message length in a period of time—LenIndex;
• Number of unique words among all the messages in a period of time—UniqWordIndex.

In order to capture a highlight, the algorithm should continuously track and measure the changes in these metric values. But what kind of changes should the algorithm look for? An extreme change in the message frequency or a large decrease in the average message length? The answer to those questions can be obtained through real observations. Here is how a typical highlight looks on a graph captured on the Sodapoppin Twitch channel while the streamer was being watched by 21,700 users on January 29 (UTC+3 time zone) (Fig. 3). There is a highlight at 01:03:40, and, as can be noticed, the number of messages per second starts to grow rapidly at that point until 01:08:00. The values of the LenIndex and UniqWordIndex metrics calculated over the same time frame are shown in Figs. 4 and 5.
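As a rough illustration of how these three metrics could be computed from a stream of timestamped chat messages over a sliding time frame, a minimal Python sketch is shown below. The message format, function names, and the 30-second frame length are assumptions for illustration only and are not taken from the paper's implementation.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ChatMessage:
    timestamp: float  # seconds since the start of the broadcast
    text: str

def frame_metrics(messages: List[ChatMessage], start: float, frame: float = 30.0):
    """Compute messages/second, LenIndex and UniqWordIndex for [start, start + frame)."""
    window = [m for m in messages if start <= m.timestamp < start + frame]
    if not window:
        return {"msg_per_sec": 0.0, "len_index": 0.0, "uniq_word_index": 0}
    msg_per_sec = len(window) / frame
    len_index = sum(len(m.text) for m in window) / len(window)           # LenIndex
    uniq_word_index = len({w for m in window for w in m.text.split()})   # UniqWordIndex
    return {"msg_per_sec": msg_per_sec,
            "len_index": len_index,
            "uniq_word_index": uniq_word_index}

# Example: metrics for two consecutive 30-second frames of a tiny chat log
log = [ChatMessage(1.2, "PogChamp"), ChatMessage(2.8, "that was insane"),
       ChatMessage(31.5, "lol")]
for t in (0.0, 30.0):
    print(t, frame_metrics(log, t))
```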
Fig. 5 Average message length in a time frame. Time frames are 5, 10, 15, 20, 25 and 30 seconds, respectively
The LenIndex and UniqWordIndex metrics show a similar picture but in the opposite direction: instead of growth, we observe a significant fall in message uniqueness and length after the highlight occurs. To conclude the section, we have identified three essential metrics for a highlight detection algorithm based on the statistical approach and empirically established the trends in their changes, which can easily become the basis for a high-accuracy highlight detection algorithm.
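One way to turn these empirically observed trends into a detection rule is to compare each time frame with the preceding one, flagging a highlight when the message frequency jumps while LenIndex and UniqWordIndex drop. The sketch below builds on the frame metrics above; the concrete thresholds are illustrative assumptions only, since the paper does not specify numerical values.

```python
def is_highlight(prev: dict, curr: dict,
                 freq_growth: float = 2.0, drop_ratio: float = 0.7) -> bool:
    """Flag a highlight when messages/second grows sharply while the
    average message length and unique-word count fall (assumed thresholds)."""
    if prev["msg_per_sec"] == 0:
        return False
    freq_up = curr["msg_per_sec"] >= freq_growth * prev["msg_per_sec"]
    len_down = curr["len_index"] <= drop_ratio * prev["len_index"]
    uniq_down = curr["uniq_word_index"] <= drop_ratio * prev["uniq_word_index"]
    return freq_up and len_down and uniq_down
```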
6 Implementing a Highlight Generator

As promised in the introduction, in this section we discuss the architecture of a highlight generator service with the following functionalities:
• Ability to perform real-time highlight generation on a Twitch.tv stream;
• Compilation video mounting;
• Video publication on YouTube.

In order to implement these functionalities, we need to lay down fundamental principles for the service [21]. Firstly, the highlight generation algorithm should be replaceable and designed to be easily extensible. Secondly, compilation video mounting should be scalable; if a client requests a video mounting process to start, the process should not affect the others. Finally, all system components are expected to work independently. The architecture diagram in Fig. 6 shows the system design.
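To keep the highlight generation algorithm replaceable, as required by the first principle above, the decision engine can be hidden behind a small interface. The sketch below shows one possible shape of such an interface; it is an assumption for illustration and not the actual service code.

```python
from abc import ABC, abstractmethod
from typing import Optional

class HighlightAlgorithm(ABC):
    """Pluggable decision engine: feed chat messages in, get highlight timestamps out."""

    @abstractmethod
    def on_message(self, timestamp: float, text: str) -> Optional[float]:
        """Return the start time of a detected highlight, or None."""

class FrequencyStatisticsAlgorithm(HighlightAlgorithm):
    """Wraps the frequency statistics-based approach of Sect. 5 (sketch only)."""

    def __init__(self, frame: float = 30.0):
        self.frame = frame
        self.buffer = []  # (timestamp, text) pairs inside the current frame

    def on_message(self, timestamp: float, text: str) -> Optional[float]:
        self.buffer.append((timestamp, text))
        # keep only messages inside the current frame
        self.buffer = [(t, m) for t, m in self.buffer if t > timestamp - self.frame]
        # a full implementation would compute and compare frame metrics here
        return None
```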
Fig. 6 Highlight generator architecture diagram
6.1 Web Interface

The web interface is a JavaScript application designed to assist the end user in video publication. It obtains highlights and shows them to the user, who can preview each highlight, select the best of them, and submit a publication task.
6.2 Video Compiler

The video compiler is a Java application which receives a set of highlight IDs to create a compilation from, downloads them, and stores them in a local file system. Once the downloads are finished, ffmpeg [22] is used for the final video rendering. The resulting video is uploaded to YouTube using the Data API [23].
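For the final rendering step, one common way to drive ffmpeg is its concat demuxer. The snippet below is a generic illustration of that invocation rather than the code of the described Video Compiler; the file names are placeholders, and the clips are assumed to share codecs and parameters so that stream copying works.

```python
import subprocess
from pathlib import Path

def concat_clips(clip_paths, output="compilation.mp4"):
    """Concatenate downloaded highlight clips with ffmpeg's concat demuxer."""
    list_file = Path("clips.txt")
    list_file.write_text("".join(f"file '{p}'\n" for p in clip_paths))
    # -c copy avoids re-encoding; clips must share codecs/parameters for this to work
    subprocess.run(
        ["ffmpeg", "-f", "concat", "-safe", "0",
         "-i", str(list_file), "-c", "copy", output],
        check=True,
    )

concat_clips(["highlight_001.mp4", "highlight_002.mp4"])
```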
6.3 Highlight Tracker

The highlight tracker is also an application written in Java. It connects to the Twitch IRC servers, joins the tracked chat rooms, and passes received messages in real time to the decision engine. The engine embeds the described frequency statistics-based algorithm in order to find the right moment to capture a highlight video.
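A minimal sketch of the tracker's chat subscription is shown below, using the plain IRC protocol over a socket. The endpoint, port, and message format follow Twitch's publicly documented chat interface and should be treated as assumptions that may change; the token, nickname, and channel name are placeholders.

```python
import socket

HOST, PORT = "irc.chat.twitch.tv", 6667   # Twitch chat IRC endpoint (assumed current)
TOKEN, NICK, CHANNEL = "oauth:xxxxxxxx", "my_bot", "#sodapoppin"

def run(handle_message):
    """Join a chat room and pass every chat line to the decision engine callback."""
    sock = socket.socket()
    sock.connect((HOST, PORT))
    sock.send(f"PASS {TOKEN}\r\nNICK {NICK}\r\nJOIN {CHANNEL}\r\n".encode())
    buffer = ""
    while True:
        buffer += sock.recv(2048).decode(errors="ignore")
        *lines, buffer = buffer.split("\r\n")
        for line in lines:
            if line.startswith("PING"):                      # keep the connection alive
                sock.send(line.replace("PING", "PONG").encode() + b"\r\n")
            elif "PRIVMSG" in line:                          # a chat message
                text = line.split("PRIVMSG", 1)[1].split(":", 1)[1]
                handle_message(text)

# run(print)  # e.g., feed messages into the frequency statistics engine instead of print
```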
7 Conclusion

The paper showed how highlight generation and the subsequent video production process can be automated. The highlight generation problem was defined, and five different approaches to solving it were suggested and discussed. The statistics-based approach was discussed separately and in detail, along with elements of its algorithm implementation. The requirements and architecture of a web-based highlight generation service were also identified and discussed.
References 1. The Top 500 Sites on the Web. https://www.alexa.com/topsites 2. Hamilton, W.A., Garretson, O., Kerne, A.: Streaming on twitch: fostering participatory communities of play within live mixed media. In: Proceedings of the SIGCHI conference on human factors in computing systems, pp. 1315–1324 (2014) 3. Wikipedia Contributors: Twitch (service)—Wikipedia, the Free Encyclopedia (2020). https:// en.wikipedia.org/w/index.php?title=Twitch_(service)&oldid=944407516. Accessed 9 Mar 2020 4. Weinberger, M.: Amazon’s 970 million purchase of twitch makes so much sense now: it’s all about the cloud, Mar 2016. https://www.businessinsider.com/amazons-970-million-purchaseof-twitch-makes-so-much-sense-now-its-all-about-the-cloud-2016-3 5. Twitch revenue and usage statistics in 2019, Feb 2019. https://www.businessofapps.com/data/ twitch-statistics/ 6. Gamache, R.: A History of Sports Highlights: Replayed Plays from Edison to ESPN. Incorporated, Publishers, McFarland (2010). https://books.google.ru/books?id=Q1rpmAEACAAJ 7. Mrbundesteamchannel: Mr Bundesteam Youtube Channel. https://www.youtube.com/user/ mrbundesteamchannel 8. Trolden: Trolden Youtube Channel. https://www.youtube.com/user/Trolden1337 9. Kibler, B.: Brian Kibler Youtube Channel. https://www.youtube.com/channel/ UCItISwABVRjboRSBBi6WYTA 10. Kripparrian: Kripparrian Youtube Channel. https://www.youtube.com/user/Kripparrian 11. Amazhs: Amaz Youtube Channel. https://www.youtube.com/user/amazhs 12. Kochen, M.: Long term implications of electronic information exchanges for information science. Bull. Am. Soc. Inf. Sci. 4(5), 22–3 (1978) 13. Zhou, L., Burgoon, J.K., Nunamaker, J.F., Twitchell, D.: Automating linguistics-based cues for detecting deception in text-based asynchronous computer-mediated communications. Group decis. Negot. 13(1), 81–106 (2004)
14. Ford, C., Gardner, D., Horgan, L.E., Liu, C., Tsaasan, A., Nardi, B., Rickman, J.: Chat speed op pogchamp: practices of coherence in massive twitch chat. In: Proceedings of the 2017 CHI Conference Extended Abstracts on Human Factors in Computing Systems, pp. 858–871 (2017) 15. Twitch engineering: an introduction and overview. https://blog.twitch.tv/en/2015/12/18/ twitch-engineering-an-introduction-and-overview-a23917b71a25/ 16. Obs Studio. https://obsproject.com/ 17. Wikipedia Contributors: Real-time Messaging Protocol—Wikipedia, the Free Encyclopedia (2020). https://en.wikipedia.org/w/index.php?title=Real-Time_Messaging_Protocol& oldid=941761686. Accessed 9 Mar 2020 18. Wikipedia Contributors: Http live Streaming—Wikipedia, the Free Encyclopedia (2020). https://en.wikipedia.org/w/index.php?title=HTTP_Live_Streaming&oldid=942389731. Accessed 9 Mar 2020 19. Wikipedia Contributors: Internet Relay Chat—Wikipedia, the Free Encyclopedia (2020). https://en.wikipedia.org/w/index.php?title=Internet_Relay_Chat&oldid=941834284. Accessed 9 Mar 2020 20. Rittel, H.W., Webber, M.M.: Dilemmas in a general theory of planning. Policy Sci. 4(2), 155– 169 (1973) 21. Spencer, D.: A practical guide to information architecture. Five Simple Steps Penarth, vol. 1 (2010) 22. Ffmpeg. https://www.ffmpeg.org/ 23. Youtube Data API | Google Developers. https://developers.google.com/youtube/v3
Decision Technologies and Related Topics in Big Data Analysis of Social and Financial Issues
What Are the Differences Between Good and Poor User Experience? Jun Iio
Abstract In the 2010s, concepts of both the human-centered design (HCD) and user experience (UX) have proved to be significant in providing efficient systems and services. Additionally, many methodologies to improve UX and usability are being proposed. However, UX is basically a complicated concept, and its design process requires not only engineering skills but also factors such as psychological knowledge, social understanding, and human-behavior consideration among others. To acquire the necessary abilities to realize a practical UX design, the differences between good UX and poor UX must be understood. In this study, a survey was conducted by collecting the opinions of university students. The questionnaire asked them to mention several examples of systems/services that provided good UX or poor UX. The results imply what types of services are liked and disliked by students. Therefore, this study would be informative for developers and designers who plan to provide their services for the young generation.
1 Introduction

The human-centered design (HCD) is a concept of the design process, and it is standardized by the International Organization for Standardization (ISO) as ISO 9241-210. Although the HCD process was previously standardized as ISO 13407 in 1999, it was integrated into ISO 9241 in 2010 (ISO 9241-210:2010), adding the concept of user experience (UX). Furthermore, it was updated to ISO 9241-210:2019 in 2019 [1]. The HCD standard is a process standard; i.e., the standard defines several processes to realize an efficient design from the viewpoint of the user. It is explained using the general phases of the HCD process: 1. Specify the context of use; 2. Specify requirements; 3. Create a design solution and development; and 4. Evaluate products
Fig. 1 Several types of UX are categorized into four types, namely, anticipated UX, momentary UX, episodic UX, and cumulative UX
(systems/services) [2]. An important point is that this cyclic process must be iterated several times until the service satisfies the user requirements. In the recent decade, UX has become very popular, and it is being studied in various domains [3, 4]. Notably, the concept of UX is more comprehensive than HCD, which focuses only on the process of designing systems/services; UX covers the entire lifetime of using those systems/services. As the timeline advances, UX is categorized into the following four types [5] (see Fig. 1):
– Anticipated UX: Before encountering a system/service, users become aware of it, thereby increasing their expectations of that system/service.
– Momentary UX: Users form a positive impression about some functions of the system/service.
– Episodic UX: Combining several series of satisfaction from momentary UXs, users get positive episodes.
– Cumulative UX (Remembered UX): Multiple positive episodic UXs make the attitude of users positive toward the system/service. Finally, the users start regarding the system/service as reliable through its long-term use.

In addition to the case of HCD, UX design (UXD) requires a broader knowledge domain than engineering knowledge alone. For instance, when designing the anticipated UX process, one cannot refer to the existing system/service, because the process must be designed before the system/service is disclosed to the public. Therefore, the UXD process requires not only engineering skills but also factors such as psychological knowledge, social understanding, and human-behavior consideration. Before describing my work in detail, the definitions of good UX and poor UX should be indicated. Good UX is a series of UX welcomed by the users; poor UX is not. That is, if a user encounters experiences that change his or her mindset for the worse, those experiences are classified as poor UX.
In this study, to understand what good-UX examples are and what they are not, a questionnaire was prepared. The literature survey is described in Sect. 2 and the survey methodology in Sect. 3. The results and discussion are described in Sect. 4. Finally, I conclude this study with a summary and future work.
2 Related Work

Many UX-evaluation-based studies have been performed thus far. Vermeeren et al. [6] presented the results of their long-term effort of collecting UX-evaluation methods, not only from academia but also from industry. They finalized 96 methods by employing various approaches such as literature reviews, workshops, online surveys, and discussions by specialists. Lachner et al. [7] also attempted to quantify the UX of products. They employed an approach like that of Vermeeren; i.e., they surveyed 84 UX-evaluation methods from the literature and from interviews with experts from both academia and industry. Based on the survey, they proposed a tool to measure and visualize the quantity of UX. Hinderks et al. [8] developed a UX key performance indicator (KPI) for organizational use on the basis of the user-experience questionnaire (UEQ) [9]. Their KPI was called the UEQ KPI, and they performed a large-scale evaluation with 882 participants to show that their KPI was effective. Kujala et al. [10] proposed another approach to evaluate long-term UX, i.e., cumulative UX or remembered UX. They proposed the "UX curve," which assists users in explaining how their minds have changed with their experience of using a service or product over time. Linder and Arvola [11] adapted interpretative phenomenological analysis (IPA) to professional UX-research practice. IPA is an in-depth qualitative-research method developed in psychology. They confirmed its applicability by testing it for the case of understanding how newly arrived immigrants to Sweden experience a start-up service that introduces them to the job market. Compared with these studies, this study does not aim to define any criteria. My approach is inductive; i.e., I consider that many informative cases offer some solutions for providing a well-designed UX.
3 Method of This Study

A simple questionnaire was presented at the end of my lecture titled Interaction with Information Systems, December 23, 2019.
3.1 Lecture on HCD and UX

The lecture was conducted as one of a series of lectures given in an omnibus form by faculty members. The course is provided in the second semester for first-grade students. It is a compulsory course, and the number of participants is approximately 150. The core contents of the lecture were a discussion of the principle of interaction between information systems and users, a short history of human–computer interaction, interaction design, an introduction to HCD and UX, and several examples of HCD processes, among others.
3.2 Questionnaire

As mentioned previously, the questionnaire was conducted as homework for the lecture. The questions were simple and as follows: select a few practical examples that provide satisfactory UX and some that provide poor UX from information systems familiar to you; subsequently, explain why one system provides a good UX and why the other does not. The questionnaire was presented to students via a learning management system (LMS). As depicted in Fig. 2, the answer could be written as a free description. However, an example of the appropriate answering format was indicated, and the answer was expected in that format. Notably, the questionnaire was prepared in the Japanese language, as depicted in Fig. 2; in addition, the answers were recorded in Japanese, as all the students were Japanese. The discussions described in the following section are therefore based on my translation from Japanese to English.
Fig. 2 Screenshot of the online survey system of LMS
3.3 Methodology of Analyses

Because the answers were freely described, I conducted a coding process prior to performing the practical analysis. The coding process was conducted as follows.
1. The answer was separated into two parts; the first part was the description of good UX and the second part was that of poor UX. Answers that contained plural descriptions were separated into additional answers.
2. The description of the system/service was extracted. The rest of the answer contained the reason why the system/service was associated with good or poor UX.
3. Ambiguities were still present in the descriptions extracted from the responses. For example, "XXX system," "XXX System," and "System-XXX" should be considered as different representations of the same system. Therefore, a code phrase was assigned to every answer according to its description. Furthermore, a sub-code was assigned where necessary.
4. Furthermore, two flags (Yes or No) were added; the first flag indicated whether the description mentioned a specific system or not, and the other indicated whether the description mentioned an information system or not. Although the questionnaire instructed students to reply about their UX with information systems, some students did not follow the instruction and freely described something other than information systems.

Figure 3 illustrates a part of the responses arranged in a spreadsheet. The left half shows the answers that describe examples of good UX, and the right half shows the answers that describe examples of poor UX. Moreover, the first column represents the flag of whether the description mentioned an information system or not, and the second one represents the flag of whether the description mentioned a specific system or not. The seventh and eighth columns denote the same flags in the case of poor UX.
Fig. 3 Screenshot of the spreadsheet focusing on the coding procedure
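As an illustration of how the coded flags can be tallied into the counts reported later in Tables 3 and 4, a small pandas sketch is shown below. The column names and the toy rows are hypothetical and do not reproduce the actual spreadsheet.

```python
import pandas as pd

# hypothetical coded responses: one row per answer part
coded = pd.DataFrame({
    "ux": ["good", "good", "poor", "poor"],
    "is_information_system": ["Yes", "Yes", "No", "Yes"],
    "is_specific_system": ["Yes", "No", "No", "Yes"],
    "code": ["LINE", "Netshopping", "News site", "LINE"],
})

# counts of answers mentioning an information system vs. others (cf. Table 3)
print(pd.crosstab(coded["ux"], coded["is_information_system"], margins=True))

# counts of specific systems vs. abstract concepts (cf. Table 4)
print(pd.crosstab(coded["ux"], coded["is_specific_system"], margins=True))
```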
4 Results and Discussion

The number of responses was 129. However, the number of cases on good UX and the number of cases on poor UX were each 136, as some respondents replied with plural answers.
4.1 Top 10 of Good UX and Poor UX

Tables 1 and 2 present the top 10 rankings of the systems/services that provided good UX and those that provided poor UX, respectively. The summation of the numbers shown in Table 1 is 61 out of 136, and that in Table 2 is 35. This implies that the variety of cases considered poor UX ranged wider than that of good UX. In both tables, the keywords shown with an asterisk (*) represent abstract concepts, such as Netshopping and biometrics, whereas the keywords without an asterisk represent specific services such as Amazon and YouTube. Notably, the numbers inside the parentheses at the end of the descriptions represent the number of responses indicated by the respondents.

Table 1 Top 10 of the good-UX systems/services

Rank | Good-UX systems/services
1st  | LINE (18)
2nd  | Coke ON (6)
3rd  | Biometrics*, Amazon (5)
5th  | Netshopping*, search engine*, YouTube, Google (3)
9th  | Transportation guidance application*, CookPad, face recognition*, iPhone, iOS, Uber Eats, Twitter, Amazon Prime Video, Amazon Prime members (2)

Table 2 Top 10 of the poor-UX systems/services

Rank | Poor-UX systems/services
1st  | Netshopping* (7)
2nd  | Nico-nico Video/Nico-nico Live, LINE (6)
4th  | News site* (5)
5th  | Poorly designed website*, YouTube (4)
7th  | Website with AD*, input form*, electronic money*, SNS* (3)
4.2 Notable Answers

The most popular application offering good UX was LINE,1 which was supported by 18 votes from the 129 respondents. LINE is currently the most popular messaging application in Japan and is especially popular among the young generations. Surprisingly, 14% of the respondents indicated that LINE provides good UX. However, six respondents mentioned poor UX upon using LINE (see Table 2). These statistics imply that the young generations generally use LINE every day and that using the LINE application has become a part of their daily lives; hence, they experience both good UX and poor UX while using it. Although adults might not be familiar with the smartphone application "Coke ON," it is very popular among students. It is a type of stamp-rally application; i.e., users get a free drink upon collecting Coke ON stamps. The service of the Coke ON application is very simple and easy to understand; therefore, it earned six votes in the questionnaire. Nico-nico Video and the Nico-nico Live streaming service were listed as poor-UX providers. In particular, they were compared with other video-streaming services such as Amazon Prime Video, and their service quality was considered relatively inferior to that of similar services; therefore, they were regarded as poor-UX providers. In the seventh place of the poor-UX ranking, websites with advertisements were listed. Judging not only from the ranking but also from the comments, advertisements were disliked by the respondents. However, YouTube was relatively favored except for its mandatory advertisements (some students listed YouTube as a poor-UX-providing service because of its advertising functions). The two keywords biometrics and face recognition in Table 1 represent the privacy-protection function of smartphones. Additionally, some students mentioned the security gates with a face-recognition feature installed at the entrance of our campus (see Fig. 4).
4.3 Discussions of the Results

Table 3 lists the number of answers mentioning an information system and the number of answers mentioning other systems/services. Table 4 lists the number of answers mentioning specific systems/services and the number of answers describing abstract concepts. Table 3 suggests that approximately 15% of the responses did not mention information systems. This is because the examples explained in the lecture were not only based on information systems; although the instructions of the questionnaire asked the students to select examples from information systems, some of them did not follow the instructions exactly.

1 https://line.me/en/.
Fig. 4 The security gates implemented at the entrance of Ichigaya-Tamachi campus of Chuo University
Table 3 Number of answers mentioning an information system, and the number of answers mentioning other systems/services

          IS^a   Others   Total
Good UX   115    21       136
Poor UX   117    19       136
Total     232    40       272
^a Information system

Table 4 Number of answers mentioning specific systems/services, and the number of answers describing abstract concepts

          Specific   Abstract   Total
Good UX   94         42         136
Poor UX   58         78         136
Total     152        120        272
The data in Table 4 present an interesting aspect of human behavior. The ratio of the number of answers describing specific systems to the total number differs between the answers on good UX (94/136 = 0.69) and those on poor UX (58/136 = 0.43). When writing about something that offers poor UX, students tend to hesitate to indicate a specific system/service, resulting in the difference that is evident in Table 4.
5 Conclusions and Future Work

Recently, many UX-based studies have been conducted. However, the concept of UX is complicated, and designing a UX efficiently is considered somewhat difficult. To comprehend the key issues that make systems and services offer good or poor UX, we must understand what users think good UX and poor UX are.
In this study, a questionnaire was prepared, and a survey was conducted to understand what respondents consider when judging a system to be a good-UX provider or a poor-UX one. The first-grade students of our university participated in the survey, and a total of 136 responses were collected from 129 participants. The results of the questionnaire revealed which services were liked by young people for their good UX and which services were disliked by them. A more in-depth analysis focusing on the comments describing why respondents regard a system as providing good or poor UX will be needed. Furthermore, conducting another survey with respondents of other socio-demographic attributes, such as high-school students, adults, and senior generations, will be the focus of my future work.
References 1. ISO 9241-210:2019 Ergonomics of human-system interaction—Part 210: Human-centred design for interactive systems. https://www.iso.org/standard/77520.html. Accessed 9 Mar 2020 2. User-centered design, From Wikipedia, the Free Encyclopedia. https://en.wikipedia.org/wiki/ User-centered_design. Accessed 9 Mar 2020 3. Nacke, L. E., Mirza-Babaei, P., Drachen, A.: User experience (UX) researchin games. In: Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, Glasgow, Scotland, UK (2019). https://doi.org/10.1145/3290607.3298826 4. Alenljung, B., Lindblom, J., Andreasson, R., Ziemke, T.: User Experience in Social HumanRobot Interaction. Int. J. Ambient Comput. Intell. 8(2), 12–31 (2017). https://doi.org/10.4018/ IJACI.2017040102 5. Marti, P., Iacono, I.: Anticipated, momentary, episodic, remembered: the manyfacets of user experience. In: Proceedings of the Federated Conference on Computer Science and Information Systems, vol. 8, pp. 1647–1655. ACSIS (2016). ISSN 2300–5963. https://doi.org/10.15439/ 2016F302 6. Vermeeren, A.P.O.S, Effie Law, E.L.C., Roto, V., Obrist, M., Jettie Hoonhout, J., Kaisa Väänänen-Vainio-Mattila, K.: User experience evaluation methods: current state and development needs. In: Proceedings of NordiCHI 2010, Reykjavik, Iceland (2010) 7. Lachner, F., Naegelein, P., Kowalski, R., Spann, M., and Butz, A.: Quantified UX: towards a common organizational understanding of user experience. In: Proceedings of NordiCHI 2016, Gothenburg, Sweden (2016). https://doi.org/10.1145/2971485.2971501 8. Hinderks, A., Schrepp, M., Mayo, F.J.D., Escalona, M.J., Thomaschewski, J.: Developing a UX KPI based on the user experience questionnaire. Comput. Stand. Interfaces 65, 38–44 (2019). https://doi.org/10.1016/j.csi.2019.01.007 9. Schrepp, M., Hinderks, A., Thomaschewski, J.: Construction of a benchmark for the user experience questionnaire (UEQ). Int. J. Interact. Multimed. Artif. Intell. 4(4), 40–44 (2017). https://doi.org/10.9781/ijimai.2017.445 10. Sari Kujala, S., Roto, V., Väänänen-Vainio-Mattila, K., Karapanos, E., Sinnelä, A.: UX curve: a method for evaluating long-term user experience. Interact. Comput. 23, 473–483 (2011). https://doi.org/10.1016/j.intcom.2011.06.005 11. Linder, J., Arvola. M.: IPA of UX: interpretative phenomenological analysis in a user experience design practice. In: Proceedings of ECCE 2017—European Conference on Cognitive Ergonomics, Umeå, Sweden (2017). https://doi.org/10.1145/3121283.3121299
Two-Component Opinion Dynamics Theory of Official Stance and Real Opinion Including Self-Interaction Nozomi Okano, Yuki Ohira, and Akira Ishii
Abstract We extend the theory of opinion dynamics, which introduces trust and distrust in human relationships in society, to multiple components. The theory is made up of N components, and in particular, here, two components are treated. In this paper, we show the calculation for real opinion and official stance as the two components of opinions.
1 Introduction

Large differences between people's expressed opinions and inner thoughts can create discontent and tension, and they are associated with serious social phenomena, such as the fall of the Soviet Union [1], as Ye et al. [2] indicate. Other phenomena are also linked to such discrepancies. In many social situations, including those examples, discrepancies between the expressed and inner opinions of individuals can emerge and produce a variety of consequential phenomena. In fact, such discrepancies and inter-personal disputes are often more vigorous over small differences of opinion than over large ones [3]. Opinion dynamics is a field that has been studied for a long time, with applications to social consensus building and elections [4, 5]. The transition of social debate leading to consensus building is a long-standing problem. Moreover, in modern society, it is also an important theme in the analysis of various communications on the Internet. So-called flames on the Internet may be one of the themes that can be handled by opinion dynamics. Opinion dynamics of binary opinions of yes and no has been studied for a long time by analogy with magnetic physics [6–12]. In addition, since 2000, the Bounded Confidence Model has been presented, which analyzes opinions as continuous, rather than binary, quantities. Precise research has also been carried out with this theory [13–17].
However, the conventional Bounded Confidence Model implicitly assumes that a social agreement is finally reached. In the work of Hegselmann-Krause [14], the typical Bounded Confidence Model, the equation of opinion dynamics is the following:

$$\Delta I_i(t) = \sum_{j} D_{ij} I_j(t)\,\Delta t \qquad (1)$$

Here $I_i(t)$ is the opinion of agent $i$. The coefficient $D_{ij}$ is fixed to a positive value. Thus, Eq. (1) describes consensus building in society. Namely, it is theoretically implicit that the opinions of all members converge if the $D_{ij}$ are limited to positive values. In other words, the convergence of opinions in society is not a result of individual simulations; rather, the Hegselmann–Krause theory [14] contains the convergence of the opinions of society by construction. In fact, not all opinions in society reach consensus; it is rather rare that there is consensus on opinions about politics and economics. Ishii et al. extended the Bounded Confidence Model by introducing opposition and distrust [18–23]. The extension is simply that the coefficient $D_{ij}$ is not limited to a positive value; a negative value is also allowed. Furthermore, apart from a public opinion that is open to the outside, there is an opinion that can be said to be kept secret behind the scenes of social movements. These are regarded as an official stance and a real opinion, and the two components of the official stance and the real opinion are considered. Ishii-Okano's theory [24] was constructed based on this idea of opinion dynamics. In this paper, Ishii-Okano's theory is applied to various concrete examples in a two-agent system.
2 Theory

The equation of the new opinion dynamics proposed by Ishii [16] is the following:

$$m\,\Delta I_i(t) = c_i A(t)\,\Delta t + \sum_{j=1}^{N} D_{ij}\, f\!\left(I_i, I_j\right)\left(I_j - I_i\right)\Delta t \qquad (2)$$

where

$$f\!\left(I_i, I_j\right) = \frac{1}{1 + \exp\!\left(a\left(\left|I_i - I_j\right| - b\right)\right)} \qquad (3)$$
is the Sigmoid function used as a cut-off function. For the coefficient of trust, $D_{ij}$ and $D_{ji}$ are assumed to be independent. We assume that $D_{ij}$ is an asymmetric matrix and $D_{ij} \neq D_{ji}$. Moreover, $D_{ij}$ and $D_{ji}$ can have both positive and negative values: a positive value means that i trusts j, and a negative value means that i distrusts j. Furthermore, m is the strength of will.
Fig. 1 Calculation of the opinion dynamics for 2 agents where the left is DAB > 0 and DBA > 0 and the right is DAB < 0 and DBA < 0
For a large value of m, the agent is not affected by the mass media or by the opinions of other agents. Using this theory of opinion dynamics, calculations have been performed for people who are charismatically popular in society [23] and for people who are hated by society as a whole [21], as well as for cases where society is divided. Therefore, this theory of opinion dynamics offers the possibility of performing social simulation calculations corresponding to many social movements. Here is a simple calculation using this new opinion dynamics theory. Figure 1 shows opinion dynamics for two people: the left of Fig. 1 shows the case where the two people trust each other, and the right shows the case where the two people distrust each other. If they trust each other, the result also appears in the Hegselmann-Krause model, but if they distrust each other, the behavior cannot be calculated without this theory.

Ishii-Okano [24] extended this new theory of opinion dynamics to a theory with two independent opinion components. In the paper of Ishii-Okano [24], a general theory of multi-component opinion dynamics was presented, and concrete calculations were presented for two components. Considering the interpretation of the results calculated by the theory, the two components of opinion should preferably be independent, for example, an opinion axis on politics and an opinion axis on the economy. Opinion axes cannot always be independent of their content: an opinion axis on popular dramas and an opinion axis on the entertainment world in general will not be independent. Therefore, it is quite difficult to set two concrete opinion axes, and they need to be chosen carefully. In this study, the two opinion axes were set as "real opinion" and "official stance". As discussed in the paper of Ishii-Okano [24], in Shakespeare's famous play "Romeo and Juliet" [25], Romeo and Juliet fall in love and, in their real opinion, live honestly in their affection for each other. However, the Montague family, to which Romeo belongs, and the Capulet family, to which Juliet belongs, have been in conflict for generations. As an official stance, Romeo of the Montague family and Juliet of the Capulet family cannot fall in love. Such a situation is a typical problem in which real opinion and official stance conflict. It is therefore reasonable to consider the real opinion and the official stance as two independent axes of opinion.
First, we extend the opinion into a two-component opinion as follows.

$$\Theta_i(t) = \left(I_i^{(1)}(t),\; I_i^{(2)}(t)\right) \qquad (4)$$

We define the coefficient of trust/distrust as the following matrix.

$$\Omega_{ij} = \begin{pmatrix} D_{ij}^{(1)} & E_{ij}^{(12)} \\ E_{ij}^{(21)} & D_{ij}^{(2)} \end{pmatrix} \qquad (5)$$

Using the above, we extend the equation of the opinion dynamics into a two-component form as follows.

$$\Theta_i(t + \Delta t) - \Theta_i(t) = \left[\chi_i A(t) + \sum_{j} \Omega_{ij}\,\Theta_j(t)\right]\Delta t \qquad (6)$$

Here, the mass media effect is also extended to a two-component version, $\chi_i = \left(C_i^{(1)}, C_i^{(2)}\right)$. Namely,

$$d\Theta_i(t) = \left[\chi_i A(t) + \sum_{j} \Omega_{ij}\,\Theta_j(t)\right]dt \qquad (7)$$

The equations for each component are as follows.

$$dI_i^{(1)}(t) = \left[C_i^{(1)} A(t) + \sum_{j} D_{ij}^{(1)} I_j^{(1)}(t) + \sum_{j} E_{ij}^{(12)} I_j^{(2)}(t)\right]dt \qquad (8)$$

$$dI_i^{(2)}(t) = \left[C_i^{(2)} A(t) + \sum_{j} E_{ij}^{(21)} I_j^{(1)}(t) + \sum_{j} D_{ij}^{(2)} I_j^{(2)}(t)\right]dt \qquad (9)$$

Adding the Sigmoid-type cut-off function and the difference of the opinions as in Eq. (2), we obtain the following equations:

$$dI_i^{(1)}(t) = \left[C_i^{(1)} A(t) + \sum_{j} D_{ij}^{(1)} f\!\left(I_i^{(1)}, I_j^{(1)}\right)\left(I_j^{(1)}(t) - I_i^{(1)}\right) + \sum_{j} E_{ij}^{(12)} f\!\left(I_i^{(2)}, I_j^{(2)}\right)\left(I_j^{(2)}(t) - I_i^{(2)}\right)\right]dt \qquad (10)$$
$$dI_i^{(2)}(t) = \left[C_i^{(2)} A(t) + \sum_{j} E_{ij}^{(21)} f\!\left(I_i^{(1)}, I_j^{(1)}\right)\left(I_j^{(1)}(t) - I_i^{(1)}\right) + \sum_{j} D_{ij}^{(2)} f\!\left(I_i^{(2)}, I_j^{(2)}\right)\left(I_j^{(2)}(t) - I_i^{(2)}\right)\right]dt \qquad (11)$$
The above equation is the final form of the two-component opinion dynamics [24]. However, here, we add one more additional term which is the interaction between the official stance and the real opinion of the agent itself. In other words, if the official stance and real opinion are far apart, they will feel stress, so we think they should try to get as close as possible. Including such self-interaction, we obtain the equation of the two-component opinion dynamics in the following form: ⎤ (1) (2) (1) (1) (1) (1) (1) + I j (t) − Ii A(t) + Dii Ii − Ii Di j f Ii , I j ⎥ ⎥ j ⎥dt ⎥ (12) (2) (2) (2) (2) ⎦ I (t) − I E f I ,I +
⎡ ⎢ ⎢ (1) d Ii (t) = ⎢ ⎢ ⎣
(1)
Ci
ij
i
j
j
i
j
⎡ (2)
d Ii
(12)
⎤ (21) (1) (1) (1) (1) I j (t) − Ii E i j f Ii , I j ⎥ ⎥ j ⎥dt ⎥ (2) (2) (2) (2) (2) ⎦ I j (t) − Ii Di j f Ii , I j +
(2) (2) (2) (1) C A(t) + Dii Ii − Ii + ⎢ i
⎢ (t) = ⎢ ⎢ ⎣
j
(13)
3 Results 3.1 Setting of Calculation The calculation sets up two agents, A and B. These A and B may be individuals, companies, or organizations, nations, and governments. We calculate the changes in opinions of A and B using the above equation. In the calculation, as shown in Fig. 2, three graphs are shown for the official stance of A and B, real opinion of A and B, and both the A and B official stance and real opinion. In Fig. 2, A and B trust each other in the real opinion, and the consensus is formed. However, in the official opinion, their opinions are greatly separated, and the distance between their opinions is far apart due to distrust. It is the same as Romeo and Juliet mentioned earlier.
466
N. Okano et al.
Fig. 2 The official stance of A and B, real opinion of A and B, and both the A and B official stance (1) (1) (2) (2) (1) (1) (2) (2) (12) and real opinion. DAB = 0 DBA = 0 DAB = 1 DBA = 1 DAA = 0 DBB = 0 DAA = 0 DBB = 0 EAB (12) (21) (21) = 0 EBA = 0 EAB = 0 EBA = 0
Fig. 3 The three graphs of official stance, real opinion and the both of the agent A and B. (1) (1) (2) (2) (1) (1) (2) (2) DAB = −0.5 DBA = −0.5 DAB = 1 DBA = 1 DAA = 0.5 DBB = 0.5 DAA = 0.5 DBB = 0.5 (12) (12) (21) (21) EAB = 0 EBA = 0 EAB = 0 EBA = 0
3.2 Self-interaction of Official Stance and Real Opinion Next, the interaction between the agent’s own official stance and real opinion is included. Figure 3 shows that the influence of the person’s real opinion and official stance is a positive value, in addition to Fig. 2. If a positive value is given, the person’s real opinion and the official stance will approach each other. Therefore, looking at the figure, the official stance is closer to the real opinion due to its influence, and the gap between A and B’s opinions is somewhat smaller. Conversely, real opinion is a relationship that tries to approach each official stance, and a consensus was formed in Fig. 2, but the consensus was not reached. But they can’t agree.
3.3 Asymmetric Case Next, consider the case where the effects of A and B have asymmetry. In Fig. 4, the real opinion of A is positively influenced by the official stance of B. The official stance of A has a negative effect on the real opinion of B. As a result, the real opinion of A attempts to approach the official stance of B, and the real opinion of B attempts to approach the official stance of A. The opinion changes as shown in Fig. 4.
Two-Component Opinion Dynamics Theory …
467
Fig. 4 The three graphs of official stance, real opinion and the both of the agent A and B. (1) (1) (2) (2) (1) (1) (2) (2) (12) DAB = −0.5 DBA = −0.5 DAB = 1 DBA = 1 DAA = 0 DBB = 0 DAA = 0 DBB = 0 EAB = (12) (21) (21) 0.5 EBA = 0.5 EAB = 0.5 EBA = −0.5
3.4 Romeo and Juliet As shown in the paper of Ishii-Okano [24], Shakespeare’s “Romeo and Juliet” is useful as an example of calculating these two-component opinion dynamics. In stance, opinions differ greatly due to the confrontation between the houses. This is the content of the original Romeo and Juliet. In the calculation, when the influence of the official stance of the opponent is added to the real opinion, it takes into account the official stance. Consensus is not formed even in real opinion. In other words, the emphasis is placed on the confrontation between homes, and romance is stopped. On the other hand, if the real opinion of the opponent enters the official stance, they will be attracted by the real opinion. The disagreement between the houses as stances is slightly neglected, and the distance between Romeo and Juliet’s official stance is reduced accordingly. As an application of the partial opinion dynamics theory, a likely example understands Romeo and Juliet.
4 Discussion The idea of differentiating between real opinion and official stance is found in Asch [26], and more recently, many studies have discussed the difference between real opinion and official stance in opinion dynamics [27–29]. However, these studies are based on Hegselmann-Krause [14] and are based on consensus building. Therefore, the focus of discussion is that real opinion and official stance differ slightly in the time to reach consensus. In other words, as in Romeo and Juliet mentioned in this paper, the real opinion and official stance are completely different, and the case where either has no hope of reaching consensus cannot be handled. In this regard, the theory of two-component opinion dynamics proposed in this study can handle cases such as Romeo and Juliet, who would like to agree with a real opinion but cannot reconcile with the official stance. I would like to emphasize that even if they are working toward an agreement, they can also handle the case of national confrontation shown in the paper of Ishii-Okano [24].
The separation of real opinion and official stance can be said to be a feature of Japanese society, but to a certain extent, there will be also in societies outside Japan. It can be said that official stance and real opinion always intersect in diplomatic issues between nations. It is not unusual that housewives talk on the roadside and the real opinion spoken in the home is very different from each other, even if the neighborhood of the house is close to each other. In the case of gender relations, contrary to Romeo and Juliet, real opinion says that even if their hearts are away from each other, they will continue to be lovers as official stances. In this way, there are many cases where there is a real opinion apart from the official stance that comes out of the world. Even in the issue of consensus building, if the opinion of real opinion is important separately from official stance, consensus will not be reached if overlooked. If civil engineering requires the construction of a dam, the community’s consensus is required. People’s official stance opinion alone is not enough to reach an agreement, where people have real opinions, just against dam construction, intentions to protect the landscape, or the amount of security deposits, It is important how the real opinion relates to the opinion of the official stance. Also, it is important to determine whether the flame that occurs on the Internet is the flame affected by the real opinion or the flame of the official stance alone. Generally, only the official stance can be observed on the Internet, and the real opinion cannot be observed. Considering this, the effect of the real opinion on the official stance, or the official stance on the real opinion. No less influence to obtain, the theory can analyze and simulate it seems that there are many applications range. The method of using the theory of opinion dynamics proposed in this paper is not to compare the observed values of official stance and real opinion with the calculated results, but to observe the observed values of official stance. It would be a good usage to apply this theory in such a direction as to infer the real opinion of the unobservable partner from the theoretical calculation with the observed value of official stance and the well-known real opinion like the real opinion of their own real opinion.
5 Conclusion

The theory of opinion dynamics by Ishii, which introduces trust and distrust into human relations in society, was extended to multiple components; the extended theory is made up of N components. Calculations were performed for the cases where the opponent's real opinion affects one's own official stance and where the opponent's official stance influences one's own real opinion, including the self-interaction between each agent's own official stance and real opinion. We have thus proposed an opinion dynamics that adds the new, unobservable element of real opinion to the observable official-stance opinion.

Acknowledgements This work was supported by JSPS KAKENHI Grant Number JP19K04881. The authors thank Prof. S. Galam of CEVIPOF-Centre for Political Research, Sciences Po and CNRS, Paris, France, Prof. H. Yamamoto of Rissho University, and Prof. H. Takikawa of Tohoku University for meaningful discussions on the two-dimensional opinion dynamics of official stance and real opinion.
References 1. Kuran, T.: Sparks and prairieres: a theory of unanticipated political revolution. Public Choice, vol. 61(1), pp. 41–74 (1989) 2. Ye, M., Qin, Y., Govaert, A., Anderson, B.D.O., Cao, M.: An inuence network model to study discrepancies in expressed and private opinions. Automatica 107, 371–381 (2019) 3. Friedkin, N.E.: Scale-free interpersonal inuences on opinions in complex systems. J. Math. Sociol. 39(3), 168–173 (2015) 4. Castellano, C., Fortunato, S., Loreto, V.: Statistical physics of social dynamics. Rev. Mod. Phys. 81, 591–646 (2009) 5. Sîrbu, A., Loreto, V., Servedio, V.D.P., Tria, F.: Opinion dynamics: models, extensions and external effects. In: Loreto, V., et al. (eds.) Participatory Sensing, Opinions and Collective Awareness. Understanding Complex Systems, pp. 363–401. Springer, Cham (2017) 6. Galam, S.: Rational group decision making: a random field Ising model at T = 0. Phys. A 238, 66 (1997) 7. Sznajd-Weron and J. Sznajd: Int. J. Mod. Phys. C 11, 1157 (2000) 8. Sznajd-Weron, M.: Tabiszewski, and A. M. Timpanaro: Europhys. Lett. 96, 48002 (2011) 9. Galam, S.: Application of statistical physics to politics. Physica A 274, 132–139 (1999) 10. Galam, S.: Real space renormalization group and totalitarian paradox of majority rule voting. Physica A 285, 66–76 (2000) 11. Galam S: Are referendums a mechanism to turn our prejudices into rational choices? An unfortunate answer from sociophysics. Laurence Morel and Matt Qvortrup (eds.), Chapter 19 of The Routledge Handbook to Referendums and Direct Democracy Taylor & Francis, London (2017) 12. Galam, S.: The Trump phenomenon: an explanation from sociophysics. Int. J. Mod. Phys. B 31, 1742015 (2017) 13. Weisbuch, G., Deffuant, G., Amblard, F., Nadal, J.-P.: Meet, discuss and segregate! Complexity 7, 55–63 (2002) 14. Hegselmann, R, Krause, U.: Opinion dynamics and bounded confidence models, analysis, and simulation. J. Artif. Soc. Soc. Simul. 5 (2002) 15. Jager, W., Amblard, F.: Uniformity, bipolarization and pluriformity captured as generic stylized behavior with an agent-based simulation model of attitude change. Computat. Math. Organ. Theory 10, 295–303 (2004) 16. Jager, W., Amblard, F.: Multiple attitude dynamics in large populations. In: Presented in the Agent 2005 Conference on Generative Social Processes, Models and Mechanisms, 13–15 October 2005 at The University of Chicago (2005) 17. Kurmyshev, E., Juárez, H.A., González-Silva, R.A.: Dynamics of bounded confidence opinion in heterogeneous social networks: concord against partila antagonism. Phys. A 390, 2945–2955 (2011) 18. Ishii, A., Kawahata, Y.: Opinion dynamics theory for analysis of consensus formation and division of opinion on the internet. In: Proceedings of the 22nd Asia Pacific Symposium on Intelligent and Evolutionary Systems (IES2018), pp. 71–76 (2018). arXiv:1812.11845 [physics.soc-ph] 19. Ishii, A.: Opinion dynamics theory considering trust and suspicion in human relations. In: Morais, D., Carreras A., de Almeida A., Vetschera R. (eds.) Group Decision and Negotiation: Behavior, Models, and Support. GDN 2019. Lecture Notes in Business Information Processing, vol. 351, pp. 193–204. Springer, Cham (2019) 20. Ishii, A., Kawahata, Y. (2009) Opinion dynamics theory considering interpersonal relationship of trust and distrust and media effects. In: The 33rd Annual Conference of the Japanese Society for Artificial Intelligence, vol. 33 (JSAI2019 2F3-OS-5a-05) (2019) 21. 
Okano, N., Ishii, A.: Isolated, untrusted people in society and charismatic person using opinion dynamics. In: Proceedings of ABCSS2019 in Web Intelligence, pp. 1–6 (2019) 22. Ishii, A., Kawahata, Y: New Opinion dynamics theory considering interpersonal relationship of both trust and distrust. In: Proceedings of ABCSS2019 in Web Intelligence, pp. 43–50 (2019)
23. Okano, N., Ishii, A.: Sociophysics approach of simulation of charismatic person and distrusted people in society using opinion dynamics. In: Sato, H., Iwanaga, S., Ishii, A. (eds.) Proceedings of the 23rd Asia-Pacific Symposium on Intelligent and Evolutionary Systems, pp. 238–252. Springer (2019) 24. Ishii, A., Okano, N.: Two-dimensional opinion dynamics of real opinion and official stance. In: Proceedings of NetSci-X 2020: Sixth International Winter School and Conference on Network Science, Springer Proceedings in Complexity, pp. 139–153 (2020) 25. Levenson, J.L. (ed.): Romeo and Juliet. The Oxford Shakespeare. Oxford University Press, Oxford (2000). ISBN: 0-19-281496-6 26. Asch, S.E.: Effects of group pressure upon the modification and distortion of judgments. In: Groups, Leadership and Men; Research in Human Relations, pp. 177–190. Carnegie Press, Oxford (1951) 27. Wang, S.W., Huang, C.Y., Sun, C.T.: Modeling self-perception agents in an opinion dynamics propagation society. Simulation 90, 238–248 (2014) 28. Huang, C.-Y., Wen, T.-H.: A novel private attitude and public opinion dynamics model for simulating pluralistic ignorance and minority influence. J. Artif. Soc. Soc. Simul. 17, 8 (2014) 29. León-Medina, F.J., Tena-Sànchez, J.M., Miguel, F.J.: Fakers becoming believers: how opinion dynamics are shaped by preference falsification, impression management and coherence heuristics. Qual. Quant. (2019). https://doi.org/10.1007/s11135-019-00909-2
Theory of Opinion Distribution in Human Relations Where Trust and Distrust Mixed Akira Ishii and Yasuko Kawahata
Abstract In this paper, we performed simulations based on the recent new theory of opinion dynamics that incorporates both trust and distrust in human relationships. As a result, it was observed that the aspect of consensus building depends on the ratio between the trust coefficients and the distrust coefficients. For the model proposed in this study, the behavior changes like a phase transition when the ratio of trust coefficients is near 55%. This implies that about 55% of the connections between people being trusting ones is sufficient for a society to reach consensus.
1 Introduction

With the spread of public networks, network devices are spreading all over the world. As a result, the opportunities for decision-making and consensus building beyond space–time constraints are greatly increasing. In other words, the online world plays a big role as a concrete place for disagreement and consensus building. In the future, it is expected that simulations of the formation of the opinions of many people will be required. The issue of consensus building through social exchange has been studied for a long time. However, in many cases, the theories are mainly analytical approaches premised on consensus building [1, 2]. As consensus is reached, people's opinions tend to eventually converge on a small number of opinions, as the trust-based models [3–5] show. However, the distribution of opinions in real society cannot be summarized into a small number of opinions. At least, it can be assumed, based on the results of
the simulation, that they cannot be reduced to a few opinions. Recently, Ishii and Kawahata have proposed a new theory of opinion dynamics that includes both trust and distrust relationships in human networks in society [6–8]. In this paper, we apply this theory to the distribution of opinions in a society where both trust and distrust are included. This paper discusses the tendency toward phase-transition-like behavior depending on the ratio of trust and distrust in human relations in society.
2 Method

We apply the opinion dynamics theory to perform simulations of human behavior in society [6–8]. In this paper, in order to discuss the time evolution and the trust relationship between two parties, we introduce distrust into the Bounded Confidence Model. For a fixed agent, say i, where 1 ≤ i ≤ N, we denote the agent's opinion at time t by I_i(t). We modify the meaning of the coefficient D_ij in the bounded confidence model to be the coefficient of trust. We assume here that D_ij > 0 if there is a trust relationship between the two persons, and D_ij < 0 if there is a distrust relationship between the two persons. In the calculation in this paper, we assume that D_ij is constant [6]. Therefore, the change in the opinion of agent i can be expressed as follows [7].

$$\Delta I_i(t) = c_i A(t)\,\Delta t + \sum_{j=1}^{N} D_{ij}\, f\!\left(I_i, I_j\right)\left(I_j - I_i\right)\Delta t \qquad (1)$$

where we use the sigmoid-type smooth cut-off function below in order to cut off effects from people who have very different opinions. Namely, in this model, we hypothesize that people do not pay attention to opinions that are far from their own.

$$f\!\left(I_i, I_j\right) = \frac{1}{1 + \exp\!\left(a\left(\left|I_i - I_j\right| - b\right)\right)} \qquad (2)$$
We assume here that D_ij and D_ji are independent. Usually, D_ij is an asymmetric matrix, D_ij ≠ D_ji. In the calculation below, we set a = 1 and b = 5. Furthermore, D_ij and D_ji can have different signs. In the following, we show simulation calculations using the Ishii theory [7] for 300 persons. For simplicity, we assume that the connections of the 300 persons form a complete graph. The D_ij for each link between persons are set to values between −1 and 1 using random numbers. In the actual calculation, we vary the ratio between the proportion of D_ij taking values in 0 to 1 and the proportion taking values in −1 to 0. The initial opinion values of the 300 persons are chosen randomly between −20 and +20. The trajectories of the 300 opinions are calculated with the opinion dynamics model, and a histogram is produced as the opinion distribution. Additionally, for comparison of the calculated distribution with observation, we use comments on TV
news reports on YouTube. Currently, there are also various studies related to natural language processing, owing to (i) the new full provision of data based on dictionary data and learning data and (ii) the expansion of classification methods in machine learning [9–11]. The YouTube data set in this analysis is focused on the English language, and a comparison of the differences between media is expected. In short, as it is now possible to judge the strength or weakness of an opinion from text collected via natural language processing, it is also possible to measure the distribution of opinion within society not in binary terms but as a continuous distribution over positive, negative, and neutral comments.
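A minimal sketch of the simulation setting described above is given below: Euler integration of Eq. (1) for N = 300 on a complete graph, with each D_ij drawn positively with a given ratio and negatively otherwise. The mass-media term is assumed to be zero (A(t) = 0), and the time step, number of steps, and random seed are illustrative assumptions rather than the paper's actual settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid_cutoff(diff, a=1.0, b=5.0):
    """Smooth cut-off of Eq. (2) on the opinion distance |I_i - I_j|."""
    return 1.0 / (1.0 + np.exp(a * (np.abs(diff) - b)))

def simulate(N=300, positive_ratio=0.55, steps=1000, dt=0.01):
    """Euler integration of Eq. (1) on a complete graph, with A(t) = 0 assumed."""
    # trust/distrust matrix: a fraction positive_ratio of links drawn from (0, 1),
    # the rest from (-1, 0); D_ij and D_ji are drawn independently
    signs = rng.random((N, N)) < positive_ratio
    D = np.where(signs, rng.random((N, N)), -rng.random((N, N)))
    np.fill_diagonal(D, 0.0)
    I = rng.uniform(-20.0, 20.0, N)           # initial opinions
    for _ in range(steps):
        diff = I[None, :] - I[:, None]        # diff[i, j] = I_j - I_i
        I = I + dt * np.sum(D * sigmoid_cutoff(diff) * diff, axis=1)
    return I

opinions = simulate()
hist, edges = np.histogram(opinions, bins=50)
print("max bin count:", hist.max())           # cf. the quantity plotted in Fig. 7
```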
3 Results

In this section, as a simulation example, we show simple calculations for 300–1000 persons using the new opinion dynamics theory. We assume that the connections form a complete graph. Figure 1 shows the case of trust relationships among 300–1000 people. This result implies consensus building. The trajectories are presumed to be similar in principle to the simulation results of the original bounded confidence model [3–5], shown on the left side of the figure. In Figs. 2 and 3, we show the results of simulations where positive D_ij and negative D_ij are mixed half and half. In this case, we found that there is no consensus building. As we can see from Figs. 2 and 3, the calculated results are very different from Fig. 1 and from the results of the Bounded Confidence Model [3–5]. When negative D_ij, i.e., distrust relations, are included in the society, consensus building seems to be impossible. In Fig. 4, we show the opinion distribution of comments on TV news reports on YouTube as an example of an observed opinion distribution. As we can see from Fig. 4, the observed opinion distribution looks similar to Fig. 2, at least qualitatively. If the observed opinion distribution had a consensus-building feature, the observed
Fig. 1 Calculation result for N = 300. The human network is assumed to be a complete graph. The left is the trajectories of opinions. The right is the distribution of opinions at the final time of this calculation. D_{ij} are set randomly between 0 and 1 so that every person trusts everyone else
Fig. 2 Calculation result for N = 300. The human network is assumed to be a complete graph. The left is the trajectories of opinions. The right is the distribution of opinions at the final time of this calculation. D_{ij} are set randomly between −1 and 1 so that every person either trusts or distrusts everyone else
Fig. 3 Calculation result for N = 1000. The human network is assumed to be a complete graph. The left is the trajectories of opinions. The right is the distribution of opinions at the final time of this calculation. D_{ij} are set randomly between −1 and 1 so that every person either trusts or distrusts everyone else
distribution would consist of a few sharp peaks like Fig. 1. The above calculation experiment takes into account the fact that the number of views of major news media such as CNN is high. In the process, we find that the opinion distribution observed in the comments on YouTube TV news reports, extracted using natural language processing, has, at least qualitatively, a similar tendency to the opinion distribution calculated from the new opinion dynamics theory. At the least, we found a continuous opinion distribution like those in Figs. 2 and 3. The observed opinion distributions of comments on YouTube TV news are similar to Fig. 4, and we do not find an opinion distribution like Fig. 1 in many actual observations of such comments. In the future, we think it is important to follow trends in social physics methods, combining calculated results from opinion dynamics with machine learning. As we can see from the above, we obtain consensus building for 100% positive trust and no consensus building for 50% positive trust. We should therefore determine the origin of this difference in consensus-building behavior. In Fig. 5, we show the opinion trajectories of 300 persons for different ratios of positive to negative D_{ij}. When the positive ratio of D_{ij} is 0.60, we found that the trajectories look like consensus building. When the positive ratio of D_{ij} is 0.51, the opinion trajectory of
Fig. 4 Distribution of negative, positive, and neutral comments (case: "200-pound ripped kangaroo crushes metal"; acquisition period 2015/6/5 to 2019/5/28; score range in each case: 0–1)
Fig. 5 The opinion trajectories of 300 persons for the positive coefficient ratio of Dij between 0.51 and 0.6
300 persons seems to show no consensus building. In Fig. 6, we show the opinion distributions for positive D_{ij} ratios of 0.51–0.60. We found that the results appear to show consensus building, like the Bounded Confidence Model, for ratios of 0.56–0.60. Below 0.54, the calculated distribution shows no consensus building. In Fig. 7, we show the maximum value of the calculated distribution as a function of the positive coefficient ratio of D_{ij}. This plot is similar to the one in Ref. [12]. Below 0.53, the maximum values of the opinion distributions are nearly 20. However, the maximum values of the opinion distributions exceed 120 for positive coefficient ratios above 0.55. This means that the aspect of the distribution
Fig. 6 The opinion distributions of 300 persons for the positive coefficient ratio of Dij between 0.51 and 0.6
Fig. 7 The maximum distribution value as a function of the positive coefficient ratio of Dij
changes at about 0.55, as in a phase transition. It is a phase transition between consensus building and the situation where consensus is not built, and the order parameter for this phase transition is the ratio of positive D_{ij}. As we set the network of people to be a complete graph, the phase transition behavior can change for different network structures. We have confirmed that we can observe such phase-transition-like behavior for random networks where the link density is not low, but at least at the moment we cannot find such behavior for scale-free networks. In our present calculation, we assume that people neglect the opinion of another person if that opinion differs from their own by more than 5. Changing this value changes the number of peaks of the opinion distribution under consensus building, as the previous Bounded Confidence Model [3–5] showed. Qualitatively, the phase transition behavior is insensitive to this value.
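The ratio sweep behind Figs. 6 and 7 can be sketched in the same way. The snippet below reuses the hypothetical simulate_opinions function sketched in Sect. 2 and records the peak of the final opinion histogram for each positive-coefficient ratio; the exact location of the jump depends on the random seed and parameters, so the printed numbers are only indicative.

```python
import numpy as np

# Sweep the ratio of positive D_ij and record the peak of the final
# opinion histogram, mirroring Fig. 7 (the simulate_opinions sketch
# defined earlier is assumed to be available).
ratios = np.arange(0.50, 0.61, 0.01)
peaks = []
for r in ratios:
    final = simulate_opinions(positive_ratio=r, seed=1)[-1]
    counts, _ = np.histogram(final, bins=100)
    peaks.append(counts.max())

for r, p in zip(ratios, peaks):
    print(f"positive ratio {r:.2f}: histogram peak {p}")
```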
4 Discussion In this research, we examined how the aspect of opinion agreement changes depending on whether the credibility factor connecting people is positive or negative [6–8]. Using the new theory of opinion dynamics, it is clear from Fig. 1 that the limit of all-positive confidence coefficients connecting people reproduces the Bounded Confidence Model [3–5]. In other words, consensus building is obtained when all people are linked by trust. On the other hand, we showed that consensus is not reached at all when the ratio of positive to negative trust coefficients is half and half. Furthermore, it was found that changing the ratio of positive to negative values of the credibility factor between people changes the situation of consensus formation and non-consensus formation continuously. The state of consensus formation and the state where agreement is not formed change rapidly, like a phase transition, as the ratio of trust to distrust relationships is varied around 55%. In this study, the connections among the 300 people form a complete graph. If the network structure is different, such as a scale-free network, the phase transition boundary may shift from 55%. On the other hand, considering practical consensus building, it is not necessary to have 100% trust among people to reach consensus; a 55% trust ratio is enough to give a high probability of progressing to consensus building.
5 Conclusion In this study, using a theory of opinion dynamics that incorporates both trust and distrust into human relationships, we have shown that the aspect of consensus building changes with the ratio of the trust and distrust coefficients. A phase-transition-like change is found at a trust ratio of about 55%. This means that it is sufficient for 55% of the connections between people to be trusting in order for society to reach consensus.
6 Future Works Since the advent of the Web, both its positive and negative aspects have gradually emerged. Nowadays, potential online risks and the social issues surrounding fake news are discussed worldwide [12]. For example, one can consider cases in which news and opinions following patterns learned by natural language machine learning contribute, through recommendation, to the formation of public opinion in society. In addition, it is thought that the collision between artificially generated opinion and
spontaneous opinion will become even more intense. From this point of view, fact-check results can be compared with model calculations by scoring statements under uniform conditions. We expect that studies reconsidering these issues as a kind of social phenomenon will be further promoted. It is also desirable to improve media literacy regarding the transmission and reception of information online, because such activity takes place on online social networks for which international law has not been established. Acknowledgements This work was supported by JSPS KAKENHI Grant Number JP19K04881 and the "R1 Leading Initiative for Excellent Young Researchers (LEADER)" program of the Japan Society for the Promotion of Science (JSPS).
References
1. Castellano, C., Fortunato, S., Loreto, V.: Statistical physics of social dynamics. Rev. Mod. Phys. 81, 591–646 (2009)
2. Sîrbu, A., Loreto, V., Servedio, V.D.P., Tria, F.: Opinion dynamics: models, extensions and external effects. In: Loreto, V., et al. (eds.) Participatory Sensing, Opinions and Collective Awareness. Understanding Complex Systems. Springer, Cham (2017)
3. Deffuant, G., Neau, D., Amblard, F., Weisbuch, G.: Mixing beliefs among interacting agents. Adv. Complex Syst. 3, 87–98 (2000)
4. Weisbuch, G., Deffuant, G., Amblard, F., Nadal, J.P.: Meet, discuss, and segregate! Complexity 7(3), 55–63 (2002)
5. Hegselmann, R., Krause, U.: Opinion dynamics and bounded confidence models, analysis, and simulation. J. Artif. Soc. Soc. Simul. 5, 1–33 (2002)
6. Ishii, A., Kawahata, Y.: Opinion dynamics theory for analysis of consensus formation and division of opinion on the internet. In: Proceedings of The 22nd Asia Pacific Symposium on Intelligent and Evolutionary Systems, pp. 71–76 (2018). arXiv:1812.11845 [physics.soc-ph]
7. Ishii, A.: Opinion dynamics theory considering trust and suspicion in human relations. In: Morais, D., Carreras, A., de Almeida, A., Vetschera, R. (eds.) Group Decision and Negotiation: Behavior, Models, and Support. GDN 2019. Lecture Notes in Business Information Processing, vol. 351, pp. 193–204. Springer, Cham (2019)
8. Ishii, A., Kawahata, Y.: Opinion dynamics theory considering interpersonal relationship of trust and distrust and media effects. In: The 33rd Annual Conference of the Japanese Society for Artificial Intelligence, vol. 33. JSAI2019 2F3-OS-5a-05 (2019)
9. Agarwal, A., Xie, B., Vovsha, I., Rambow, O., Passonneau, R.: Sentiment analysis of twitter data. In: Proceedings of the Workshop on Languages in Social Media, pp. 30–38. Association for Computational Linguistics (2011)
10. Siersdorfer, S., Chelaru, S., Nejdl, W.: How useful are your comments?: Analyzing and predicting youtube comments and comment ratings. In: Proceedings of the 19th International Conference on World Wide Web, pp. 891–900 (2010)
11. Wilson, T., Wiebe, J., Hoffmann, P.: Recognizing contextual polarity in phrase-level sentiment analysis. In: Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing, pp. 347–354 (2005)
12. Sasahara, K., Chen, W., Peng, H., Ciampaglia, G.L., Flammini, A., Menczer, F.: On the Inevitability of Online Echo Chambers (2020). arXiv:1905.03919v2
Is the Statistical Property of the Arrowhead Price Fluctuation Time Dependent? Mieko Tanaka-Yamawaki and Masanori Yamanaka
Abstract The statistical properties of price fluctuations play an important role in decision-making for financial investments. Traditional financial engineering uses the random walk hypothesis to determine derivative prices. However, it is well known that the Black–Scholes–Merton formula for computing option prices tends to fail, and one of the reasons for this failure has been attributed to the random walk hypothesis. Namely, the statistical distribution of actual price fluctuations does not necessarily follow the standard normal distribution but has a higher probability in both tails than a Gaussian. In other words, actual market prices are much riskier than predicted by the standard Gaussian random walk. Motivated by this fact, we investigated a large amount of data recently produced by the ultra-fast transactions of the Tokyo Stock Exchange (TSE), the so-called 'Arrowhead' stock market, which operates at sub-millisecond transaction speeds. After substantial numerical and statistical analysis of recent stock prices as well as index prices on the TSE, we have reached a conclusion at a certain level of accuracy. Namely, a non-Gaussian stable distribution corresponding to α < 2 is observed in various index data, while a Gaussian distribution corresponding to α = 2 seems to be observed for single stock price fluctuations in the limit of the sub-second time range. These facts imply that the Gaussian assumption (α = 2) underlying the Black–Scholes formula cannot be used for derivatives of index prices.
M. Tanaka-Yamawaki (B) Meiji University, 21-1-4 Nakano, Nakano-Ku, Tokyo 164-8525, Japan, e-mail: [email protected]
M. Yamanaka, Nihon University, 14-8-1, Kanda-Surugadai, Chiyoda-Ku, Tokyo 101-8308, Japan

1 Introduction Although the direction of price change is hard to predict directly, financial technology nowadays can tell us the level of risk inherent in the market so that we can prepare various kinds of derivatives to reduce investment risk. The basic technique
underlying financial technology starts from the fundamental assumption of the random walk hypothesis of price changes [1]. However, deviations from this hypothesis have been argued by many investigators, e.g., [2], based on widely observed phenomena typically called 'fat-tail' and 'narrow-neck' distributions. Theoretically, Levy's stable distribution is a good candidate for such empirical distributions, except for one fatal problem: the variance of such a stable distribution is divergent. It has also been argued that the difference between Levy's stable distribution and the log-normal distribution is rather small and almost indistinguishable [3]. The statistical nature of price fluctuation is by itself a fundamental question in pure science, but it also carries important information directly related to the design of financial products. We have recently been working on this problem by directly analyzing the full transaction data provided by the market. The main focus of our attention is whether or not the same kind of theoretical assumptions hold in the current market, which allows ultra-fast transactions at sub-millisecond speed. In Japan, Tokyo Stock Exchange, Inc. (TSE) started a new trading system that allows transactions within a millisecond at the beginning of January 2010 and upgraded it to 0.5 ms in September 2015. Meanwhile, TSE merged with the Osaka market (OSE) into a single market (Japan Exchange, JPX) and began to offer datasets via the Internet [4]. By using those data, we explore today's financial markets to study the statistical nature of market prices in comparison with past results from traditional markets. We have so far obtained two remarkable results. The first is that the statistical distribution of a single stock price (code 8306) at a few-second interval follows the stable distribution with parameter α = 2.0, which supports the Gaussian hypothesis underlying the Black–Scholes formula [6, 7]. The second is that the statistical distribution of stock index prices, or average price time series, follows the stable distribution with parameter α < 2.0, which does not support the Gaussian hypothesis [7]. The latter result strongly shows the deviation from the Gaussian hypothesis that has been argued for a long time [2]. Here the numerical value of α varies in the range 1.4 < α < 1.9, but is definitely smaller than 2.0 [5–7]. By combining those two facts, we can draw the following scenario for price fluctuations at very short time intervals in the 'Arrowhead' stock market. Namely, the fundamental process of price fluctuation is the Gaussian random walk (α = 2.0), while the price movements of a compound process, such as an average price including index prices, or a price change over a long time interval, are not Gaussian (α < 2.0), reflecting their mutual interaction.
2 Statistical Distribution of Price Fluctuation: Gaussian or not? The most remarkable outcome of financial technology would be the derivation of the Black–Scholes formula (BSM formula) [8] for European-style option pricing, in which the price of the call option C, for example, can be calculated with high accuracy under the assumption that the probability distribution of the asset (e.g., stock) returns is Gaussian and the volatility σ is known, as follows:

C(S_t, t) = N(d_1)\, S_t - N(d_2)\, K e^{-r(T-t)} \qquad (1)

where S_t is the asset price at time t, T is the prescribed time of the option, K is the target price of that asset at time T, and

N(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-z^2/2}\, dz \qquad (2)

d_1 = \left[ \log\frac{S_t}{K} + \left( r + \frac{\sigma^2}{2} \right)(T - t) \right] \Big/ \left( \sigma\sqrt{T - t} \right) \qquad (3)

d_2 = d_1 - \sigma\sqrt{T - t} \qquad (4)
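For reference, Eqs. (1)–(4) can be evaluated in a few lines of Python. This is only a sketch of the textbook formula, and the numbers in the example are illustrative rather than taken from the paper.

```python
from math import exp, log, sqrt
from statistics import NormalDist

def bs_call_price(S_t, K, r, sigma, tau):
    """European call price of Eq. (1), with tau = T - t."""
    N = NormalDist().cdf                                                     # Eq. (2)
    d1 = (log(S_t / K) + (r + sigma**2 / 2.0) * tau) / (sigma * sqrt(tau))   # Eq. (3)
    d2 = d1 - sigma * sqrt(tau)                                              # Eq. (4)
    return N(d1) * S_t - N(d2) * K * exp(-r * tau)

# Example with illustrative numbers (not from the paper).
print(bs_call_price(S_t=100.0, K=105.0, r=0.01, sigma=0.2, tau=0.5))
```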
If the statistics of the price fluctuation are not Gaussian, the above formula has to be changed. Note, however, that the incomplete Gaussian integral in Eq. (2) is computed numerically by truncating the tail part. Thus, it is irrelevant whether the weighted integrals of Levy's stable distribution are theoretically divergent or not. The more important factor is the statistical behavior that the major part of the total motion follows. For example, the price dynamics of a single stock can be considered a Gaussian random walk when a major part of the total motion is Gaussian.
3 Making Use of Scale Invariance of Levy's Stable Distribution 3.1 Scale Invariance of Stable Distribution It is quite cumbersome to extract the value of α from real data. Two methods employed in Ref. [2] utilize the scale-invariance property of the stable distribution, defined as

f_{\alpha,\beta}(Z) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{ikZ - \beta|k|^{\alpha}}\, dk \qquad (5)
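Since the integrand of Eq. (5) is even in k, the density can be evaluated numerically from the cosine transform alone. The following sketch (SciPy is assumed to be available) illustrates this; for α = 2 and β = 1 the result reduces to a Gaussian, which provides a simple sanity check.

```python
import numpy as np
from scipy.integrate import quad

def stable_pdf(z, alpha, beta=1.0):
    """Numerical evaluation of Eq. (5); the integrand is even in k,
    so only the cosine part over [0, inf) is needed."""
    integrand = lambda k: np.cos(k * z) * np.exp(-beta * k**alpha)
    value, _ = quad(integrand, 0.0, np.inf, limit=200)
    return value / np.pi

# Sanity check: alpha = 2, beta = 1 gives a Gaussian,
# f(z) = exp(-z**2 / 4) / (2 * sqrt(pi)).
print(stable_pdf(0.0, alpha=2.0), 1.0 / (2.0 * np.sqrt(np.pi)))
print(stable_pdf(1.0, alpha=1.5))  # fat-tailed case, alpha < 2
```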
This type of function has stability because the kernel e^{-\beta|k|^{\alpha}} keeps the same parameter α in the process of convolution. Based on the convolution theorem, the kernel becomes a product of kernels after convoluting n steps, which changes β to nβ while α remains unchanged. Assuming that the probability distribution P(Z) of price fluctuation Z is a stable distribution, P(Z) has the following scale invariance:

P_{\alpha,\beta}(Z) = (\Delta t)^{1/\alpha}\, P_{\alpha,\beta\Delta t}\big((\Delta t)^{1/\alpha} Z\big) \qquad (6)

Also, by setting Z = 0,

\log\big(P_{\alpha,\beta\Delta t}(0)\big) = -\frac{1}{\alpha}\log(\Delta t) + \log\big(P_{\alpha,\beta}(0)\big) \qquad (7)
3.2 The First Method The first method uses the scaling property of Eq. (6) directly by drawing many probability distributions for various time intervals. By setting Δt = 2^n, a distribution P(Z) for a parameter 2^n β can be made the same function by a scaled x-axis Z' = c^n Z and y-axis P' = P/c^n, where c = 2^{1/α}, if the parameter α is correctly chosen. The nine lines corresponding to the probability distributions P_{α,βΔt}(Z) of price fluctuation Z for nine different time resolutions, Δt = 1 to 2^8 in units of 15 s, are simultaneously plotted in Fig. 1 for the case of the tseBIG index price data, taking a bin width of 0.01 JPY. Here the file named '15001.dat' indicates P_{α,βΔt}(Z) for Δt = 15 s, '15002.dat' for Δt = 30 s, ..., '15009.dat' for Δt = 3,840 s. In Fig. 2, those nine lines are shown to perfectly overlap a single line P_{α,β}(Z) after the scaling transformation in Eq. (6) for the correct value of parameter α.
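The rescaling of the first method can be illustrated with synthetic data. The sketch below uses Gaussian increments (α = 2), since the index data themselves are not reproduced here; histograms at lag 2^n Δt are rescaled according to the invariance of Eq. (6) so that all curves collapse onto the lag-Δt curve when α is chosen correctly.

```python
import numpy as np

# Sketch of the first method under Eq. (6): histograms of increments at
# lag 2**n are rescaled so that they collapse onto the lag-1 curve when
# alpha is chosen correctly. Synthetic Gaussian increments (alpha = 2)
# are used here for illustration only.
rng = np.random.default_rng(0)
steps = rng.normal(0.0, 1.0, size=2**20)   # unit-lag increments
alpha = 2.0
c = 2.0 ** (1.0 / alpha)

bins = np.linspace(-40.0, 40.0, 401)
centers = 0.5 * (bins[:-1] + bins[1:])
collapsed = []
for n in range(5):
    z = steps.reshape(-1, 2 ** n).sum(axis=1)          # increments at lag 2**n
    p, _ = np.histogram(z, bins=bins, density=True)
    collapsed.append((centers / c ** n, p * c ** n))   # rescaled curve

# The peak heights of the rescaled curves should all be close to ~0.4 here,
# the peak of the unit-variance Gaussian.
print([round(curve[1].max(), 3) for curve in collapsed])
```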
3.3 The Second Method The second method uses Eq. (7) to obtain the parameter α as the inverse of the slope of log(P_{α,βΔt}(0)) versus log(Δt), plotted in Fig. 3 for the various time scales Δt listed in Table 1. At first sight, the two methods are not independent. Indeed, both methods derive the same α if the scale invariance properly holds. If the two methods derive different α, something is wrong. In practice, the two methods are complementary and can be used to support the validity of the result.
Fig. 1 The probability distribution of price change P_{α,βΔt}(Z) for various time resolutions
Fig. 2 All the graphs in Fig. 1 overlap ‘15001.dat’ by choosing α correctly in Eq. (6)
4 Possible Source of Variations in α In our work analyzing TSE stock index time series [5–7], we clarified that α is definitely smaller than 2.0, but its numerical value varies in the range 1.4 < α < 1.9.
Fig. 3 Plot of log P(0) versus log(Δt) for the values in Table 1
Table 1 Peak height P(0) of the probability distribution versus the time resolution Δt [seconds]

Δt [s] | 15   | 30   | 60   | 120   | 240   | 480   | 960   | 1920  | 3840
P(0)   | 2.97 | 1.99 | 1.33 | 0.898 | 0.603 | 0.410 | 0.285 | 0.184 | 0.122
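As a quick check of the second method, the slope of log P(0) versus log Δt can be fitted directly to the Table 1 values; the short sketch below (NumPy assumed) gives α of roughly 1.7, consistent with the first row of Table 2.

```python
import numpy as np

# Estimate alpha from the slope of log P(0) versus log(dt), Eq. (7),
# using the values of Table 1.
dt = np.array([15, 30, 60, 120, 240, 480, 960, 1920, 3840], dtype=float)
p0 = np.array([2.97, 1.99, 1.33, 0.898, 0.603, 0.410, 0.285, 0.184, 0.122])

slope, intercept = np.polyfit(np.log(dt), np.log(p0), 1)
alpha = -1.0 / slope
print(f"slope = {slope:.3f}, alpha = {alpha:.2f}")  # roughly alpha ~ 1.7
```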
Now the remaining question is the reason for the wide range of α, from 1.4 to 1.9, depending on the date when the data are taken. From now on, we focus on the possible dependence of the parameter α on the time period over which the data are taken, as well as the number of data points and other technical factors in the process of drawing the histograms from which the statistical distributions are extracted. However, before going any further, we recall that some technical difficulties exist in drawing histograms as the starting point. The first problem is the choice of variable. Traditionally, the log-return Z(t) = log X(t + Δt) − log X(t) is used as the main variable, taking the difference between the log of the price X(t) and the log of the next price X(t + Δt). This is a convenient way to eliminate the unit attached to the price X. We noticed that this is not necessarily the right choice for obtaining accurate high-resolution results near the origin, Z = 0. Due to the minimum size of price change set by the market, the shape of the histogram depends on the choice of the number of partitions (and thus the width of each partition, or 'bin width'). Instead, by choosing the net price increment, Z(t) = X(t + Δt) − X(t), and setting the minimum bin width exactly equal to the minimum price increment, the abovementioned uncertainty due to the dependence on the bin width can be greatly reduced.
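The histogram convention described above can be written down as a short sketch; the price series and tick size here are synthetic placeholders, not market data.

```python
import numpy as np

def increment_histogram(prices, lag=1, tick=0.01):
    """Histogram of net price increments Z(t) = X(t + dt) - X(t) with
    the bin width set to the minimum price increment (tick), as
    described above. `prices` is a 1-D array of prices."""
    z = prices[lag:] - prices[:-lag]
    lo = np.floor(z.min() / tick) * tick
    hi = np.ceil(z.max() / tick) * tick
    edges = np.arange(lo - tick / 2.0, hi + tick, tick)  # bins centered on ticks
    counts, edges = np.histogram(z, bins=edges)
    return counts / (counts.sum() * tick), edges         # normalized density

# Illustrative use with a synthetic, tick-aligned random walk.
rng = np.random.default_rng(0)
prices = 100.0 + 0.01 * np.cumsum(rng.integers(-3, 4, size=100_000))
density, edges = increment_histogram(prices, lag=1, tick=0.01)
```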
Table 2 Parameter α obtained by the scaling in Eq. (6) and the slope of Eq. (7)

Time period                                              | Data size | Bin001–Bin013
April 2005–December 2018 (all data)                      | 3,834,000 | 1.70–1.80
January 2010–August 2015 (first term of Arrowhead)       | 1,564,000 | 1.70–1.80
January 2012–December 2018 (late Arrowhead)              | 1,848,000 | 1.54–1.70
September 2015–December 2018 (second term of Arrowhead)  | 897,000   | 1.57–1.65
5 Time Dependence of the Statistical Properties There are still some remaining problems in obtaining the best-fitted parameters. The most annoying problem is the choice of bin width in the histogram. As the bin width changes from 1 to 13 (in units of the minimum price increment, i.e., 0.01–0.13), the best-fitted value of the parameter α changes by 10 percent. Another problem is the choice of data size. Obviously, longer data assure better accuracy. On the other hand, the economic situation of the target market changes over a long time. Therefore, the length of the data must be chosen large enough to reduce the statistical error, but not so long that the stability of the market condition is lost. Although more work is necessary before a final conclusion, the result of analyzing one index, the Tosho Big Company Stock index, as a 15-s time series with a total of 3.834 million data points from April 2005 to December 2018 is summarized in Table 2. This work has clarified several features of price fluctuation in the Arrowhead stock market.
1. The distribution of price fluctuation of the index data is not Gaussian (α < 2).
2. The value of α is not universal but varies with time.
3. The value of α is not strongly influenced by the data size.
4. The value of α depends on the bin width of the histogram, but only slightly.
5. The value of α is smaller in the recent Arrowhead market compared to older data.
6 Conclusion and Future Works In this paper, we attempt to answer a long-standing question about the statistical property of price fluctuation: whether it is the Gaussian random walk predicted by Bachelier in 1900, or a non-Gaussian walk with mutual correlation among price movements, as proposed by Mantegna and Stanley. Assuming that the process is i.i.d., i.e., independent and identically distributed, the probability distribution of the price fluctuation can be expressed by Levy's stable distribution. If it is Gaussian, described by the normal distribution, the parameter of Levy's stable distribution is α = 2; otherwise, α < 2. After an extensive numerical and statistical analysis of a huge amount of price time series downloaded from the JPX site, we have reached the conclusion shown in
Table 2. Namely, the parameter α for a stock index is found to be in the range 1.5–1.8, definitely smaller than 2, but it varies with time. The extracted value of the parameter for the total data set from 2005 to 2018 is α = 1.7 for the minimum bin width of 0.01 and α = 1.8 for a bin width of 0.13. Also, the extracted parameter α becomes smaller in more recent data compared to older data. As time goes on, more and more data will be accumulated, which should give us fruitful information about price dynamics.
References
1. Bachelier, L.: Theorie de la Speculation. Annales scientifiques de l'Ecole Normale Superieure, Ser. 3(17), 21–86 (1900)
2. Mantegna, R.N., Stanley, H.E.: Scaling behaviour in the dynamics of an economic index. Nature 376, 46–49 (1995)
3. Bouchaud, J.P., Potters, M.: Theory of Financial Risks: From Statistical Physics to Risk Management. Cambridge University Press (2000)
4. JPX Cloud Homepage. http://www.jpx.co.jp/corporate/news-releases/0060/20150924-01.html
5. Tanaka-Yamawaki, M.: Statistical distribution of the arrowhead price fluctuation. Procedia Comput. Sci. 112, 1439–1447 (2017)
6. Tanaka-Yamawaki, M., Yamanaka, M., Ikura, Y.S.: Statistical distribution of the sub-second price fluctuation in the latest arrowhead market. Procedia Comput. Sci. 126, 1029–1036 (2018)
7. Tanaka-Yamawaki, M., Yamanaka, M.: Market efficiency is truly enhanced in sub-second trading market. Procedia Comput. Sci. 159, 544–551 (2019)
8. Black, F., Scholes, M.: The pricing of options and corporate liabilities. J. Polit. Econ. 81, 637–654 (1973)
Decision-Making Theory for Economics
History of Mechanization and Organizational Change in the Life Insurance Industry in Japan (Examples from Dai-ichi Life, Nippon Life, Imperial Life, Meiji Life) Shunei Norikumo Abstract This research explores the history of management mechanization in the life insurance business in Japan and examines how the technological innovations of the times changed and impacted organizational and human resource development. The purpose of this research is to derive clues on how organizations and people should adapt to the next generation of innovations. The focus is on the period about a century ago. This was a time when the world's political, economic, and financial conditions were worsening, and major Japanese life insurance companies built up the mechanization of management that was to become the foundation of today's information technology. The research explores the organizational decisions that were made before and during the war, when the supply of parts faced difficulty.
S. Norikumo (B) Osaka University of Commerce, 577-8505 Osaka, Japan, e-mail: [email protected]

1 Introduction This paper goes back to the history of business mechanization in Japan and reveals, from the viewpoint of management information theory, how the life insurance industry, which was active in the mechanization process at that time, introduced new technologies and brought about organizational changes. It aims to derive clues for organizations and human resources to adapt to the next generation of technological innovation [1]. Similar basic research on management mechanization was conducted by [2, 3], among others, but in recent management information research, computerization rather than mechanization of management is generally believed to have led to the substantial rationalization of management. Because of its impact on society, computerization after MIS (Management Information Systems) has often been the center of discussion [4, 5], and the PCS (Punch Card System) era has less often been taken up. In this study, it is hypothesized that an organizational base for accepting the next generation of information technology was built only in the mechanized era before
this management rationalization was realized. This study does not deal with data from after the Internet emerged in 1980, which is a topic for later study.
2 Establishment of Life Insurance Companies and Management Mechanization The insurance system began with the Industrial Revolution in the UK. Two factors allowed it to arise: (1) factory workers with capital acquired through their own labor appeared in large numbers, and this population was concentrated in urban areas; and (2) the economic unit of the family became smaller. This social situation heightened the need for personal life protection, and the modern life insurance concept matured. The first organizations to meet this need were mutual aid organizations, the fraternity unions. The fraternity union was a mutual aid organization for lower-class citizens and workers that emerged in the UK in the mid-seventeenth century. Regardless of occupation, religion, and so on, monetary benefits were provided for deaths and funerals. The fraternity union originated when people gathered at churches and public houses in the UK, passed their hats around, put money in them, and saved it in order to cover the living costs of survivors after their relatives died. Thus, in the fraternity union, we can see the origin of modern insurance companies. Initially, the fraternity union adopted a pay-as-you-go formula with no mathematical backing and did not set age-based premiums. As a result, the aging of the members led to a shortage of funds, and younger members were also dissatisfied. Later, as the modern life insurance business flourished, scientific actuarial systems were introduced.
2.1 Life Insurance and Management Mechanization In 1693, Halley produced the world's first "life table," stating that "the price of insurance should be set according to the age of the person who is insured." In 1706, the Amicable Society was founded and long endured as the world's first life insurance company. In 1762, the Equitable Society was founded and started its life insurance business on such a scientific basis. In addition to term insurance with premiums paid as they came due, Equitable added features such as whole-life insurance for long-term contracts, maximum limits on insurance payouts, surrender values, policyholder dividend payments, and, in 1858, medical examinations. It thereby secured the reasonableness of insurance claims and fairness among subscribers, and so laid the foundation for the modern life insurance business. In the 1860s, a thrift movement developed in the UK, which began to encourage the working classes to save money. In the late nineteenth century, a market for
these savings banks and worker life insurance companies was formed, setting a prototype for large worker life insurance companies. In 1856, Prudential was founded in London, and 20 years later, American Prudential was established in New York. In 1870, the Life Insurance Company Act was enacted, stabilizing the business foundation of life insurance companies. Worker life insurance was a very different business from middle-class life insurance. Most working-class life insurance was taken out to cover burial costs and avoid the shame of being buried in a low-income communal graveyard, with a typical claim of around £10 at the time of death. However, low-income earners were incapable of paying annual premiums and paid only at the rate of one to two pence each week. From a data processing standpoint, Prudential's problem was that it had to collect premiums 50 times more often, for one-tenth of the insurance money, compared with a regular life insurance company like the Equitable Society. Equitable could not handle working-class insurance at all, because the paperwork costs outweighed the premium income. Prudential's business was dramatically successful, with over 20 million contracts completed and over 1 million new contracts handled annually by 300 clerks in its 20th year. Comparing information processing in the nineteenth and twentieth centuries, information processing in the nineteenth century did not use office equipment at all, whereas today many clerks use computers for all routine work [20].
2.2 The Japanese Life Insurance Industry In 1868, Yukichi Fukuzawa introduced the modern insurance system (non-life insurance and life insurance) to Japan as one element of the cultures of Europe and the United States in his book "Western Travel Guide." In 1880, Nichito Hosei (the first life insurance company in Japan) was founded by Giichi Wakayama and others who were members of the Iwakura Mission, but it went bankrupt. In July 1881, the Meiji Life Insurance Company (the oldest surviving insurance company in Japan) was started by Taizo Abe under Yukichi Fukuzawa. In 1888, Teikoku Life (now Asahi Life) was founded as the second insurance company in Japan, and the third, Nippon Life, began operating in 1889. The history of life insurance in Japan was rooted in mutual assistance, such as that of tanomoshi-kou, nenbutsu-kou, and mujin-kou. It is believed that Yukichi Fukuzawa first introduced "insurance" in "Western Travel Guide" in 1867. He translated life insurance as "contracting a person's life" and introduced it as such. In 1880, the Kyosai 500 company (now Meiji Yasuda Seimei) was established, but its management stalled. In 1881, a student of Yukichi Fukuzawa founded Meiji Life, Japan's first life insurance company. Subsequently, Imperial Life (now Asahi Life) was established in 1888, Nippon Life in 1889 [14], Dai-ichi Life in 1902 as Japan's first mutual company, and Chiyoda Life in 1904. Japan experienced the Sino-Japanese War in 1894, the enactment of the Insurance Business Law in 1900, and the start of simple insurance in 1916, and after the Spanish influenza pandemic in 1918 and the Great Kanto Earthquake in 1923, a
large amount of insurance claims had to be paid out. This was an unwelcome surprise for the Japanese insurance companies. It caused insurance contracts to be transferred to other companies, and many insurance companies closed or merged. Next, let us look at the background of the introduction of PCS, with which these life insurance companies advanced the mechanization of management.
3 History of Management Mechanization in Japan PCS was first introduced to Japan in 1892, when the Hollerith-type statistical machine was presented by Jiro Takahashi in Statistical Journal No. 129, edited by the Tokyo Statistical Association, as the "Invention of an Electric Machine for Population Survey." Following the US census, the Meiji government began field visits, started developing national statistics machines, and then enacted a national census law. In 1905, the engineer Taro Kawaguchi prototyped the "Kawaguchi-type electric tallying machine," but the plan was canceled due to an earthquake. After that, the first steps in the mechanization of management in Japan began, relying on imports through trading companies such as Mitsui & Co. (Mr. Kanzaburo Yoshizawa) and Morimura Shoji (Mr. Hiroshi Mizuna) [15–18].
4 History of the Life Insurance Business and Mechanization In this chapter, we explore how life insurance companies emerged and how the industry became the first to introduce a new information technology, the statistical tabulating machine. In addition, the history of each company's introduction and use of the machines, and the accompanying organizational changes, will be explored. The life insurance companies surveyed are Nippon Life, Dai-ichi Life, Asahi Life, and Meiji Life.
4.1 Nippon Life Insurance Nippon Life, from as early as 1897, had achieved effective gains in office efficiency using the Tates calculator for actuarial and subscriber calculations. As for the history of its introduction, it was brought back by vice president Mr. Naoki Kataoka during a tour of the UK. After that, the aggressive replacement of calculators progressed [6–10]. In 1916, the Swiss-made Millionaire multiplication calculator was introduced and used to calculate the policy reserve. Mercedes, Line Metal, Monroe, Marchant, Alice Moss, Sundstrand, and other machines were also introduced. In 1925, Mr. Tadashi Morita, the chief director general, introduced the Powers statistical machine through Mitsui after visiting the US and Europe. The following year, it came to be used in the accounting, foreign affairs, and medical affairs departments.
After that, statistical machines derived from the 1925 Powers-type statistical machine came into widespread use. Initially, they were used for mathematical statistics. Later, they were used for practical aspects of management statistics, recruitment expenses, continuing commission calculation, sequential budgeting, foreign affairs, and the medical department, with dramatic improvements in efficiency. In the company magazine "Shayu" of June 1926, the policy of shifting human labor from administrative work to management research and analysis was set out. Later, in 1932, a Hollerith statistical machine was introduced, and in 1934 the Hollerith-type statistical machine IBM 405 was introduced (a switch from the Powers type to the Hollerith type). In the same year, a Burroughs statistical machine was introduced, and the creation of premium payment guidance was started. The main uses in insurance companies were the calculation of policy reserves and contractor dividend reserves, the preparation of the various statistical tables prescribed in the Insurance Business Law, the calculation of dividend incomes for policyholders, the creation of reference statistical tables from company experience, the preparation of statistics on foreign affairs efficiency and other performance figures, the preparation of statistics for medical professionals, guidance for insurance premium payments, the simultaneous creation of insurance receipts and insurance premium income statements, employee salaries and receipts, special actuarial investigations, and other large-scale numerical investigations. As for the organization, Nippon Life reorganized its work system in 1919, when Mr. Naoharu Kataoka, who had served in management for about 30 years, announced his entry into politics. The paymaster, finance, and accounting divisions were established. In 1929, when he became the third president of the company, the 14 divisions were expanded to 26, including the accounting and finance divisions. In 1939, the accounting division was enriched with seven sections, and in 1944, the statistics division was established. In 1954, the "three-year recovery plan to the pre-war level" was launched to promote the improvement of management efficiency. At the Board of Directors' meeting on October 11, 1955, the company stated that "(2) We will study the introduction of large-scale computers to rationalize administrative work, lower maintenance costs, and develop management documents." In 1959, the company introduced the IBM 7070 computer. In 1957, it established an education system for field staff; the education of managers had been conducted since 1951, and from 1953 the company expanded the education of internal staff. In 1956, it issued a five-year management rationalization plan. It states "(2) Expansion of education for both administrative staff and sales staff," "(3) Improvement of internal affairs on the premise of introducing large-scale computers," and furthermore, "(3)–(2) As immediate preparation of the system for accepting large computers, we will endeavor to enter normal operation within the planning period." It can be
said that this strongly promoted a culture of information collection throughout the organization. The company history states, "After 1929, affairs became accurate and quick, and as a result, the company gradually succeeded in replacing human power with machines." We also find the following sentence in the journal for IBM Japan's 50th anniversary: "Nippon Life developed to become the largest life insurance company in Japan throughout the Taisho era and the early Showa era, as well as being a pioneering company in mechanizing office work."
4.2 Dai-ichi Life Insurance Dai-ichi Life introduced the Millionaire calculator, which had won an award at the Paris International Exposition, in 1903 in order to further mechanize office work [13]. In 1925, a Powers-type classifying machine was introduced. In November 1938, 68 Hollerith-type statistical machines were installed and statistical work was mechanized (a switch from the Powers type to the Hollerith type). In April 1953, the mechanization of new-contract affairs (statistics of new contracts, creation of ledger cards, branch office expenses, recruitment results of field employees, creation of payroll calculation documents) was implemented. In October 1955, the company created a payment guide for annual and semi-annual payment contracts and automated the related affairs. Instead of the "tiles" that had been used for several decades, PCs (Premium Cards) and ACs (Address Cards) were created to provide transfer guidance in kana characters. Since then, cards have been widely used as dividend reserve cards, policyholder loan cards, and fixture cards. In 1961, the IBM 1401 was introduced and operations were converted to EDPS (from PCS to computer). In terms of management, the Basic Operational System Committee was established in October 1964. In March 1965, 14 conditions for the basic business administration system were established. Then, in July, the first system conversion plan was reported and implemented as a 3-year plan running until March 1968. It specified the following measures: (1) reorganize clerical workgroups that had been mechanized function by function according to the division of affairs into one unified system, aiming to save labor and shorten work schedules; (2) promote the rationalization of office work that had not yet been mechanized under this system; (3) actively promote filing with magnetic tape, abolish visible cards, and greatly reduce manual labor, converting the necessary forms to kana characters.
4.3 Asahi Mutual Life Insurance (Former Imperial Life) Imperial Life later became Asahi Mutual Life Insurance Company. In 1901, it introduced a card system used for three kinds of cards: a statistics card, a collection card, and a Type-B loan card [12]. In 1910, cards came into use as the Bill of Loan Book Card, Deposit
Card, the Employee Identity Guarantee Reserve Fund Card, and the Employee Prepaid Payment Card, and their usage increased year by year. The reason the company introduced PCS was that Mr. Yoshinobu Fukuhara had visited a New York company and a mutual company in the United States, where he saw in the field that PCS was being used to speed up paperwork processing. In 1904, the company hired a large number of female clerks, who entered data into count books, calculation cards, and the like. In 1934, the company introduced a set of Hollerith-type PCS machines (perforators, inspection machines, sorting machines, collators, and accounting machines) to be applied to the settlement work of the statistical and mathematical departments and other office work. They were also used for medical statistics from 1937, and all of the statistical work relied on IBM machines to improve its efficiency. Prior to the outbreak of the Pacific War in 1941, the cabinet office requested the use of IBM's statistical machines, and thereafter they were commandeered by the Navy for equipment operation and contract work, which restricted the company's use of the machines. In 1966, three guidelines for promoting management were issued; their contents include Part 2, "Innovation of the administrative system," and Part 3, "Development of human resources." In-house research was conducted from 1927 in the "Imperial Life Working Study Group," and in 1930, "Rationalization and cooperation in insurance management" was published. In 1938, "Statistical affairs of Hollerith-type machines and life insurance companies (1), (2)" were published, and in 1939, part (3) followed. As for the organization, the "Statistics Department" was established in 1899 and the "Statistics Division" in 1904. In 1969, the "Computer Center Planning Office, Machine Room" was established; in 1970, the "Computer Center System Department, Computer Center Office Administration Department"; and in 1971, the "System Department, Office Administration Department, Information System Development Office."
4.4 Meiji Life Insurance Meiji Life created Japan's first life table (experience table) in 1905, introduced one set of Powers-type statistical machines in 1932 [11], and implemented the monthly payment insurance collection system (debit system) in 1950. The IBM statistical accounting machine was introduced in 1954, the IBM 1401 computer in 1962, and the IBM 1418 optical character reader (OCR machine) in 1964. This was followed by the IBM 7044 electronic computer in 1965; OCR processing of monthly fee administration and the introduction of a copy-flow system in 1966; the IBM System/360 Model 40 computer in 1968; data cell and video inquiry devices and online business processing in 1969; and the kanji-capable JEM3100S computer in 1971.
The organization was reorganized in June 1917, when Taizo Abe, who had been in the company's top management since its founding, resigned and Heigoro Sota took over. The organization was roughly divided into three departments: General Affairs, Sales, and Audit. In 1955, a monthly-payment administrative office was set up to streamline monthly insurance operations, and out-of-office payments, monthly fees, employee salaries, and the like were mechanized. This approach proved very effective in terms of speed, accuracy, and reduced administrative expenses. Later, when OCR was introduced in 1964, the biggest challenge was the mechanization of premium collection affairs. In 1957, the organization was expanded to a 14-division system, and the Actuarial Division was split into three parts: the Actuarial Division, the Statistics Division, and the Insurance Actuarial Division. In 1957, the IBM statistical accounting machine was added, and the Monthly Fee Department was set up in 1958 to collectively manage the collection of monthly premiums. However, because office work increased rapidly even with mechanization, the company introduced the IBM 1401 in 1962 to streamline operations.
5 Consideration The following is what we have learned from a comparison of the four insurance companies. Why were the life insurance businesses able to introduce management mechanization so early and so successfully? First, the earliest task in the life insurance business that required a calculator was the creation of life tables for statistical and mathematical calculations, as seen at Nippon Life and Dai-ichi Life. Second, insurance was originally a lump-sum premium product for the upper-middle class, but it spread in large numbers as a product for low-income earners and the general public, and the paperwork for handling premium collection required mechanization. Third, Nippon Life, Dai-ichi Life, and Meiji Life were promoting the mechanization of office work in line with the rapid increase in subscribers, which required printing the addresses of very large numbers of subscribers. These three factors are the main reasons why mechanization penetrated at an early stage. Meiji Life, the oldest insurance company in Japan, was late in introducing mechanization, but from around 1950 the shortage of human resources caused by monthly premium collection was addressed through mechanization. Asahi Life, the second oldest, worked to streamline affairs by early introduction of the card system, and its research on management mechanization was active. Nippon Life, the third oldest, introduced calculators early and used them for calculating life tables; this use led on to mechanization by statistical machines. It was the first company in Japan to switch from a Powers-type statistical machine to a Hollerith-type (IBM) model, leading the mechanization and computerization of Japan. Dai-ichi Life also followed Nippon Life in creating life tables and streamlining clerical work by introducing a mechanized system, but there are few records on its mechanization and computerization, and the detailed situation is unknown.
In addition, as shown in the cases above, the organizations tended to introduce the systems on the basis of prior decisions, such as careful field inspections (in Japan and the United States) requested of the persons in charge by top management. Research on office efficiency and rationalization within the organizations was also active. Top management policy was communicated to the related departments, and furthermore, the information policy and decision-making were quickly and clearly communicated to employees through management plans, organizational restructuring, and internal documents.
6 Conclusion I would add that there are some points to note about this paper. The four companies compared were the pioneering Japanese life insurance companies that introduced statistical machines. As for mechanization and organizational change, the author has selected the events that are most closely related, based on the author's judgment. However, depending on the company history, some records that might be expected from the literature on technology and product development were missing, and there were only a few records on internal matters. Knowing how organizations changed when they introduced hitherto unknown new technologies is important for anticipating the future introduction of AI into industry worldwide [19]. In addition, the results of this paper are a study of the history of management mechanization at four companies, and the same findings do not necessarily apply to other industries and other companies. The foundation of a company's information system should be one that is difficult to imitate, so that the company can strategically outperform the management of other companies. The important idea is that, even as technology advances, it is not a system relying solely on information technology but rather the information processing activities performed by organizations and human resources that round off a company's own original information system. Acknowledgements This research is part of the research results that received the "Management Science Research" award from the Japan Business Association in 2019. I would like to express my gratitude for the support of the foundation.
References
1. Turing, A.M.: Computing machinery and intelligence. Mind 49, 433–460 (1950)
2. Kishimoto, E.: Management and Technology Innovation. Nihon Keizai Shimbun (1959)
3. Beika, M.: The History of Japanese Management Mechanization, pp. 3–57. Japan Management Publications Association (1975)
4. Toyama, A., Murata, K., Kishi, M.: Management Information Theory, pp. 11–226. Yuhikaku (2015)
5. Information Processing Society History Special Committee: History of Computer Development in Japan. Ohmsha (2010)
6. Nippon Life Insurance Company: Nippon Life 70 Years History (1962)
7. Nippon Life Insurance Company: Nippon Life 100 Years History, vol. 1 (1992)
8. Nippon Life Insurance Company: Nippon Life 100 Years History, vol. 2 (1992)
9. Nippon Life Insurance Company: Nippon Life 100 Years History, Material Edition (1992)
10. Nippon Univac Co., Ltd.: History of UNIVAC 30 Years. Diamond Company (1988)
11. Meiji Life Insurance Company: 110 Years History, vol. 1 (1993)
12. Asahi Life Insurance Company: Asahi Life 100 Years History (1990)
13. Dai-ichi Life Insurance Company: Dai-ichi Life 70 Years History (1972)
14. Nissay Research Institute: Overview-Life Insurance in Japan (2011)
15. Management Information Society: Total Development History of IT Systems of Tomorrow. Senshu University Publishing Bureau (2010)
16. Japan Business History Research Institute: IBM 50 Years History. Japan IBM Corporation (1988)
17. Japan Business History Research Institute: History of Computer Development Focusing on IBM. Japan IBM Corporation (1988)
18. Japan Business History Research Institute: Year of Information Processing Industry. Japan IBM Corporation (1988)
19. Norikumo, S.: Consideration on substitution by artificial intelligence and adaptation of organization structure: From the history of management information systems theory. JSAI2018, vol. 32, 2J1-05. The Japanese Society for Artificial Intelligence (2018)
20. Yamamoto, K., Campbell-Kelly, M., Aspray, W.: Computer 200 Years History. Kaibundo (1999)
A Diagram for Finding Nash Equilibria in Two-Players Strategic Form Games Takafumi Mizuno
Abstract Drawing a diagram can help people find Nash equilibria of two-player strategic form games. The diagram consists of nodes, directed edges, and undirected edges. Nodes represent outcomes of a game, and edges represent responses of players against the strategies of their opponents. Pure Nash equilibria are represented as nodes without outgoing directed edges. A mixed Nash equilibrium is represented as a node on an intersection of two undirected edges that are synthesized from directed edges. In this article, I describe a way to find such equilibria by using the diagram. Since games are represented by directed edges, games can be specified by pairwise comparisons. The preference over strategies then need no longer be a total order. I also provide two games in which there are cycles of preference in a player's choices; one has a mixed Nash equilibrium, and the other does not.
T. Mizuno (B) Meijo University, 4-102-9 Yada-Minami, Higashi-ku, Nagoya, Aichi, Japan, e-mail: [email protected]

1 Introduction: Strategic Form and Nash Equilibrium This study deals with two-player games represented in strategic form [1]. Players 1 and 2 have strategy sets M = {1, 2, ..., m} and N = {m + 1, m + 2, ..., m + n}, respectively, and each player simultaneously chooses a strategy from his/her set. There are m × n possible outcomes made by their choices. When player 1 chooses strategy i and player 2 chooses strategy j, player 1 receives payoff a_{ij} and player 2 receives payoff b_{ij}. The payoffs are arranged into m × n matrices A = (a_{ij}) and B = (b_{ij}), where i ∈ M and j ∈ N. Notice that, in this article, row indices and column indices of the matrices are denoted by elements i ∈ M and j ∈ N, respectively. We can regard this as player 1 choosing a row of matrix A and player 2 choosing a column of B. We are interested in the condition that every player chooses his/her best response against the opponent's choice. An outcome that satisfies this condition is referred to as a Nash equilibrium [1–3]. When each player must choose only one strategy, the equilibrium
is called a pure Nash equilibrium, while, when each player can choose among strategies probabilistically (random choice), the equilibrium is called a mixed Nash equilibrium. In finding pure Nash equilibria, a possible outcome is a pair of strategies s = (i, j), where i ∈ M and j ∈ N. A pair (i, j) is a pure Nash equilibrium if and only if the following inequalities are satisfied:
j = j, j ∈ N .
(1) (2)
The condition says that each player cannot improve his/her payoff by changing the strategy if the opponent’s strategy is fixed. In finding for mixed Nash equilibria, a possible outcome is a pair of vectors , ..., x ), y = (y , ..., y ), x ≥ 0, y ≥ 0, (x, y), where x = (x 1 m m+1 m+n i j i x i = 1, and y = 1. The element x is a probability that the player 1 chooses strategy i, and the j i j strategy j. Payoffs of player 1 and 2 element yj is probability that player 2 chooses m m+n are calculated as expected payoffs u 1 (x, y) = i=1 j=m+1 x i y j ai j and u 2 (x, y) = m m+n i=1 j=m+1 x i y j bi j , respectively. A pair (x, y) is mixed Nash equilibrium if and only if following inequalities are satisfied. u 1 (x, y) ≥ u 1 (x , y), x = x,
u 2 (x, y) ≥ u 2 (x, y ),
y = y.
(3) (4)
Notice that the functions u 1 and u 2 are bilinear. For example, u 1 (x, λy + (1 − λ)y ) = λu 1 (x, y) + (1 − λ)u 1 (x, y ). In this article, I provide a diagram for finding the equilibria. How to construct the diagram is described in Sect. 2. Ways to find pure Nash equilibria and mixed Nash equilibria by using the diagram are described in Sects. 3 and 4 with some examples. In Sect. 5, I also mention the existence of equilibria of games in which pairwise comparisons specify the preference of strategies.
2 A Diagram to Find Nash Equilibria
Nash equilibria of a game can be found by drawing a diagram that represents the game. The diagram consists of nodes, directed edges, and undirected edges. Each node s = ij represents a pair of strategies s = (i, j), and edges represent responses against the opponent's strategies. For each player, edges are drawn by the following rules.
– For each strategy of the opponent, if the payoff of an outcome s is higher than the payoff of an outcome s′, then a directed edge is drawn from node s′ to node s, and a weight equal to how much larger the payoff of s is than that of s′ is put on the edge (Fig. 1).
Fig. 1 Directed edges
Fig. 2 Undirected edges
– For each strategy of the opponent, if the payoffs of s and s′ are the same, then an undirected edge connects both nodes (Fig. 2); a code sketch of these two rules is given below.
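As an illustrative aid (not part of the original paper), the following sketch applies the two rules above to a pair of payoff matrices. It uses 0-based row and column indices for the strategies instead of the paper's labels i ∈ M and j ∈ N.

```python
# Illustrative sketch: build the diagram's edges from payoff matrices A (player 1)
# and B (player 2). Nodes are pairs (i, j); i indexes rows, j indexes columns.
def build_diagram(A, B):
    m, n = len(A), len(A[0])
    directed = []    # (from_node, to_node, weight): the head gives the higher payoff
    undirected = []  # (node, node): equal payoffs, the player is indifferent

    # Player 1's responses: compare rows i, i2 for each fixed column j of A.
    for j in range(n):
        for i in range(m):
            for i2 in range(i + 1, m):
                d = A[i][j] - A[i2][j]
                if d > 0:
                    directed.append(((i2, j), (i, j), d))
                elif d < 0:
                    directed.append(((i, j), (i2, j), -d))
                else:
                    undirected.append(((i, j), (i2, j)))

    # Player 2's responses: compare columns j, j2 for each fixed row i of B.
    for i in range(m):
        for j in range(n):
            for j2 in range(j + 1, n):
                d = B[i][j] - B[i][j2]
                if d > 0:
                    directed.append(((i, j2), (i, j), d))
                elif d < 0:
                    directed.append(((i, j), (i, j2), -d))
                else:
                    undirected.append(((i, j), (i, j2)))

    return directed, undirected
```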
3 For Pure Nash Equilibrium
By the construction of the diagram, when a node has outgoing directed edges, a player can improve his/her payoff by changing his/her strategy. So, a node without outgoing directed edges represents a pure Nash equilibrium. Let us consider a game in which the following matrices show the players' payoffs:

    A = | 0  5 |        B = | 5  5 |
        | 2  4 |            | 4  8 |
        | 3  2 | ,          | 5  3 | .        (5)
The strategy sets of players 1 and 2 are M = {1, 2, 3} and N = {4, 5}, respectively. The game is drawn as the diagram in Fig. 3. The diagram can be decomposed into the responses of player 1 (Fig. 4, left) and the responses of player 2 (Fig. 4, right). For player 1, when player 2 chooses strategy 4, directed edges from 14 to 24, from 24 to 34, and from 14 to 34 are drawn because a_34 > a_24 > a_14. When player 2 chooses strategy 5, directed edges from 35 to 25, from 25 to 15, and from 35 to 15 are drawn. For player 2, when player 1 chooses strategy 1, an undirected edge connects 14 and 15 because b_14 = b_15.
Fig. 3 The game of (5). Nodes 15 and 34 are pure Nash equilibria
Fig. 4 The responses of player 1 (left) and the responses of player 2 (right)
Fig. 5 Synthesizing an undirected edge from directed edges
Directed edges from 24 to 25 and from 35 to 34 are also drawn. In Fig. 3, nodes 15 and 34 have no outgoing edges; (1, 5) and (3, 4) are pure Nash equilibria.
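Building on the sketch above (again an illustration, not code from the paper), pure Nash equilibria can be read off as the nodes without outgoing directed edges. Applied to game (5), this recovers (1, 5) and (3, 4):

```python
# Sketch: a node is a pure Nash equilibrium iff it has no outgoing directed edge.
def pure_nash(A, B):
    directed, _ = build_diagram(A, B)
    outgoing = {src for (src, dst, w) in directed}
    m, n = len(A), len(A[0])
    return [(i, j) for i in range(m) for j in range(n) if (i, j) not in outgoing]

# Game (5): rows are strategies 1-3 of player 1, columns are strategies 4-5 of player 2.
A = [[0, 5], [2, 4], [3, 2]]
B = [[5, 5], [4, 8], [5, 3]]
print(pure_nash(A, B))   # [(0, 1), (2, 0)], i.e. nodes 15 and 34 in the paper's labels
```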
4 For Mixed Nash Equilibrium
In finding mixed Nash equilibria, each outcome is represented as a pair of probability vectors s = (x, y), and it is represented as a node s = xy in the diagram. In particular, if player 1 chooses a strategy i and player 2 chooses strategies according to a probability vector y, then I denote the node as iy. The idea for finding a mixed Nash equilibrium is to synthesize an undirected edge from two directed edges whose directions are different from each other. Figure 5 shows that player 1 prefers xy to x′y when player 2 chooses y, and prefers x′y′ to xy′ when player 2 chooses y′. Now, let us consider a new strategy y′′ = λy + (1 − λ)y′ that satisfies

u_1(x, y′′) = u_1(x′, y′′),      (6)

or

u_1(x, λy + (1 − λ)y′) − u_1(x′, λy + (1 − λ)y′) = 0.      (7)

When player 2 chooses the strategy y′′, player 1 does not distinguish between x and x′, and an undirected edge connects nodes xy′′ and x′y′′. The coefficient λ is determined to be

λ = (u_1(x′, y′) − u_1(x, y′)) / [(u_1(x, y) − u_1(x′, y)) + (u_1(x′, y′) − u_1(x, y′))] = w′/(w + w′),      (8)

where w and w′ are the weights of the two directed edges in Fig. 5.
For example, there is a game with the following payoff matrices:

    A = | 2  0 |        B = | 1  2 |
        | 1  3 | ,          | 4  0 | .        (9)
The strategy sets of players 1 and 2 are M = {1, 2} and N = {3, 4}, respectively. All nodes have outgoing edges (Fig. 6); there is no pure Nash equilibrium. The directed edge from 23 to 13 is the response of player 1 against the mixed strategy y = (y_3, y_4) = (1, 0) of player 2, and the edge from 14 to 24 is the response against y′ = (0, 1). Now, we try changing y from (1, 0) to (0, 1) gradually (Fig. 7, left). By (8), if y_3 = 3/4 = w_4/(w_3 + w_4), the payoffs of player 1 are the same for every x. On the other hand, if x_1 = 4/5 = w_2/(w_1 + w_2), the payoffs of player 2 are the same for every y (Fig. 7, right). So, two undirected edges are drawn, and the node xy on their intersection represents the mixed Nash equilibrium (x, y) = ((4/5, 1/5), (3/4, 1/4)) (Fig. 8). Notice that if player 1 prefers ij to i′j, then the player prefers ij to any xj between ij and i′j (see the edges from x3 to 13, 1y to 14, x4 to 24, and 2y to 23 in Fig. 8). As an observation, synthesizing an undirected edge from directed edges needs at least two of them with directions different from each other. Let us now find the mixed Nash equilibria of the previous game (5). By changing y = (y_4, y_5) from (1, 0) to (0, 1) gradually, undirected edges appear when y_4 = 2/3, 1/2, and 1/3; the five candidates for Nash equilibria are I, II, III, IV, and V in Fig. 9. Then I add the responses of player 2 to the candidates (Fig. 10). At candidate I, the node 34 has no outgoing edge; the outcome (x, y) = ((0, 0, 1), (1, 0)) is both a pure and a mixed Nash equilibrium. At candidate II, if changing x from (0, 1, 0) to (0, 0, 1) makes an undirected edge, then the intersection is the mixed Nash equilibrium.
Fig. 6 The game of (9). All nodes have one outgoing edge
Fig. 7 Changing y3 from 1 to 0 changes the responses of player 1 (left). Changing x1 from 1 to 0 changes the responses of player 2 (right)
Fig. 8 The node x y has no outgoing edge, where x = (4/5, 1/5) and y = (3/4, 1/4)
Fig. 9 Changing y4 from 1 to 0
When x_2 = 1/3, the mixed Nash equilibrium is (x, y) = ((0, 1/3, 2/3), (2/3, 1/3)). Notice that the probability of a strategy whose node has outgoing edges is zero, because the expected payoff increases by increasing the probabilities of the other, preferred strategies. So, x_1 = 0 at candidate II. At candidate III, the node 2y has an outgoing edge to 25; the candidate is not an equilibrium. At candidates IV and V, neither node 1y nor node 15 has outgoing edges. Furthermore, keeping x_1 = 1, no node between 1y and 15 has outgoing edges; every node (x, y) = ((1, 0, 0), (y_4, 1 − y_4)) satisfying 1/3 ≥ y_4 ≥ 0 is a mixed Nash equilibrium.
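Returning to the 2 × 2 game (9), the weight formula (8) can be applied directly. The following sketch (an illustration, not the paper's code) reproduces the mixed equilibrium ((4/5, 1/5), (3/4, 1/4)); the variable names w1–w4 follow the edge weights described in the text.

```python
# Sketch: mixed equilibrium of game (9) via the weight formula (8).
A = [[2, 0], [1, 3]]   # player 1: rows = strategies 1-2, cols = strategies 3-4
B = [[1, 2], [4, 0]]   # player 2

w3 = A[0][0] - A[1][0]          # 1: weight of the edge from 23 to 13
w4 = A[1][1] - A[0][1]          # 3: weight of the edge from 14 to 24
y3 = w4 / (w3 + w4)             # probability of strategy 3, eq. (8): 3/4

w1 = B[0][1] - B[0][0]          # 1: weight of the edge from 13 to 14
w2 = B[1][0] - B[1][1]          # 4: weight of the edge from 24 to 23
x1 = w2 / (w1 + w2)             # probability of strategy 1: 4/5

print((x1, 1 - x1), (y3, 1 - y3))   # (0.8, 0.2) (0.75, 0.25)
```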
Fig. 10 Adding responses of player 2 to the candidates in Fig. 9
5 Discussion and Future Works
For games in which one player has at most two strategies, people can find equilibria by drawing diagrams as I did. However, increasing the number of strategies of both players makes the diagram complicated. Consider a game whose players each have three strategies: M = {1, 2, 3} and N = {4, 5, 6}. The payoffs are given as

    A = | 3  3  0 |        B = | 3  4  0 |
        | 4  0  1 |            | 3  0  4 |
        | 0  4  5 | ,          | 0  1  5 | .        (10)
The game is described in [1]. By the symmetry (A = B^T), both players have the same strategy vectors; x = y in each equilibrium. Figure 11 shows the diagram and all equilibria of the game. The node 36 has no outgoing edge; (3, 6) is a pure Nash equilibrium. The candidate y = (3/4, 1/4, 0) is found by changing the mixed strategy from (1, 0, 0) to (0, 1, 0), and the candidate y′ = (1/2, 1/4, 1/4) is found by combining y and y′′ = (0, 1/4, 3/4). Both candidates (y, y) and (y′, y′) are mixed Nash equilibria. In [1], the authors give the above game as an example of applying the Lemke–Howson (LH) algorithm [4, 5] to find the equilibria. The algorithm finds equilibria by moving a point through vertices and edges of polyhedra that represent the possible vectors of mixed strategies [6]. Each path moving from a start vertex to an end vertex finds one equilibrium. Visualizing the polyhedra and seeing how the algorithm works is complicated. In contrast, with my approach, all equilibria can be written out by hand on a single sheet.
Fig. 11 The game of (10) in [1]
When a player of a game has two or more best pure responses against an opponent's pure strategy, the game is called degenerate. The game of (5) is degenerate because player 2 can choose both strategies 4 and 5 against strategy 1. My approach does not need to consider whether or not the game is degenerate. In strategic form games, there is an assumption that the preference over strategies is a total order; the values of the payoffs are given by the bimatrix (A, B). My approach does not need this assumption. Which outcome each player prefers can be specified by pairwise comparisons among outcomes, or by drawing arrows among nodes. For example, let us consider the game represented by the diagram in Fig. 12. It is a modification of the game of (5): the directions of the edges from 34 to 14 and from 15 to 35 differ from the corresponding edges in Fig. 3. Cycles of preference occur in the choices of player 1, and no bimatrix can give this preference. Since all nodes have outgoing edges, the game has no pure Nash equilibrium. However, at y = (y_4, y_5) with 1/3 ≥ y_4 ≥ 1/4, the node 1y has no outgoing edges; the mixed Nash equilibria are (x, y) = ((1, 0, 0), (y_4, 1 − y_4)) with 1/3 ≥ y_4 ≥ 1/4 (Fig. 13). Let us consider a new game represented in Fig. 14.
Fig. 12 Cycles of preference occur in choices of player 1
Fig. 13 Changing y4 from 1 to 0 in the game of Fig. 12
Fig. 14 The game that has no Nash equilibrium
In this game, the responses of player 1 against strategy 5 are copies of the responses against strategy 4. We cannot make any undirected edges by changing y. So, the game has neither pure nor mixed Nash equilibria. A game whose payoffs are given by a bimatrix has at least one mixed Nash equilibrium [3]. However, games in which pairwise comparisons rank the strategies may have no mixed Nash equilibrium. Studying how to choose strategies in games such as this example is my future work.
References
1. Nisan, N., Roughgarden, T., Tardos, E., Vazirani, V.V.: Algorithmic Game Theory. Cambridge University Press (2007)
2. Nash, J.: Equilibrium points in n-person games. Proc. Natl. Acad. Sci. 36(1), 48–49 (1950)
3. Nash, J.: Noncooperative games. Ann. Math. 54, 289–295 (1951)
4. Lemke, C.E., Howson, J.T.: Equilibrium points of bimatrix games. J. SIAM 12, 413–423 (1964)
5. Shapley, L.S.: A note on the Lemke-Howson algorithm. In: Mathematical Programming Study 1: Pivoting and Extentions, pp. 175–189 (1974)
6. Ziegler, G.M.: Lectures on Polytopes. Springer, New York (1995)
An AHP Approach for Transform Selection in DHCT-Based Image Compression Masaki Morita, Yuto Kimura, and Katsu Yamatani
Abstract In this paper, we discuss an Analytic Hierarchy Process (AHP) approach to high-resolution image compression. Generally, in image compression, redundant components must be removed in order to achieve significant data compression. It is therefore most important to minimize image distortion for a given bit rate. The computational cost is also important in the case of high-resolution image compression. The DHCT is a frequency transform using an orthogonal combination of the discrete cosine transform and the Haar transform. To minimize image distortion for a given bit rate while reducing the computational cost, we have already proposed a DHCT-based compression method that uses the p-norm for transform selection. In this paper, we use the AHP method to determine the transform selection parameter p. The evaluation criteria used to formulate the AHP are the levels of negative impact caused by three types of noise: mosquito noise, jagged noise, and block noise. The computational cost is also included in the overall priority setting. Furthermore, numerical experiments using standard images demonstrate, subjectively and objectively, the superiority of the method over previous methods.
1 Introduction
High-resolution image compression has become more and more important due to the recent rapid progress of mobile networks and display performance. In image compression, redundancy in images is removed by a frequency transform. Therefore, the choice of frequency transform greatly influences compression performance. We have studied an image compression method based on the discrete cosine transform (DCT) and the Haar transform (HT). The DCT is the frequency transform used in the most popular image compression scheme, JPEG. The HT is a wavelet transform that minimizes the computational cost of image reconstruction. We have already provided a DHCT-based image compression method using the p-norm for the transform
selection [1]. The DHCT is a frequency transform using an orthogonal combination of the DCT and HT [2]. An appropriate setting of p is very important for improving the compression performance of DHCT-based image compression using the p-norm. However, we have not been able to determine the appropriate value of p in advance; the previous study only shows that an appropriate parameter p exists for each image, as seen from the bit rate and computational cost after reconstruction at the same image quality. In actual image compression, compressing and reconstructing with several values of p is not realistic because of the large computational cost. Moreover, for images other than the test images used in the previous study, the appropriate p is unclear. Hence, the parameter p must be determined in advance. In this paper, we use the AHP method [5, 6] to determine the transform selection parameter p in advance. In the AHP approach this paper proposes, input images are classified subjectively into three types: images with a lot of gradation, images with many edges, and images with a lot of texture. In DHCT-based image compression, jagged noise caused by the HT and mosquito noise caused by the DCT are annoying for gradation and edges, respectively. In highly compressed images, block noise is also noticeable and annoying. From the previous study, we have a relative evaluation of the negative impact caused by these types of noise for each image type. Using this empirical information, we create pairwise comparison matrices to compute the priority of the parameter p. The computational cost of image reconstruction is also included in the overall priority setting. Furthermore, numerical experiments using standard images demonstrate, subjectively and objectively, the effectiveness of the AHP approach for DHCT-based high-resolution image compression.
2 Method for Determining Parameter p by AHP
In this section, we describe how to evaluate, using AHP, the image quality and the computational cost of the reconstructed images obtained with each parameter p, and how to determine an appropriate parameter p. First, the DCT and HT used in DHCT are applied to the test images [3] in Fig. 1 to generate reconstructed images. Then we express the level of noise in the reconstructed images in pairwise comparison matrices. Next, for each parameter p of DHCT, we subjectively evaluate the impact of noise by pairwise comparisons. After evaluating the computational cost for each parameter, the appropriate parameter p is determined.
2.1 Reference Noise-Weight Vector
In the pairwise comparisons for evaluating the noise, we compared three types of noticeable noise: mosquito noise, jagged noise, and block noise. These types of noise often appear when using the DCT, the HT, and a high compression ratio, respectively [2]. The recon-
Fig. 1 Test image (4096 × 3072, 3072 × 4096, 8bit/pixel grayscale)
structed images contain different types of noise at different levels. Therefore, we applied the DCT and HT to the six test images shown in Fig. 1 and evaluated the negative impact of each type of noise. Prominent noise sections in the images are shown in Fig. 2. Focusing on (a) and (b), jagged noise appears on the human skin and flower petal sections. In (d) and (f), mosquito noise occurs along the boundary between the house and the sky and the boundary between the cup and the background. In (c) and (e), the complex background pattern and the yarn section break up into square blocks, and the pattern is crushed. The images can therefore be divided into three groups: images with a lot of gradation ((a) and (b)), images with many edges ((d) and (f)), and images with a lot of texture ((c) and (e)). So, we evaluate the subjective intensity of the three types of noise for the three groups by pairwise comparisons. The pairwise comparison matrices and the weight vectors derived from them are given in Table 1.
2.2 Weight Vector with Parameter p About Noise
Next, we evaluate the effects of the parameters p = 0.7, 0.4, 0.1 for the three types of noise. We apply DHCT with p = 0.7, 0.4, 0.1 to the test images in Fig. 1 and compress them to the same bit rate. These p values are taken from the previous study [2]; other values did not show noticeable differences.
Fig. 2 Extracted images of prominent noise parts

Table 1 Pairwise comparison matrices concerning noise (mosquito noise, jagged noise, and block noise). w is rounded off at the fourth decimal place

Lots of gradation:       M     J     B     w
                   M     1    1/5   2/5   0.118
                   J     5     1     2    0.588
                   B    5/2   1/2    1    0.294

Lots of edge:            M     J     B     w
                   M     1     7     3    0.677
                   J    1/7    1    3/7   0.097
                   B    1/3   7/3    1    0.226

Lots of texture:         M     J     B     w
                   M     1    2/3   1/3   0.182
                   J    3/2    1    1/2   0.273
                   B     3     2     1    0.546

M: mosquito noise, J: jagged noise, B: block noise
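The weight vectors in Table 1 are consistent with the geometric-mean prioritization commonly used in AHP. As an illustration (not code from the paper), the following sketch reproduces the "lots of gradation" weights under that assumption:

```python
# Sketch: derive a weight vector from a pairwise comparison matrix with the
# geometric-mean method (normalized row geometric means).
from math import prod

def weights(matrix):
    gm = [prod(row) ** (1.0 / len(row)) for row in matrix]
    s = sum(gm)
    return [g / s for g in gm]

# "Lots of gradation" matrix over (mosquito, jagged, block) from Table 1.
gradation = [[1,   1/5, 2/5],
             [5,   1,   2  ],
             [5/2, 1/2, 1  ]]
print([round(w, 3) for w in weights(gradation)])   # [0.118, 0.588, 0.294]
```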
Here, to evaluate the impact of noise in the reconstructed images, we show an example of experimental results for Fig. 1d "Pier" in Fig. 3. Figure 3 shows an expanded section of the roof of the "Pier" image reconstructed by DHCT with the compression parameters p = 0.7, 0.4, 0.1, compressed to the same bit rate. Focusing on the white antenna section, we observe that the mosquito noise is lower in image (b) with p = 0.4 than in image (a) with p = 0.7. Furthermore, in image (c) with p = 0.1, the mosquito noise is reduced not only in the white antenna area but also along the boundary between the roof and the sky. However, the white antenna section of image (c) contains jagged noise that is absent at p = 0.7 and p = 0.4. Meanwhile, block noise was generated in the sky section at all compression parameters. The block noise was less noticeable at p = 0.7 but significantly crumbled the image at p = 0.1. The pairwise comparison matrices based on the above results are shown in Table 2, which gives the pairwise comparisons of the evaluations of mosquito noise, jagged noise, and block noise for each parameter p.
Fig. 3 Comparison of the influence of noise for each parameter p

Table 2 Pairwise comparison matrices of each parameter p = 0.7, 0.4, 0.1 for noise. w is rounded off at the fourth decimal place

Mosquito noise:         0.7   0.4   0.1    w
                 0.7     1    3/7   1/7   0.097
                 0.4    7/3    1    1/3   0.226
                 0.1     7     3     1    0.677

Jagged noise:           0.7   0.4   0.1    w
                 0.7     1     1     5    0.455
                 0.4     1     1     5    0.455
                 0.1    1/5   1/5    1    0.091

Block noise:            0.7   0.4   0.1    w
                 0.7     1     2     4    0.571
                 0.4    1/2    1     2    0.286
                 0.1    1/4   1/2    1    0.143
2.3 Determining the Parameter p
Using the weight vectors in Tables 1 and 2, we now determine the appropriate parameter p. First, we calculate the priority of each parameter p = 0.7, 0.4, 0.1 for each image feature. The priority vector g is the result for images with a lot of gradation, e for images with many edges, and t for images with a lot of texture:

g = (0.447, 0.378, 0.175), e = (0.224, 0.283, 0.493), t = (0.453, 0.321, 0.226).      (1)
Based on the experimental results of the previous study [2], we set the priority regarding the computational cost of image reconstruction to be directly proportional to p. For p = 0.7, 0.4, 0.1, the priority vector is

c = (0.167, 0.333, 0.500).      (2)

We use the weighted averages
514
M. Morita et al.
Fig. 4 Overall priority in parameter p calculated for each image feature
x_g = αg + (1 − α)c,  x_e = αe + (1 − α)c,  x_t = αt + (1 − α)c      (3)
as the overall priorities for selecting p. The range of α is 0 ≤ α ≤ 1, and values of α close to 1 assign more importance to image quality. Figure 4 shows the values of x for α = 0, 0.2, 0.4, 0.6, 0.8, 1.0. As shown in Fig. 4, when gradation or texture dominates the image, p = 0.7 is suitable when prioritizing image quality, but p = 0.1 is more appropriate when the computational cost is important. In images dominated by edges, however, p = 0.1 is appropriate regardless of whether image quality or computational cost is prioritized. In this paper, where the objective is the compression of high-resolution images, we adopt α = 0.8, which emphasizes image quality. Therefore, in images dominated by gradation or texture we set p = 0.7, and in images dominated by edges we set p = 0.1.
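As an illustration (not code from the paper), the following sketch combines the Table 1 noise weights with the Table 2 priorities and then blends the result with the cost vector c as in (3); the value of α is the one adopted above.

```python
# Sketch: synthesize the overall priority of p = (0.7, 0.4, 0.1) for one image type.
def synthesize(noise_w, p_by_noise):
    # noise_w: weights of (mosquito, jagged, block) for the image type (Table 1)
    # p_by_noise[k]: priority vector of (0.7, 0.4, 0.1) for noise type k (Table 2)
    return [sum(nw * pv[i] for nw, pv in zip(noise_w, p_by_noise)) for i in range(3)]

mosquito = [0.097, 0.226, 0.677]
jagged   = [0.455, 0.455, 0.091]
block    = [0.571, 0.286, 0.143]

g = synthesize([0.118, 0.588, 0.294], [mosquito, jagged, block])   # ~ (0.447, 0.378, 0.175)
c = [0.167, 0.333, 0.500]                                          # computational cost, eq. (2)
alpha = 0.8
x_g = [alpha * gi + (1 - alpha) * ci for gi, ci in zip(g, c)]      # overall priority, eq. (3)
print([round(v, 3) for v in g], [round(v, 3) for v in x_g])
```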
3 Effectiveness of the Determined Parameter p
In this section, to confirm the effectiveness of the parameter p determined in the previous section, we compare the computational cost and the subjective image quality. For the test images shown in Fig. 1, Table 3 shows the selected transform ratios when applying DHCT with the determined parameter p (p-DHCT) and DHCT using the 1-norm (1-DHCT). Here, DCT, D-H, H-D, and NHT are the four types of transform used in DHCT: DCT is the two-dimensional DCT, D-H is a transform that applies the HT after the DCT, H-D is a transform that applies the DCT after the HT, and NHT is the nonstandard Haar transform. Furthermore, the reduction of the computational cost required for reconstruction, compared with 1-DHCT, is shown as the computational cost reduction rate R. From Table 3, for all images except (c) Field, we can see that p-DHCT has a lower DCT selection rate and a lower computational cost. The HT, in contrast to the DCT, has a low computational cost, and since no multiplications are required, it is far superior on the computational side [6].
Table 3 Selection ratio of each transform in p-DHCT and 1-DHCT, and reduction ratio of computational cost from the previous method. Values are rounded off at the third decimal place

              Selection ratio of each transform (%)
              p-DHCT                         1-DHCT                         R (%)
              DCT    D-H   H-D   NHT         DCT    D-H   H-D   NHT
(a) Woman     64.97  0.92  0.77  33.34       83.34  1.55  1.21  13.89      16.79
(b) Flower    51.19  1.18  1.12  46.51       62.15  2.16  2.02  33.67      12.97
(c) Field     99.79  0.01  0.07   0.14       99.75  0.04  0.07   0.14      −0.01
(d) Pier      34.29  0.77  0.59  64.36       70.11  8.45  9.37  12.08      41.64
(e) Thread    82.47  1.53  1.60  14.40       88.04  1.72  1.56   8.68       4.79
(f) Silver    16.36  0.02  0.11  83.51       77.64  7.93  6.35   8.07      60.84

R: Computational cost reduction rate (%)
Fig. 5 Comparison of noise between p-DHCT and 1-DHCT
Hence, we can see that p-DHCT is superior in terms of computational cost. However, because the DCT selection ratio is reduced, the impact on image quality must also be examined. To compare the image qualities of the p-DHCT- and 1-DHCT-reconstructed images, the images were compressed until both p-DHCT and 1-DHCT reached 0.2 Bpp and were then reconstructed. Here, Bpp [bit/pixel] is the unit of the bit rate required for reconstruction. Figure 5 compares enlarged noisy sections of "Woman," "Pier," and "Thread" extracted from the p-DHCT- and 1-DHCT-reconstructed images. Because the DCT selection ratio is decreased, the mosquito noise is visually reduced in image (b) of Fig. 5. However, the negative impact of increasing the HT is absent
in images (a) and (c). Additionally, in image (b), jagged noise appears in the white antenna section, but the edge is sharper in the p-DHCT reconstruction than in the 1-DHCT reconstruction.
4 Conclusions
We focused on an image compression method based on transform selection using the p-norm and proposed a method using AHP to determine, in advance, the parameter that corresponds to the features of the input image. In the proposed method, we used three types of noise (mosquito noise, jagged noise, and block noise) as evaluation criteria and created pairwise comparison matrices relating the noise to each image feature. Furthermore, we evaluated the negative impact of the noise occurring in the reconstructed images for each parameter and each of the three types of noise. We also considered the computational cost of each parameter and determined the appropriate parameters from the resulting overall priorities. In the numerical experiments, we compared the parameter predetermined by this method with the previous method, in which the parameter was not predetermined, and confirmed that the computational cost required for reconstruction was significantly reduced. Moreover, it was subjectively confirmed that the method also reduces noticeable noise in the reconstructed images.
References
1. Morita, M., Ashizawa, K., Yamatani, K.: Computational Complexity Reduction for DCT- and HT-Based Combinational Transform and Its Application to High-resolution Image Compression. Urban Science Studies, vol. 19, pp. 81–92 (2014) (Japanese)
2. Ashizawa, K., Yamatani, K.: A New Frequency Transform Method Using Orthogonal Combination with the DCT and HT: Its Application to Mosquito Noise Reduction in Image Compression. The Institute of Electronics, Information and Communication Engineers (A), vol. 7, pp. 484–492 (2013) (Japanese)
3. Kinoshita, E., Miyasaka, F., Ishikawa, Y., Azuma, Y.: Renewal cost benefit analysis using extended AHP method. Oper. Res. Soc. Jpn. 40(8), 411–416 (1995) (Japanese)
4. Moriya, T., Tatsuhara, S., Nakajima, T., Tanaka, T., Tsuyuki, S.: A method to determine thinning priorities using a geographic information system and analytic hierarchy process. Jpn. Soc. For. Plann. 46(2), 57–66 (2013) (Japanese)
5. Institute of Image Electronics Engineers of Japan: High-resolution color digital standard images (XYZ/SCID), JIS X 9204 (2004). http://wwwzoc.nii.ac.jp/iieej/trans/scidnew.html (Japanese)
6. Ogawa, J., Ashizawa, K., Yamatani, K.: Haar Transform-Based Lossy Image Compression Method Using Difference Components Prediction. Urban Science Studies, vol. 14, pp. 71–79 (2009) (Japanese)
SPCM with Improved Two-Stage Method for MDAHP Including Hierarchical Criteria Takao Ohya
Abstract We have proposed a super pairwise comparison matrix (SPCM) to express all pairwise comparisons in the evaluation process of the dominant analytic hierarchy process (D-AHP) or the multiple dominant AHP (MDAHP) as a single pairwise comparison matrix and have shown calculations of super pairwise comparison matrix in MDAHP with hierarchical criteria. This paper shows the calculations of SPCM with an improved two-stage method for the multiple dominant AHP including hierarchical criteria.
1 Introduction
The AHP (Analytic Hierarchy Process) proposed by Saaty [1] enables objective decision making by a top-down evaluation based on an overall aim. In actual decision-making, a decision-maker often has a specific alternative (the regulating alternative) in mind and makes an evaluation on the basis of this alternative. This was modeled in D-AHP (the dominant AHP), proposed by Kinoshita and Nakanishi [2]. If there is more than one regulating alternative and the importance of each criterion is inconsistent, the overall evaluation value may differ for each regulating alternative. As a method of integrating the importance in such cases, CCM (the concurrent convergence method) was proposed. Kinoshita and Sekitani [3] showed the convergence of CCM. Ohya and Kinoshita [4] proposed the SPCM (Super Pairwise Comparison Matrix) to express all pairwise comparisons in the evaluation process of the dominant AHP (D-AHP) or the multiple dominant AHP (MDAHP) as a single pairwise comparison matrix. Ohya and Kinoshita [5] showed, by means of a numerical counterexample, that in MDAHP an evaluation value resulting from the application of the logarithmic least
squares method (LLSM) to an SPCM does not necessarily coincide with the evaluation value resulting from the application of the geometric mean multiple dominant AHP (GMMDAHP) to the evaluation values obtained from each pairwise comparison matrix by using the geometric mean method. Ohya and Kinoshita [6] showed, using error models, that in D-AHP an evaluation value resulting from the application of the LLSM to an SPCM necessarily coincides with the evaluation value obtained by applying the geometric mean method to each pairwise comparison matrix. Ohya and Kinoshita [7] showed the treatment of hierarchical criteria in D-AHP with a super pairwise comparison matrix. Ohya and Kinoshita [8] showed an example of using the SPCM with the application of the LLSM for the calculation of MDAHP. Ohya and Kinoshita [9, 10] showed that the evaluation value resulting from the application of the LLSM to an SPCM agrees with the evaluation value determined by the application of D-AHP and MDAHP to the evaluation values obtained from each pairwise comparison matrix by using the geometric mean. The SPCM of D-AHP or MDAHP is an incomplete pairwise comparison matrix. Therefore, the LLSM based on an error model, or an eigenvalue method such as the Harker method [11] or the two-stage method [12], is applicable to the calculation of evaluation values from an SPCM. Ohya [13] showed calculations of the SPCM by the Harker method for the multiple dominant AHP including hierarchical criteria. Nishizawa [12] proposed the improved two-stage method (ITSM). This paper shows the calculations of the SPCM by ITSM for the multiple dominant AHP including hierarchical criteria.
2 SPCM
The true absolute importance of alternative a (a = 1, ..., A) at criterion c (c = 1, ..., C) is v_ca. The final purpose of the AHP is to obtain the relative values between alternatives of the overall evaluation value v_a = Σ_{c=1}^{C} v_ca of alternative a. The relative comparison values r_{ca,c′a′} of the importance v_ca of alternative a at criterion c, as compared with the importance v_{c′a′} of alternative a′ at criterion c′, are arranged in a (CA × CA) or (AC × AC) matrix. This is proposed as the SPCM R = (r_{ca,c′a′}) or (r_{ac,a′c′}). In a (CA × CA) matrix, the alternative index changes first: the (A(c − 1) + a, A(c′ − 1) + a′)th element of the SPCM is r_{ca,c′a′}. In an (AC × AC) matrix, the criteria index changes first: the (C(a − 1) + c, C(a′ − 1) + c′)th element of the SPCM is r_{ac,a′c′}. In an SPCM, symmetric components have a reciprocal relationship, as in pairwise comparison matrices. Diagonal elements are 1, and the following relationships hold: if r_{ca,c′a′} exists, then r_{c′a′,ca} exists and

r_{c′a′,ca} = 1/r_{ca,c′a′},      (1)

r_{ca,ca} = 1.      (2)
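As a small illustration (not from the paper), the (CA × CA) indexing rule A(c − 1) + a can be written directly in code; the 0-based matrix positions are an assumption of this sketch, while c and a are 1-based as in the paper.

```python
# Sketch: position of the comparison r_{ca,c'a'} in a (CA x CA) SPCM laid out
# with the alternative index changing first.
def spcm_position(c, a, c2, a2, A):
    row = A * (c - 1) + a - 1
    col = A * (c2 - 1) + a2 - 1
    return row, col

print(spcm_position(2, 3, 1, 1, A=3))   # (criterion II, alt 3) vs (criterion I, alt 1) -> (5, 0)
```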
The SPCM of D-AHP or MDAHP is an incomplete pairwise comparison matrix. Therefore, the LLSM based on an error model, or an eigenvalue method such as the Harker method [11] or the two-stage method, is applicable to the calculation of evaluation values from an SPCM.
3 Improved Two-Stage Method
The ij element of a comparison matrix A is denoted by a_ij for i, j = 1, ..., n. Nishizawa [12] proposed the following estimation method, ITSM. For an unknown a_ij:

a_ij = ( Π_{k=1}^{n} a_ik a_kj )^{1/m},      (3)

where m is the number of known products a_ik a_kj, k = 1, ..., n. If unknown comparisons are included in the factors of a_ik a_kj, then a_ik a_kj is assumed to be 1. If m ≠ 0, the a_ij estimated by (3) is treated as a known comparison, and the a_ij with m = 0 in (3) are treated as unknown at the next level. The above procedure is repeated until the unknown elements are completely estimated.
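A minimal sketch of this procedure follows (an illustration, not code from the paper or from [12]). None marks an unknown comparison, and the estimates obtained within one pass are applied in a batch before the next pass.

```python
# Sketch of the ITSM estimation step (3): fill an unknown a_ij with the geometric
# mean of the known products a_ik * a_kj, repeating until the matrix is complete.
def itsm_complete(a):
    n = len(a)
    while any(a[i][j] is None for i in range(n) for j in range(n)):
        estimates = {}
        for i in range(n):
            for j in range(n):
                if a[i][j] is not None:
                    continue
                prod, m = 1.0, 0
                for k in range(n):
                    if a[i][k] is not None and a[k][j] is not None:
                        prod *= a[i][k] * a[k][j]
                        m += 1
                if m > 0:                      # m = 0: leave unknown for the next level
                    estimates[(i, j)] = prod ** (1.0 / m)
        if not estimates:                      # nothing more can be estimated
            break
        for (i, j), v in estimates.items():    # estimated values become known comparisons
            a[i][j] = v
    return a
```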
4 Numerical Example of Using SPCM for Calculation of MDAHP
Let us take as an example the hierarchy shown in Fig. 1. Three alternatives, 1 to 3, and seven criteria, I to VI and S, are assumed, where Alternative 1 and
Fig. 1 The hierarchical structure
Alternative 2 are the regulating alternatives. Criteria IV–VI are grouped as Criterion S, where Criterion IV and Criterion V are the regulating criteria. As a result of pairwise comparisons between alternatives at each criterion c (c = I, ..., VI), the following pairwise comparison matrices R_c^A, c = I, ..., VI, are obtained:

    R_I^A   = |  1   1/3   5  |    R_II^A  = |  1    7    3  |    R_III^A = |  1   1/3  1/3 |
              |  3    1    3  |              | 1/7   1   3/7 |              |  3    1   1/3 |
              | 1/5  1/3   1  | ,            | 1/3  7/3   1  | ,            |  3    3    1  | ,

    R_IV^A  = |  1    3    5  |    R_V^A   = |  1   1/3   3  |    R_VI^A  = |  1   1/5   3  |
              | 1/3   1    1  |              |  3    1    5  |              |  5    1    7  |
              | 1/5   1    1  | ,            | 1/3  1/5   1  | ,            | 1/3  1/7   1  | .

With regulating Alternative 1 and Alternative 2 as the representative alternatives, and Criterion IV and Criterion V as the representative criteria, the importance between criteria was evaluated by pairwise comparison. As a result, the following pairwise comparison matrices R_1^C, R_1^S, R_2^C, R_2^S are obtained:
1 ⎢3 ⎢ ⎢ R1C = ⎢ 31 ⎢ ⎣3 5 ⎡ 1 ⎢1 ⎢5 ⎢ R2C = ⎢ 1 ⎢1 ⎣3 9
⎤ 3 13 51 ⎡ 1 ⎤ 1 5 1 21 ⎥ 1 2 2 ⎥ S 1 1⎥ 1 ⎣ , R = 1 5 9⎥ 1 2 1 5 ⎦, 5 1⎥ 1 1 151 2⎦ 1 2 5 2921 ⎤ 5 1 3 19 ⎡ 1 1⎤ 1 13 1 19 ⎥ 1 9 4 ⎥ ⎥ 3 1 1 19 ⎥, R2S = ⎣ 9 1 6 ⎦, ⎥ 1 1 1 19 ⎦ 4 16 1 9991 1 3
The (CA × CA) order SPCM for this example is
The complete SPCM T by ITSM for this example is
Table 1 shows the evaluation values obtained from the SPCM by ITSM for this example.
Table 1 Evaluation values obtained by SPCM + ITSM

Criterion        I      II     III    IV     V       VI     Overall evaluation value
Alternative 1    1      2.946  0.495  2.681  5.048   1.075  13.245
Alternative 2    1.800  0.564  1.152  0.731  10.110  2.266  16.624
Alternative 3    0.359  1.283  2.388  0.618  1.762   0.520  6.930
5 Conclusion
The SPCM of MDAHP is an incomplete pairwise comparison matrix. Therefore, the LLSM based on an error model, or an eigenvalue method such as the Harker method or the two-stage method, is applicable to the calculation of evaluation values from an SPCM. In this paper, we showed how to use the SPCM with the application of ITSM for the calculation of MDAHP with hierarchical criteria.
References
1. Saaty, T.L.: The Analytic Hierarchy Process. McGraw-Hill, New York (1980)
2. Kinoshita, E., Nakanishi, M.: Proposal of new AHP model in light of dominative relationship among alternatives. J. Oper. Res. Soc. Jpn. 42, 180–198 (1999)
3. Kinoshita, E., Sekitani, K., Shi, J.: Mathematical properties of dominant AHP and concurrent convergence method. J. Oper. Res. Soc. Jpn. 45, 198–213 (2002)
4. Ohya, T., Kinoshita, E.: Proposal of super pairwise comparison matrix. In: Watada, J., et al. (eds.) Intelligent Decision Technologies, pp. 247–254. Springer, Berlin Heidelberg (2011)
5. Ohya, T., Kinoshita, E.: Super pairwise comparison matrix in the multiple dominant AHP. In: Watada, J., et al. (eds.) Intelligent Decision Technologies. Smart Innovation, Systems and Technologies 15, vol. 1, pp. 319–327. Springer, Berlin Heidelberg (2012)
6. Ohya, T., Kinoshita, E.: Super pairwise comparison matrix with the logarithmic least squares method. In: Neves-Silva, R., et al. (eds.) Intelligent Decision Technologies. Frontiers in Artificial Intelligence and Applications, vol. 255, pp. 390–398. IOS Press (2013)
7. Ohya, T., Kinoshita, E.: The treatment of hierarchical criteria in dominant AHP with super pairwise comparison matrix. In: Neves-Silva, R., et al. (eds.) Smart Digital Futures 2014, pp. 142–148. IOS Press (2014)
8. Ohya, T., Kinoshita, E.: Using super pairwise comparison matrix for calculation of the multiple dominant AHP. In: Neves-Silva, R., et al. (eds.) Intelligent Decision Technologies, Smart Innovation, Systems and Technologies 39, pp. 493–499. Springer, Heidelberg (2015)
9. Ohya, T., Kinoshita, E.: Super pairwise comparison matrix in the dominant AHP. In: Czarnowski, I., et al. (eds.) Intelligent Decision Technologies 2016. Smart Innovation, Systems and Technologies 57, pp. 407–414. Springer, Heidelberg (2016)
10. Ohya, T., Kinoshita, E.: Super pairwise comparison matrix in the multiple dominant AHP with hierarchical criteria. In: Czarnowski, I., et al. (eds.) KES-IDT 2018, SIST 97, pp. 166–172. Springer International Publishing AG (2019)
11. Harker, P.T.: Incomplete pairwise comparisons in the analytic hierarchy process. Math. Model. 9, 837–848 (1987)
12. Nishizawa, K.: Estimation of unknown comparisons in incomplete AHP and it's compensation. Report of the Research Institute of Industrial Technology, Nihon University Number 77, pp. 10 (2004)
13. Ohya, T.: SPCM with Harker method for MDAHP including hierarchical criteria. In: Czarnowski, I., et al. (eds.) Intelligent Decision Technologies 2019, SIST 143, pp. 277–283. Springer International Publishing AG (2019)
Tournament Method Using a Tree Structure to Resolve Budget Conflicts Natsumi Oyamaguchi, Hiroyuki Tajima, and Isamu Okada
Abstract Many local governments in Japan lack a decision-making protocol for resolving budget conflicts. They often use a method in which the budget allocation essentially follows the previous allocation. However, this method does not adapt reliably to present situations and also results in sectarianism. The governments have been looking for alternatives, as no dominant method currently exists. We propose a method for budget allocation that uses a tree structure. This method considers the trade-off between costs and efficiency. The number of assessments required for determining a budget allocation is only about the number of objects for allocation, which is the minimum needed to determine their relative weights; thus, the method minimizes costs. Furthermore, each section manager is directly responsible for the budget ratios of all of the projects in their own section, so this procedure may alleviate the dissatisfaction of stakeholders. Moreover, this method avoids factitive assessments by prohibiting the choice of a representative project. Our method follows a tournament style, which will be expanded on in future work.
1 Introduction
Many local governments in Japan have no appropriate decision protocol to resolve budget conflicts. Two kinds of budget allocation systems are generally used in Japan. One is a system in which budget allocators examine all projects. It can lead to quite
appropriate allocations. The other is a system in which a budget allocator provides budget ceilings to each section or for each policy. Although the first system has been in use for a long time, budget allocators tend to avoid it because of the huge costs involved. This is why a nationwide questionnaire survey in Japan [1] found that some local governments have switched to the second system. However, there is still room for further improvement in this second system [2]. In fact, we find that a certain city abolished the second system after using it for 10 fiscal years [3]. Little is known so far about appropriate methods for optimal budget allocation. This is because such methods must resolve many issues. For instance, there are many stakeholders operating from different standpoints. Hence, the meaning of "optimal" differs between stakeholders, so it is difficult for them to reach a good consensus. In addition, there are practical restrictions such as limited resources or insufficient time. Therefore, developing appropriate budget allocation methods remains an open issue. Examining performance measurements for budgetary decision-making is also important for developing effective budget allocation methods [4]. Here we propose a method for budget allocation using a tree structure. This method considers the trade-off between costs and efficiency. The number of assessments required for determining a budget allocation is only about the number of objects for allocation, which is the minimum needed to determine their relative weights; thus, the method minimizes costs. Furthermore, each section manager is directly responsible for the budget ratios of all of the projects in their own section, so this procedure may alleviate the dissatisfaction of stakeholders. Moreover, this method avoids factitive assessments by prohibiting the choice of a representative project.
2 Our Method
In this section, we define the new budget allocation method, called the tournament-style allocation method. Let us consider an organization that is distributing its budget to projects, where the number of projects is n. Each project belongs to a department. A department has at least one project. The manager of a department is called a director. The top of the organization is called the executive. The executive manages all departments either directly or indirectly. If the executive manages them indirectly, there are mid-level departments managed by mid-level executives, who manage either departments or mid-level departments. The executive, mid-level executives, and directors are collectively called evaluators. This organizational structure is drawn as a tree structure using graph theory. We make a multi-branch tree structure T_(n,m), where n is the number of leaf nodes and m is the number of internal nodes. P denotes the set of leaf nodes {P_i} (i ∈ {1, ..., n}) that represent projects. E denotes the set of non-leaf nodes {E_j} (j ∈ {0, ..., m}) that
represent evaluators, where E_0 is the root node and the others are internal nodes. Note that E_0 is the executive. The example tree of Fig. 1 is defined as T_(10,8), with 10 projects and 9 evaluators. For any j ∈ {0, ..., m}, we denote by T_j the tree consisting of E_j and all of its descendant nodes. Note that T_0 = T_(n,m). Let E_j be the set of all child nodes of E_j and c(j) the set of the index numbers of the evaluators in E_j. A tree T_k (k ∈ c(j)) is called a child tree of E_j. Here, we call S_j the set of child trees of E_j. For example, S_2 = {T_5, T_6} is obtained from E_2 in the example tree T_(10,8) of Fig. 1. Next, we define a map l : {1, ..., n} × {0, ..., m} → Z ∪ {∞} as l(i, j) = h, where h is the length of the path from E_j to P_i. Conversely, we define a map e : {1, ..., n} × Z → {0, ..., m} as e(i, h) = j, which outputs the index number of E_j, where h is the length of the path from E_j to P_i. Note that if P_i is not a descendant leaf node of E_j, we have l(i, j) = ∞, and e(i, h) is not defined. For any i ∈ {1, ..., n}, we obtain exactly one shortest path from E_0 to P_i and call it path i. For example, consider P_7 in Fig. 1. The parent node of P_7 is E_7, and the parent node of E_7 is E_6. Repeating this procedure, we finally reach the root node, E_0. Note that every path i for i ∈ {1, ..., n} is a sequence from E_0 to P_i. We then obtain path 7 as follows.
path 7: E_0 → E_2 → E_6 → E_7 → P_7
We restrict our method to a specific case, called the tournament model. As is well known, a tournament, in which winners go on to the next round and losers are eliminated in successive rounds, is commonly used in many sports. In this model, any evaluator except the directors must satisfy a tournament condition: an evaluator selects projects from among those selected by the evaluator's child evaluators. In the example of Fig. 1, the set of projects for E_1 is {P_1, P_3}, and both P_1 being selected by E_3 and P_3 being selected by E_4 are satisfied. Similarly, E_6, E_2, and E_0 satisfy the condition. Therefore, the example tree satisfies the tournament condition.
- Step 1 Selection
First, all evaluators select the projects that they must assess. However, evaluators are prohibited from assessing them before all of their child evaluators have completed their assessments. In the example of Fig. 1, E_1 must wait until the assessments by E_3 and E_4 are finished, and E_0 must wait until all assessments by E_1 to E_8 are finished. If an evaluator is a director (who directly connects with leaf nodes), the director selects all
projects in one’s own department. If an evaluator is not a director (that is, either an executive or a mid-level executive), the one selects a project from among the projects that were selected by the one’s child evaluators. Note that E e(i,h−1) is the only one evaluator included both in E j and path i for E j ( j ∈ {0, . . . , m}) in path i. Let P j by the set of projects whom E e(i,h−1) chooses as the representative projects for each Tk (k ∈ c(e(i, h − 1)). It is easy to see that E j should choose a representative project from P j . Define a map p : {1, . . . , n} × Z −→ {1, . . . , n} that outputs an index number of Pp(i,l(i, j)) ∈ P j , where Pp(i,l(i, j)) is chosen by E j as a representative project of Te(i,h−1) . We label the edge that connects E j to E e(i,h−1) “Pp(i,l(i, j)) ” as follows.
path i: E_0 → ··· → E_j →[P_{p(i,l(i,j))}] E_{e(i,h−1)} → ··· → P_i
If the evaluator is a director, that is, l(i, j) = 1, the child of E_j on path i is just P_i. Hence, E_j chooses P_i as the representative project.
- Step 2 Evaluation
Second, all evaluators assess the projects that they must assess. They give a positive real number to each project as an assessment value. To capture these values, we define a map v : {0, ..., m} × {1, ..., n} → Z, where v(j, k) is the evaluation value that E_j gives P_k.
Fig. 1 Example tree T(10,8)
Note that an assessment is given relatively. Let us consider that evaluator E_3 assesses two projects, P_1 and P_2, as shown in Fig. 1. The assessment values of the two are denoted as v(3, 1) and v(3, 2), respectively. However, these values are relative to each other. That is, the case of E_3 giving v(3, 1) = 2 and v(3, 2) = 5 is completely equivalent to the case of E_3 giving v(3, 1) = 4 and v(3, 2) = 10.
- Step 3 Calculation
In the final step, the final allocation of budgets is calculated. To do so, we first calculate an n-dimensional vector W(T_(n,m)) = (w(P_1), w(P_2), ..., w(P_n)), where

w(P_i) = w′(P_i) / Σ_{i ∈ {1,...,n}} w′(P_i)

and

w′(P_i) = v(0, p(i, l(i, 0))) · Π_{h=1}^{l(i,0)−1} [ v(e(i, h), p(i, h)) / v(e(i, h), p(i, h + 1)) ].
Note that Σ_{i ∈ {1,...,n}} w(P_i) = 1. Each element of W must correspond to each evaluator's assessment. That is, for any evaluator E_j and for any two projects P_i and P_{i′} such that E_j assesses both P_i and P_{i′}, we have v(j, i) : v(j, i′) = w(P_i) : w(P_{i′}). Using this vector, the executive distributes G × w(P_i) to P_i, where G is the gross budget.
3 A Numerical Example
In this section, we show an example to make the method defined in Sect. 2 easier to understand. We consider an organization whose hierarchy is shown in Fig. 1. In the first step, every evaluator selects the projects that he or she must assess. The labels in Fig. 1 are the projects each evaluator selects. In the second step, the relative assessment values are determined. In the example, the following values are determined: v(0, 1), v(0, 5), v(1, 1), v(1, 3), v(2, 5), v(2, 8), v(3, 1), v(3, 2), v(4, 3), v(4, 4), v(5, 5), v(6, 6), v(6, 8), v(7, 6), v(7, 7), v(8, 8), v(8, 9), v(8, 10). Note that these values are relative for each evaluator, and thus v(5, 5) is meaningless because E_5 has just one project to assess.
In the case of i = 4, w′(P_4) is calculated as

w′(P_4) = v(0, p(4, l(4, 0))) · Π_{h=1}^{l(4,0)−1} [ v(e(4, h), p(4, h)) / v(e(4, h), p(4, h + 1)) ]
        = v(0, p(4, 3)) · Π_{h=1}^{2} [ v(e(4, h), p(4, h)) / v(e(4, h), p(4, h + 1)) ]
        = v(0, p(4, 3)) · [ v(e(4, 1), p(4, 1)) / v(e(4, 1), p(4, 2)) ] · [ v(e(4, 2), p(4, 2)) / v(e(4, 2), p(4, 3)) ]
        = v(0, 1) · [ v(4, 4) / v(4, 3) ] · [ v(1, 3) / v(1, 1) ].
This is because we have l(4, 0) = 3, p(4, 1) = 4, p(4, 2) = 3, p(4, 3) = 1, e(4, 1) = 4 and e(4, 2) = 1.
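As an illustration (not code from the paper), the calculation of Step 3 for a single project can be written directly from the maps l, e, p and v; the numerical evaluation values below are hypothetical, since the paper only lists which comparisons are made, not their values.

```python
# Sketch of Step 3 for one project: w'(P_i) from the formula above.
def w_prime(i, l, e, p, v):
    L = l[i]                                  # l(i, 0): path length from E_0 to P_i
    result = v[(0, p[(i, L)])]                # v(0, p(i, l(i, 0)))
    for h in range(1, L):
        result *= v[(e[(i, h)], p[(i, h)])] / v[(e[(i, h)], p[(i, h + 1)])]
    return result

# Project P_4 in Fig. 1: path 4 is E_0 -> E_1 -> E_4 -> P_4.
l = {4: 3}
e = {(4, 1): 4, (4, 2): 1}                    # evaluators at distance 1 and 2 from P_4
p = {(4, 1): 4, (4, 2): 3, (4, 3): 1}         # representative projects along path 4
v = {(0, 1): 2.0, (1, 1): 1.0, (1, 3): 3.0,   # hypothetical evaluation values
     (4, 3): 1.0, (4, 4): 2.0}

print(w_prime(4, l, e, p, v))   # v(0,1) * v(4,4)/v(4,3) * v(1,3)/v(1,1) = 2 * 2 * 3 = 12
```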
4 Evaluation Cost
Every evaluator determines the relative weights of as many projects as he or she has child nodes. This is equivalent to estimating the relative weights of the other projects when the weight of one project is fixed to one. Therefore, the degree of freedom of each evaluator's assessment is the number of his or her child nodes minus one. The following theorem shows that our proposed method can determine the relative weights of all projects uniquely with the lowest number of assessments.

Theorem 1 Let N be the number of all projects and E the number of all evaluators. The total of the degrees of freedom for the assessments by all evaluators is N − 1.

Proof Let n_1, n_2, ..., n_E be the numbers of projects assessed by each evaluator. Then Σ_{i=1}^{E} n_i = B, where B is the gross number of branches. An evaluator i decreases the degrees of freedom by n_i − 1, and thus

Σ_{i=1}^{E} (n_i − 1) = B − E.
The following corollary shows that this value is equal to N − 1.

Corollary 1 For any tree structure, let the gross number of leaves be N, the gross number of non-leaf nodes be E, and the gross number of branches be B. Then B + 1 = E + N.
Proof This corollary is proven recursively. First, we consider a graph consisting of a root node only. Then, child nodes are added to the graph sequentially, and finally the graph corresponds to an arbitrary tree structure. The first graph is a leaf only, and thus the equation above is satisfied. Next, we choose a node that is a leaf in the current graph but has child nodes in the final tree structure, and that node's child nodes and the branches that connect to them are added to the graph. Note that any tree structure can be reproduced by repeating this procedure. Let the number of that node's child nodes be k. By this procedure, the number of leaves increases by k while the target node ceases to be a leaf, so N increases by k − 1 and E increases by one; thus, the right-hand side of the equation increases by k. The left-hand side of the equation also increases by k, because the number of branches increases by k. This is why the equation is always satisfied regardless of the procedure.
5 Discussion
A salient feature of our method is that a budget allocator or stakeholder is never allowed to choose a representative project from each department. To be more precise, if a manager selects the element that he or she feels to be the best, dishonest situations may emerge. With our method, however, appropriate internal assessment can be expected for the sake of the group's profits as a whole, because a representative project chosen at random serves as a threshold for determining the budgets of all other elements in the group. One merit of our method is that it eliminates feelings of unfairness among the people in charge of evaluation, because the process is consensus-based and considers the evaluations of all people in charge; additionally, overall costs can be reduced. All projects can be evaluated through the consensus of all people concerned in each section. In addition, the representative projects selected from each section can be approved by general managers. Although this method is aimed at budget allocation for medium-scale or small-scale organizations, it is possible to substitute an organization, a section, a policy, or a person for a project. Therefore, the range of applications of this method is wide and includes budget allocation at large-scale organizations, labor evaluation, and the assessment of employees. For further studies, we should obtain a generalized model in which each evaluator is allowed to choose representative projects from among all descendant projects; it would not be necessary to choose projects from among those already chosen by the child evaluators, as in the tournament model. It should also be noted that there is no guarantee that a relative weight of a project assessed by any evaluator is correctly measured. This suggests that we must systematically consider errors in estimation if the method is adopted for practical use. Another extension is therefore to consider the Analytic Hierarchy Process [5], a traditional method of quantifying qualitative data by using individual preferences, a ternary diagram [6], or a ternary graph [7, 8], which may be effective for minimizing errors.
References
1. Noguchi, Y., Niimura, Y., Takeshita, M., Kanamori, T., Takahashi, T.: Analysis of decision making in local finances. Keizai Bunseki, Econ. Soc. Res. Inst. 71, 1–190 (1978) (in Japanese)
2. Miyazaki, M.: Analysis of Prefectural Budget Process. The Jichi-Soken Monthly Review of Local Government, The Japan Research Institute for Local Government 41(9), 52–78 (2015) (in Japanese)
3. Tachikawa City Home Page. https://www.city.tachikawa.lg.jp/gyoseikeiei/shise/yosan/gyozaisemondai/documents/h290321toushin.pdf (in Japanese)
4. Melkers, J., Willoughby, K.: Models of performance-measurement use in local governments: understanding budgeting, communication, and lasting effects. Public Adm. Rev. 65(2), 180–190 (2005). https://doi.org/10.1111/j.1540-6210.2005.00443.x
5. Saaty, T.L.: The Analytic Hierarchy Process. McGraw-Hill, New York (1980)
6. Mizuno, T., Taji, K.: A ternary diagram and pairwise comparisons in AHP. J. Jpn. Symp. Anal. Hierarchy Process 4, 59–68 (2015)
7. Oyamaguchi, N., Tajima, H., Okada, I.: A questionnaire method of class evaluations using AHP with a ternary graph. In: Smart Innovation, Systems and Technologies, vol. 97, pp. 173–180. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-92028-3_18
8. Oyamaguchi, N., Tajima, H., Okada, I.: Visualization of criteria priorities using a ternary diagram. In: Smart Innovation, Systems and Technologies, vol. 143, pp. 241–248. Springer, Singapore (2019). https://doi.org/10.1007/978-981-13-8303-8_21
Upgrading the Megacity: Piloting a Co-design and Decision Support Environment for Urban Development in India
Jörg Rainer Noennig, David Hick, Konstantin Doll, Torsten Holmer, Sebastian Wiesenhütter, Chiranjay Shah, Palak Mahanot, and Chhavi Arya

Abstract Decision-making problems in urban planning and management emerge through escalating urban dynamics, especially in megacities. Instead of bureaucratic top-down methods, digital bottom-up technologies and methodologies provide interesting alternative approaches. The broad integration of crowdsourced information into planning procedures has the potential to fundamentally change planning policies. This becomes especially relevant in the Indian context. India's massive urban challenges, e.g., environmental pollution, waste management, or informal settlement growth, can only be addressed with the active involvement and support of local communities. This article reports on the pilot application of a digital co-design environment in the Indian city of Pimpri Chinchwad in order to target improper garbage disposal and to support public space upgrading and creative placemaking. Based on tools and methods developed within the projects U_CODE (EU Horizon2020) and Pulse (India Smart Cities Fellowship), the novel co-design environment (a) facilitated citizen participation online as well as on the local level and (b) supplied municipal decision-makers with new design and planning intelligence harvested from local communities. Specifically, Pimpri Chinchwad's administration wanted to transform a vacant area used as a dumpsite into a quality community space whose specific function and design, however, would be determined by the residents themselves. For this challenge, Team Pulse and U_CODE configured an interactive co-design environment based on touch-table technology that traveled with a Co-Creation Truck to local communities. As a result, the digital participation system enabled some hundreds of people to actively communicate demands and grievances about the location at stake and to co-create design proposals, which were transmitted to local decision-makers for scrutiny and implementation. Due to the overall positive results and the significant social and political impact of the pilot test, a broader implementation in Pimpri Chinchwad is envisioned for the near future.
1 Introduction

As the dynamics of megacities have reached a level beyond conventional planning, mandated planners and administrations face severe difficulties in coping with the fast pace of urban growth, informal construction, and subsequent challenges such as waste, the dilapidation of public spaces, or accessibility problems. Such levels of complexity are hard to resolve with the established means of decision-making in urban development [15]. Instead of top-down planning methods and systems, new approaches are needed that engage local communities bottom-up in creating planning schemes and making responsible decisions about urban development issues. Recent research in co-creation methodologies [1] provides promising new paths to cope with this problem, and new digital interaction technologies enable entirely new perspectives on citizen co-creation and participation [7]. These approaches are especially relevant for the Indian context today, where immense urban challenges coincide with a high affinity of the population towards digital technology.

The paper reports on the pilot application of the already established methodologies of two previously run projects, Pulse and U_CODE (see below), which were synergized here for the first time. The specific research questions are:
1. How can the participation system developed within an EU-funded research project be adapted to the Indian context?
2. How can user acceptance and usability be created for the specific local communities in partially informal environments?
3. How valid are the results in terms of planning and decision support for urban development?

India faces massive urban challenges, such as the rapid growth of informal settlements, environmental pollution, waste management, and traffic management. These challenges, which affect 400 million urban Indians, can only be solved with the active support and involvement of local communities [15]. Their local knowledge and creativity become an indispensable input for planners and decision-makers [8]. To address this broad spectrum of problems, the Government of India launched the USD 15 billion Smart Cities Mission, which aims to tackle these challenges and establish a new level of city planning based on citizen-centric development and the use of innovative digital technologies. The Ministry of Housing and Urban Affairs, which implements several urban missions in the country, started the India Smart Cities Fellowship Programme in January 2019 to involve young researchers and entrepreneurs who want to invent breakthrough ideas for the urban problems that the country faces and push those ideas into real-world pilot testing. Team Pulse, co-author of this paper, is part of this program. Going beyond narrow definitions of Smart Cities as a merely technology-driven approach, the Indian Smart Cities Mission promotes a notion of "Smart City" that includes social innovation, too [9].
To democratically govern what will soon be the most populous country in the world, India's governments have for the first time put a strong emphasis on citizen engagement in urban development. Moreover, Indian culture, with its intricate community organization and social structure, naturally demands appropriate means of participation. Here, "smart" new forms of human–machine interaction are needed that work at the societal and community levels [6]. At the International Urban Cooperation (IUC) conference in Delhi in July 2019, Indian urban researchers (Pulse project, see below) came into contact with EU researchers working on urban co-creation and co-design (U_CODE project, see below). Seeing the synergetic potential and the complementary fit of both projects, the coordinators decided to join forces for a concerted pilot test in Pimpri Chinchwad in November–December 2019.

The European Union, although against an entirely different cultural and social background, also puts a strong emphasis on participation and democratic decision-making in the context of urban development [11]. Multiple occasions have shown how absent or poorly executed participation can stir up frustration and lead to strong public resistance against political and planning decisions. To address the growing demand for digital tools and methodologies in support of participation and co-creation, the EU funds various R&D activities, among others within its H2020 program. A number of digital solutions have been developed, such as Maptionaire (Finland), Play the City (Netherlands), or DIPAS (Germany), each with a strong individual focus, e.g., on geodata application, gamification, or online participation [3]. Generally missing, however, is an approach that goes beyond language-based co-creation and feedback solutions and puts a stronger emphasis on collaborative design work and creative decision-making [16].
2 Related Work

India has launched the Smart Cities Mission, a large program to support the digital upgrading of 100 larger cities. In this context, numerous research and innovation projects have been initiated. One of the India Smart Cities Fellowship projects, Pulse (2019–2020), aims to enable citizen participation in urban development with a system of digital and non-digital components, including a new online questionnaire app for municipalities (Fig. 1). For implementation and piloting, Team Pulse joined forces with the municipal corporation of Pimpri Chinchwad, a city of approx. 2.2 million people near Pune and Mumbai and a leading industrial center. The city has been selected as one of the 99 cities within India's Smart Cities Mission.

Funded by the EU within its Horizon 2020 program and coordinated by TU Dresden's Knowledge Architecture Lab, U_CODE ("Urban Collective Design Environment," 2016–2019) has developed novel approaches for citizen participation in urban development projects [5]. In U_CODE, a digital toolbox and methodology for co-creative design work on urban projects were established by an international consortium of universities and enterprises, including TU Dresden, TU Delft, ISEN Toulon, Oracle, and Ansys (Fig. 2).
Fig. 1 Pulse project: Spatial analysis (~300,000 entries from the City’s Grievance Redressal Platform were standardized and mapped)
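The spatial standardization and mapping summarized in Fig. 1 can be illustrated with a small sketch. The code below is not the Pulse implementation; it is a hypothetical, minimal example which assumes that grievance records arrive as dictionaries with a free-text category label and WGS84 coordinates, normalizes the labels with a toy alias table, and bins the complaints onto a coarse grid so that hotspots can be counted.

```python
from collections import Counter, defaultdict

# Hypothetical category alias table; the real Grievance Redressal Platform
# taxonomy is not published in this paper.
CATEGORY_ALIASES = {
    "garbage": "waste", "trash": "waste", "dumping": "waste",
    "street light": "lighting", "streetlight": "lighting",
}

def normalize(category: str) -> str:
    """Map a free-text grievance label to a standardized category."""
    return CATEGORY_ALIASES.get(category.strip().lower(), category.strip().lower())

def grid_cell(lat: float, lon: float, cell_deg: float = 0.005) -> tuple:
    """Snap a coordinate to a coarse grid cell (roughly 500 m at this latitude)."""
    return (round(lat / cell_deg) * cell_deg, round(lon / cell_deg) * cell_deg)

def build_grievance_map(records):
    """Count standardized grievances per grid cell, yielding a simple 'grievance map'."""
    heatmap = defaultdict(Counter)
    for rec in records:
        cell = grid_cell(rec["lat"], rec["lon"])
        heatmap[cell][normalize(rec["category"])] += 1
    return heatmap

if __name__ == "__main__":
    # Illustrative sample records only
    sample = [
        {"lat": 18.6298, "lon": 73.7997, "category": "Garbage"},
        {"lat": 18.6301, "lon": 73.7995, "category": "dumping"},
        {"lat": 18.6510, "lon": 73.7620, "category": "street light"},
    ]
    for cell, counts in build_grievance_map(sample).items():
        print(cell, dict(counts))
```

In a production setting the naive grid binning would be replaced by proper geospatial indexing, but the standardize-then-aggregate principle is the same.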
Fig. 2 U_CODE toolbox and Minimal Viable Process for good citizen participation
Targeting large crowds as potential collaborators, the U_CODE methodology spans from the online harvesting of relevant context knowledge, across the online crowdsourcing of creative design ideas, to local co-design activities supported by digital interaction tools [10]. A minimal viable process was established that ensures good participation under any circumstances and prevents fake participation campaigns or biased activities (a simplified sketch of such a gated process is given below). The U_CODE methodology is supplemented by a digital toolbox which comprises design and analysis tools for each phase of the minimal viable process of urban co-design. Design tools include online apps for Design Playgrounds, VR and touch-table applications, as well as 3D printing.
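As a rough illustration only (the concrete phase names and gate conditions of the U_CODE minimal viable process are not spelled out here and are therefore assumed), the following sketch models a participation campaign as an ordered sequence of phases, where each phase must collect a minimum number of contributions before the next one opens; such gating is one simple way to discourage fake or biased campaigns.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Phase:
    name: str
    min_contributions: int                       # gate: inputs required before the next phase opens
    contributions: List[str] = field(default_factory=list)

    def is_complete(self) -> bool:
        return len(self.contributions) >= self.min_contributions

def make_campaign() -> List[Phase]:
    # Assumed, simplified phase names; the thresholds are illustrative only.
    return [
        Phase("context_harvesting", min_contributions=20),
        Phase("idea_crowdsourcing", min_contributions=10),
        Phase("local_co_design", min_contributions=5),
        Phase("expert_review", min_contributions=3),
    ]

def current_phase(campaign: List[Phase]) -> Optional[Phase]:
    """Return the first phase whose gate has not yet been passed."""
    for phase in campaign:
        if not phase.is_complete():
            return phase
    return None  # all gates passed: the campaign may conclude

if __name__ == "__main__":
    campaign = make_campaign()
    campaign[0].contributions.extend(f"context_item_{i}" for i in range(20))
    nxt = current_phase(campaign)
    print("next open phase:", nxt.name if nxt else "campaign complete")
```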
Analytic tools help to investigate the collected knowledge and design ideas with regard to public sentiment, discourse structure, and design quality [4]. After a 3.5-year R&D phase, the U_CODE tools and methods are now being tested in several pilot projects. In July 2019, the redesign campaign for a special school complex in the German city of Sangerhausen was successfully accomplished [5], while ongoing pilots include a public space redesign in the German city of Dresden as well as the campus planning for TU Charkiw (Ukraine).

By synthesizing the approaches of Pulse and U_CODE, not only was an ambitious international and intercultural urban cooperation started, but a new digital workflow (combining qualitative and quantitative socio-spatial surveys with digital co-design tools) was also established and pilot-tested with a real user community in a larger Indian city. Whereas digital participation is already widely implemented in many contexts, the joint co-creation of design work with interactive tools is still a novelty. While this applies to the European context, it holds true even more so for the Indian context.
3 Local Pilot Test—Methodology

In autumn 2019, Team Pulse and U_CODE agreed to conduct a joint pilot test of both projects' solutions in Pimpri Chinchwad. For the joint team, the overall goal was to demonstrate the validity of the developed technologies and methodologies and to run a real-world pilot that would improve a concrete community's situation. Within two months, in cooperation with Pimpri Chinchwad Municipal Corporation (Municipal Commissioner, Health Department, Smart City Department), a specific use case and site were determined, and an overall choreography was established from which detailed schedules and technical specifications were derived for the pilot implementation (Fig. 3). This choreography was informed by the so-called "User Engagement Protocols" developed in the U_CODE project [9, 18]. On the side of Pimpri Chinchwad Municipal Corporation, a central goal was to reduce the tendency of citizens to improperly dispose of garbage in vacant spaces, to stir up community engagement for locally upgrading the site into a usable public space, and to derive a citizen-generated design proposal.
Fig. 3 Choreography of pilot implementation in Pimpri Chinchwad
Fig. 4 Project site in Pimpri Chinchwad: Aerial view (left), before/after cleaning (right)
To run the local test, a 10-year-old dumpsite was selected in the Ganganagar area of Pimpri Chinchwad, which for the purpose of the project was cleared of garbage, thus presenting a "clean slate" for new design ideas (Fig. 4). Citizens were asked to develop schemes for the future usage and design of the place. On the basis of a preliminary investigation conducted by Team Pulse over the previous six months, residents' demands and grievances were collected. Thereupon, as the most striking feature of the pilot test, a local digital co-design lab was established in a refurbished bus (Fig. 5) in order to "roll" the participatory activities to different neighborhoods. This idea was backed by research on mobile creativity labs and studies in urban and regional innovation management [14]. Upon arrival, local citizens were invited to the rolling lab, where an interactive touch-table ("ShapeTable," Fig. 5) functioned as a co-creative work desk for the generation of spatial design ideas. The results were collected by the U_CODE system and eventually displayed to the residents. By way of expert voting, the best entries were selected and transmitted to the local government and decision-makers in order to prepare the actual implementation in the short term.
Fig. 5 Co-Creation truck before and after refurbishment (left); On-site discussion of design proposals at the interactive ShapeTable (right)
With that procedure, the best ideas from citizens were combined with the demands derived from expert knowledge, leading to a design proposal ready for execution that answered local residents' desires on the one hand and fulfilled the municipality's technical and economic constraints on the other.
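The expert voting step is not specified in detail in the paper; the following hypothetical sketch merely shows one way such a selection could be computed, assuming each expert assigns a score from 1 to 5 per proposal and the proposals with the highest mean score are forwarded to the municipality.

```python
from statistics import mean

def rank_proposals(scores: dict) -> list:
    """scores: proposal_id -> list of expert scores (1-5).
    Returns proposal ids sorted by mean score, best first."""
    return sorted(scores, key=lambda pid: mean(scores[pid]), reverse=True)

if __name__ == "__main__":
    expert_scores = {            # illustrative values only
        "design_07": [4, 5, 4],
        "design_12": [3, 4, 3],
        "design_03": [5, 5, 4],
    }
    shortlist = rank_proposals(expert_scores)[:2]
    print("forwarded to the municipality:", shortlist)
```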
4 Technology

As a central technology component, the Pulse Web Interface helped to collect information and evidence about local citizens' needs and preferences via smartphones and tablets (Fig. 6). Team Pulse had previously also conducted a rigorous surveillance exercise and focus group discussions with affected stakeholders to discover the root cause of the problem and the nuisance creators. The Pulse Web Interface supported statistical analysis and the creation of a "Grievance Map," which functioned as a point of departure for the subsequent design activities.

Building on these preliminary investigations, a used bus was refurbished to house the components of the Pulse and U_CODE pilot as a rolling lab or "Design Playground" that could easily reach several local communities (Fig. 5). This approach took its reference from TRAILS (Travelling Innovation Labs and Services), another experimental project by the TU Dresden Knowledge Architecture Lab, which is currently bringing innovative technologies and co-creation events to economically declining areas in the German–Polish border region [12, 13].

The hardware centerpiece of this mobile co-design environment was a large interactive touch-table device ("ShapeTable") that functioned as the Local Design Playground. On this user front-end, local residents (children, the elderly, educated as well as non-educated) could easily design shapes and functions for the selected site just by using their hands and fingers (Fig. 7). Based upon the U_CODE experiences and technology, a newly developed co-design software was deployed on the ShapeTable, which uses an adjustable thematic library of specific design objects that had been developed in several interactions with citizens beforehand (Fig. 8). With these objects, design actions could be carried out by simple drag-and-drop and swiping moves on the table. Objects could be placed on maps or aerial photographs and manipulated in terms of shape, color, or material.
Fig. 6 Pulse Web Interface: Citizen Feedback on area cleanliness
Fig. 7 ShapeTable for hands-on design ideas from citizens
Fig. 8 Object libraries for ShapeTable
Annotations could be attached to the design proposals in order to briefly explain the intentions of their creators. A newly developed program interface allowed for the quick export and postprocessing of the ShapeTable results, e.g., for gallery presentation or design analytics.
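To make the data flow tangible, the sketch below shows a hypothetical model (not the actual U_CODE schema) of a ShapeTable design proposal: objects from a thematic library placed with position, rotation, and material, plus an annotation, exported as JSON for gallery presentation or downstream analytics.

```python
import json
from dataclasses import dataclass, asdict, field
from typing import List, Optional

@dataclass
class PlacedObject:
    object_type: str             # item from the thematic library, e.g. "bench"
    x: float                     # position on the site map (metres, assumed)
    y: float
    rotation_deg: float = 0.0
    material: Optional[str] = None

@dataclass
class DesignProposal:
    proposal_id: str
    site: str
    author_group: str            # e.g. "residents"; grouping is illustrative
    objects: List[PlacedObject] = field(default_factory=list)
    annotation: str = ""

    def to_json(self) -> str:
        """Export the proposal for gallery presentation or design analytics."""
        return json.dumps(asdict(self), indent=2)

if __name__ == "__main__":
    proposal = DesignProposal(
        proposal_id="ganganagar_001",
        site="Ganganagar vacant plot",
        author_group="residents",
        objects=[PlacedObject("bench", 12.5, 4.0), PlacedObject("tree", 3.0, 7.5)],
        annotation="Shaded seating near the entrance",
    )
    print(proposal.to_json())
```

A flat, annotated export of this kind is what makes later steps such as gallery display, expert voting, and quantitative analysis straightforward.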
5 Results and Discussion

Since the process as well as the outputs of the co-design activities were digital in nature, a set of KPIs could easily be derived, making the campaign measurable in terms of qualitative and quantitative impact (Fig. 9). In sum, within a duration of 8 hours (spread over 2 days), a total of 110 participants were reached, who produced 17 designs with altogether 668 design objects placed. Via the Pulse Web Interface, 58 items of qualitative feedback were received commenting on these design proposals.
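Using only the figures reported above (110 participants, 17 designs, 668 placed objects, 58 feedback items, 8 hours), a few simple derived indicators can be computed; the sketch below is just that arithmetic and not the project's official KPI scheme.

```python
# Reported campaign figures from the Pimpri Chinchwad pilot
participants = 110
designs = 17
placed_objects = 668
feedback_items = 58
duration_hours = 8

kpis = {
    "participants_per_hour": participants / duration_hours,        # ~13.8
    "designs_per_100_participants": 100 * designs / participants,  # ~15.5
    "objects_per_design": placed_objects / designs,                 # ~39.3
    "feedback_per_participant": feedback_items / participants,      # ~0.53
}

for name, value in kpis.items():
    print(f"{name}: {value:.2f}")
```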
Fig. 9 Quantitative results of the pilot implementation in Pimpri Chinchwad
According to investigations on the efficacy of public participation processes, these numbers indicate a successful campaign [3, 17]. In qualitative terms, there was a good level of participation and community engagement: a positive ratio of outputs per participant as well as a high level of feedback and comments were reached. No frustration or negative comments were noticed, while the general atmosphere was lively and committed; people were apparently happy to have a chance to participate in this new co-creation format. A change of attitude and behavior toward the initially neglected dumpsite was witnessed, and a high motivation to transform their own neighborhood was observed on the side of the residents. Thus, one key objective of the municipality of Pimpri Chinchwad was reached. The citizens expressed a strong desire for co-creative engagement; they treated the creation of design proposals as a serious task despite the limited time of the campaign. This is in line with research showing that local communities take a vivid interest in improving their immediate living environment [2]. The setup provided for easy and low-threshold commitment across age groups and social strata. No angry disputes or controversies appeared; only minor disagreement was expressed when neighborhood "politicians" joined the participation. Eventually, stakeholders who had initially been skeptical of the action expressed great interest in taking the project forward after the workshop.

Local decision-makers such as the Municipal Commissioner, Heads of Departments, Executive Engineers, and the Sanitary Inspector were actively involved from the beginning. All municipal officials received the tools, technology, and methodology positively and acknowledged the surprising quality of the design results (Fig. 10). In effect, the Municipal Commissioner suggested a broader roll-out and wider application of the system in all neighborhoods of Pimpri Chinchwad.

The Pulse Web Interface was effective in gathering feedback and aspiration information quickly. Questions were modified based on the nature of responses and grievances. People aged 16–35 were particularly savvy with the interface, while others usually required support.
Fig. 10 Top-rated design proposals (axonometric, 3D rendering)
It became clear that the U_CODE co-design tools cannot work "out of the box" but need local adaptation, implying additional customization efforts (e.g., redeveloping the Design Playground in terms of object libraries and local language). To complete the technical adaptation to the specifics of the Pimpri Chinchwad context, an additional development time of 2–3 person-months is estimated. Positive feedback, however, was collected with regard to the overall usability: the tools were smoothly applicable, ran stably, and enabled easy and entertaining user engagement. As a special new feature, high-quality 3D rendering was developed for the Pimpri Chinchwad pilot test, which strongly supported the understanding of the design solutions.

In conclusion, the participation system developed within an EU-funded research project was successfully adapted to the Indian context. High user acceptance and relatively good usability of the system were achieved. The pilot implementation generated valid design results that helped planners to improve and propel ongoing urban upgrading projects and supported local decision-makers in finding appropriate spatial development strategies.
6 Conclusions and Outlook on Future Research

The overall societal goal of the Pulse/U_CODE solutions is to enable as many communities as possible to tackle pressing urban challenges at the local level, bringing together bottom-up (residents) and top-down (administration) perspectives. The implementation in Pimpri Chinchwad provided a successful demonstration, paving the way for a broader scale-up in the city in the near future. To this end, the technology will be handed over to local users and administrators as an open-source hardware and software system, so that all parties can contribute to, and benefit from, its further technical development. In accordance with the overall societal goal, potential exploitation and commercialization models should focus on deeper data analysis and the postprocessing of results rather than on software or hardware licensing. As a potential platform solution, the Pulse/U_CODE system can plug in other applications and services. In the near future, the cooperation between U_CODE, Pulse, and Pimpri Chinchwad shall be raised to a more formal level (Memorandum of Understanding).
The overall positive results of the pilot test justify the expectation that the system can be run on a long-term basis in many neighborhoods of Pimpri Chinchwad, thus making urban development and participation a permanent part of project delivery in the city. While the Indian context presents the most valid use cases and application scenarios, in which high impact can be generated by the tested system, further pilot tests are planned in Ukraine, Germany, and Japan.
References
1. Heijne, K., Meer, J.V., Stelzle, B., Pump, M., Klamert, K., Wilde, A., Siarheyeva, A., Jannack, A.: Survey on Co-design Methodologies in Urban Design (2018)
2. Hick, D., Urban, A., Noennig, J.: A pattern logic for a citizen-generated subjective quality of life index in neighborhoods. In: Conference Proceedings IEEE UKRCON, Lviv (2019)
3. Hofmann, M., Münster, S., Noennig, J.R.: A theoretical framework for the evaluation of massive digital participation systems in urban planning. J. Geovisualization Spat. Anal. (2019)
4. Holmer, T., Noennig, J.: Analysing topics and sentiments in citizen debates for informing urban development. In: Proceedings of the International Forum for Knowledge Asset Dynamics IFKAD, Delft (2018)
5. Jannack, A., Holmer, T., Stelzle, B., Doll, K., Naumann, F., Wilde, A., Noennig, J.: Smart campus design: U_CODE tools tested for co-designing the CJD Learning Campus, Sangerhausen case study. In: Hands-on Science. Innovative Education in Science and Technology, pp. 11–20 (2019). ISBN 978-989-8798-06-0
6. Kelber, M., Jannack, A., Noennig, J.: Knowledge-based participation to identify demands of a future city administration: Dresden case study. In: Proceedings of the International Forum for Knowledge Asset Dynamics IFKAD, Matera (2019)
7. Münster, S., Georgi, C., Heijne, K., Klamert, K., Noennig, J., Pump, M., Stelzle, B., van der Meer, H.: How to involve inhabitants in urban design planning by using digital tools? An overview on the state of the art, key challenges and promising approaches. In: 21st International Conference on Knowledge-Based and Intelligent Information and Engineering Systems (KES 2017), Procedia Computer Science. Elsevier (2017)
8. Noennig, J., Schmiedgen, P.: From noise to knowledge: smart city responses to disruption. In: Teodorescu, H.-N. (ed.) Disaster Resilience and Mitigation: New Means, Tools, Trends. NATO Science for Peace and Security Series C: Environmental Security. Springer (2014)
9. Noennig, J., Stelzle, B.: Citizen participation in Smart Cities: towards a user engagement protocol. In: Proceedings of the International Forum for Knowledge Asset Dynamics IFKAD, Matera (2019)
10. Noennig, J., Jannack, A., Stelzle, B., Holmer, T., Naumann, F., Wilde, A.: Smart citizens for Smart Cities: a user engagement protocol for citizen participation. In: IMCL 2019, Proceedings of the 13th International Conference on Interactive Mobile and Communication Technologies and Learning (2019)
11. Piskorek, K., Barski, J., Noennig, J.: Creative solutions for Smart Cities: the syncity approach. In: Proceedings of the International Forum for Knowledge Asset Dynamics IFKAD, Bari (2015)
12. Saegebrecht, F., Schmieden, P., John, C., Noennig, J.: TRAILS: experiences and insights from a traveling innovation lab experiment. In: Proceedings of the International Forum for Knowledge Asset Dynamics IFKAD, Delft (2018)
13. Saegebrecht, F., John, C., Schmiedgen, P., Noennig, J.: Experiences and outcomes from a traveling innovation lab experiment. Measuring Business Excellence (2019). https://doi.org/10.1108/MBE-11-2018-0101
14. Schmiedgen, P., Noennig, J., Sägebrecht, F.: TRAILS: traveling innovation labs and services for SMEs and educational institutions in rural regions. In: Proceedings of the International Forum for Knowledge Asset Dynamics IFKAD, Dresden (2016)
15. Singh, B., Parmar, M.: Smart City in India. Routledge India, London (2020). https://doi.org/10.4324/9780429353604
16. Stelzle, B., Noennig, J.R., Jannack, A.: Co-design and co-decision: decision making on collaborative design platforms. In: 21st International Conference on Knowledge-Based and Intelligent Information and Engineering Systems (KES 2017), Procedia Computer Science. Elsevier (2017)
17. Stelzle, B., Noennig, J.R.: A method for the assessment of public participation in urban development. Urban Development Issues 61, 33–40 (2019)
18. Stelzle, B., Naumann, F., Holmer, T., Noennig, J., Jannack, A.: A minimal viable process and tools for massive participation in urban development. Int. J. Knowl. Based Dev. (IJKBD) (2019)
Author Index
A Andreev, Yuri S., 329 Arsalan, Tsyndymeyev, 391 Arya, Chhavi, 533 Asakawa, Tomoo, 303
B Balonin, Nikolaj, 223 Barbucha, Dariusz, 117 Bertoglio, Nicola, 63 Buryachenko, Vladimir V., 129 Byanjankar, Ajay, 15
C Cao, Qiushi, 37 Cateni, Silvia, 211 Cohen, Yaniv, 141 Colla, Valentina, 211
D Dekel, Ben Zion, 141 Della Ventura, Michele, 255 Doll, Konstantin, 533 Drankou, Pavel, 437 Dusi, Michele, 49
F Fartushniy, Eduard, 413, 429 Favorskaya, Alena, 179, 189, 201 Favorskaya, Margarita N., 129, 167 Fenu, Gianni, 79 Fomina, Irina, 413
Fonał, Krzysztof, 315 Fukui, Keisuke, 279
G Giarelis, Nikolaos, 105 Golubev, Nikita, 403 Golubev, Vasily, 189 Gryazin, Igor, 167
H Hamad, Yousif, 243 Hick, David, 533 Holmer, Torsten, 533
I Iio, Jun, 451 Ishii, Akira, 461, 471 Ito, Mariko I., 27 Iureva, Radda A., 329
J Jedrzejowicz, Piotr, 3
K Kabaev, Evgeny, 155 Kanakaris, Nikos, 105 Karacapilidis, Nikos, 105 Kawahata, Yasuko, 471 Kents, Anzhelika, 243 Khokhlov, Nikolay, 201 Kimura, Yuto, 509
Kizielewicz, Bartłomiej, 341, 353, 365 Klimenko, Herman, 413 Kolesnikova, Daria V., 329 Kołodziejczyk, Joanna, 341, 353, 365 Koshechkin, Konstantin, 379, 391, 413 Kozhin, Pavel, 413 Kremlev, Artem S., 329 Krouk, Evgenii, 141 Krylov, Oleg, 429 Kurako, Mikhail, 155, 243
L Lamperti, Gianfranco, 49, 63 Lebedev, Georgy, 379, 391, 403, 413, 429 Li, Sijie, 91
M Mahanot, Palak, 533 Malloci, Francesca Maridina, 79 Matsulev, Alexander, 155 Medeiros, Gabriel H. A., 37 Mironov, Yuriy, 429 Mizuno, Takafumi, 499 Morita, Masaki, 509 Morozov, Evgeniy, 413
N Nenashev, Vadim, 231 Nicheporchuk, Valery, 167 Noennig, Jörg Rainer, 533 Norikumo, Shunei, 489
O Ohira, Yuki, 461 Ohishi, Mineaki, 267, 279 Ohnishi, Takaaki, 27 Ohya, Takao, 517 Okada, Isamu, 525 Okano, Nozomi, 461 Orlov, Yuriy, 403 Oyamaguchi, Natsumi, 525
P Pahikkala, Tapio, 15 Polikarpov, Alexander, 403
R Ryabkov, Ilya, 413
S Samet, Ahmed, 37 Sato-Ilic, Mika, 291, 303 Selivanov, Dmitriy, 403 Sergeev, Alexander, 223, 231 Sergeev, Mikhail, 223, 231 Serikov, Alexsey, 403 Shaderkin, Igor, 413 Shah, Chiranjay, 533 Shang, You, 91 Simonov, Konstantin, 155, 243 Subbotin, Vladislav, 329 Sukhikh, Gennadiy, 413
T Tajima, Hiroyuki, 525 Takahashi, Masao, 303 Tanaka-Yamawaki, Mieko, 479 Tarasov, Vadim, 413 Tkachenko, Valeriy, 429 Toko, Yukako, 291 Tyurina, Elena, 403
V Vannucci, Marco, 211 Viljanen, Markus, 15 Vostrikov, Anton, 223, 231
W Wakaki, Hirofumi, 267 Więckowski, Jakub, 341, 353, 365 Wierzbowska, Izabela, 3 Wiesenhütter, Sebastian, 533
Y Yamamura, Mariko, 279 Yamanaka, Masanori, 479 Yamatani, Katsu, 509 Yanagihara, Hirokazu, 267, 279 Yuriy, Andrey, 429
Z Zanella, Marina, 63 Zanni-Merk, Cecilia, 37 Zdunek, Rafał, 315 Zhao, Xiangfu, 63 Zilberman, Arkadi, 141 Zong, Siyu, 91 Zotin, Aleksandr, 155, 243 Zykov, Sergey, 379, 429, 437