Smart Innovation, Systems and Technologies 238
Ireneusz Czarnowski Robert J. Howlett Lakhmi C. Jain Editors
Intelligent Decision Technologies Proceedings of the 13th KES-IDT 2021 Conference
Smart Innovation, Systems and Technologies Volume 238
Series Editors Robert J. Howlett, Bournemouth University and KES International, Shoreham-by-sea, UK Lakhmi C. Jain, KES International, Shoreham-by-Sea, UK
The Smart Innovation, Systems and Technologies book series encompasses the topics of knowledge, intelligence, innovation and sustainability. The aim of the series is to make available a platform for the publication of books on all aspects of single and multi-disciplinary research on these themes in order to make the latest results available in a readily-accessible form. Volumes on interdisciplinary research combining two or more of these areas is particularly sought. The series covers systems and paradigms that employ knowledge and intelligence in a broad sense. Its scope is systems having embedded knowledge and intelligence, which may be applied to the solution of world problems in industry, the environment and the community. It also focusses on the knowledge-transfer methodologies and innovation strategies employed to make this happen effectively. The combination of intelligent systems tools and a broad range of applications introduces a need for a synergy of disciplines from science, technology, business and the humanities. The series will include conference proceedings, edited collections, monographs, handbooks, reference books, and other relevant types of book in areas of science and technology where smart systems and technologies can offer innovative solutions. High quality content is an essential feature for all book proposals accepted for the series. It is expected that editors of all accepted volumes will ensure that contributions are subjected to an appropriate level of reviewing process and adhere to KES quality principles. Indexed by SCOPUS, EI Compendex, INSPEC, WTI Frankfurt eG, zbMATH, Japanese Science and Technology Agency (JST), SCImago, DBLP. All books published in the series are submitted for consideration in Web of Science.
More information about this series at http://www.springer.com/series/8767
Ireneusz Czarnowski · Robert J. Howlett · Lakhmi C. Jain Editors
Intelligent Decision Technologies Proceedings of the 13th KES-IDT 2021 Conference
Editors Ireneusz Czarnowski Gdynia Maritime University Gdynia, Poland Lakhmi C. Jain KES International Shoreham-by-Sea, UK
Robert J. Howlett Bournemouth University Poole, UK KES International Shoreham-by-Sea, UK
ISSN 2190-3018 ISSN 2190-3026 (electronic) Smart Innovation, Systems and Technologies ISBN 978-981-16-2764-4 ISBN 978-981-16-2765-1 (eBook) https://doi.org/10.1007/978-981-16-2765-1 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore
KES-IDT 2021 Conference Organization
Honorary Chairs Lakhmi C. Jain, KES International, UK Gloria Wren-Phillips, Loyola University, USA
General Chair Ireneusz Czarnowski, Gdynia Maritime University, Poland
Executive Chair Robert J. Howlett, KES International and Bournemouth University, UK
Program Chair Jose L. Salmeron, University Pablo de Olavide, Seville, Spain Antonio J. Tallón-Ballesteros, University of Seville, Spain
Publicity Chair Izabela Wierzbowska, Gdynia Maritime University, Poland Alfonso Mateos Caballero, Universidad Politécnica de Madrid, Spain
Special Sessions Multi-Criteria Decision Analysis—Theory and Their Applications Wojciech Sałabun, West Pomeranian University of Technology in Szczecin, Poland
Advances in Intelligent Data Processing and its Applications Margarita Favorskaya, Reshetnev Siberian State University of Science and Technology, Russian Federation Lakhmi C. Jain, University of Technology Sydney, Australia Mikhail Sergeev, Saint Petersburg State University of Aerospace Instrumentation, Russian Federation
High-Dimensional Data Analysis, Knowledge Processing and Applications Mika Sato-Ilic, University of Tsukuba, Japan Tamara Shikhnabieva, Russian Technological University, Russian Federation Lakhmi Jain, University of Technology Sydney, Australia
Decision Making Theory for Economics Takao Ohya, Kokushikan University, Japan Takafumi Mizuno, Meijo University, Japan
Innovative Technologies and Applications in Computer Intelligence Takumi Ichimura, Prefectural University of Hiroshima, Japan Keiichi Tamura, Hiroshima City University, Japan Kamada Shin, Prefectural University of Hiroshima, Japan
Intelligent Diagnosis and Monitoring of Systems: Methods, Tools, and Applications Gianfranco Lamperti, University of Brescia, Italy Marina Zanella, University of Brescia, Italy
Knowledge Engineering in Large-Scale Systems Sergey V. Zykov, National Research University, Russian Federation
Spatial Data Analysis and Sparse Estimation Mariko Yamamura, Department of Statistics, Radiation Effects Research Foundation, Japan
International Program Committee and Reviewers Jair M. Abe, Paulista University, Sao Paulo, Brazil Miltos Alamaniotis, University of Texas at San Antonio, USA Dmitry Alexandrov, HSE University, Russian Federation Mohamed Arezki Mellal, M’Hamed Bougara University, Algeria Piotr Artiemjew, University of Warmia and Mazury, Poland Dariusz Barbucha, Gdynia Maritime University, Poland Alina Barbulescu, Transilvania University of Brasov, Romania Andreas Behrend, Technical University of Cologne, Germany Monica Bianchini, University of Siena, Italy Francesco Bianconi, Università degli Studi di Perugia, Italy Janos Botzheim, Budapest University of Technology and Economics, Hungary Adriana Burlea-Schiopoiu, University of Craiova, Romania Vladimir Buryachenko, Siberian State University of Science and Technology, Russian Federation Frantisek Capkovic, Slovak Academy of Sciences, Slovak Republic Giovanna Castellano, University of Bari Aldo Moro, Italy Gloria Cerasela Crisan, Vasile Alecsandri University of Bacau, Romania Shyi-Ming Chen, National Taiwan University of Science and Technology, Taiwan Shing Chiang Tan, Multimedia University, Malaysia Jerry Chun-Wei Lin, Western Norway University of Applied Sciences, Norway Marco Cococcioni, University of Pisa, Italy
Angela Consoli, Defence Science and Technology Group, Australia Paolo Crippa, Universita Politecnica delle Marche, Italy Gloria Crisan, Vasile Alecsandri University of Bacau, Romania Ireneusz Czarnowski, Gdynia Maritime University, Poland Dinu Dragan, University of Novi Sad, Serbia Margarita Favorskaya, Reshetnev Siberian State University of Science and Technology, Russian Federation Wojciech Froelich, University of Silesia, Poland Claudia Frydman, Aix-Marseille University, France Keisuke Fukui, Hiroshima University, Japan Marcos G. Quiles, UNIFESP, Brazil Mario G. C. A. Cimino, University of Pisa, Italy Mauro Gaggero, National Research Council of Italy, Italy Christos Grecos, Consultant, Ireland Foteini Grivokostopoulou, University of Patras, Greece Aleksandra Gruca, Silesian University of Technology, Poland Akira Hara, Hiroshima University, Japan Ralf-Christian Harting, Aalen University, Germany Ioannis Hatzilygeroudis, University of Patras, Greece Bogdan Hoanca, University of Alaska Anchorage, USA Dawn Holmes, University of California, USA Katsuhiro Honda, Osaka Prefecture University, Japan Tzung-Pei Hong, National University of Kaohsiung, Taiwan Daocheng Hong, East China Normal University, China Takumi Ichimura, Prefectural University of Hiroshima, Japan Anca Ignat, Alexandru Ioan Cuza University, Romania Mirjana Ivanovic, University of Novi Sad, Serbia Yuji Iwahori, Chubu University, Japan Joanna Jedrzejowicz, University of Gdansk, Poland Piotr Jedrzejowicz, Gdynia Maritime University, Poland Dragan Jevtic, University of Zagreb, Croatia Prof Björn Johansson, Linköping University, Sweden Shin Kamada, Prefectural University of Hiroshima, Japan Nikos Karacapilidis, University of Patras, Greece Pawel Kasprowski, Silesian University of Technology, Poland Shuichi Kawano, The University of Electro-Communications, Japan Kavyaganga Kilingaru, University of South Australia, Australia Bartłomiej Kizielewicz, West Pomeranian University of Technology in Szczecin, Poland Frank Klawonn, Ostfalia University, Germany Aleksandar Kovaˇcevi´c, University of Novi Sad, Serbia Boris Kovalerchuk, Central Washington University, USA Marek Kretowski, Bialystok University of Technology, Poland Dilip Kumar Pratihar, Indian Institute of Technology Kharagpur, India Vladimir Kurbalija, University of Novi Sad, Serbia
Kazuhiro Kuwabara, Ritsumeikan University, Japan Gianfranco Lamperti, University of Brescia, Italy Georgy Lebedev, Sechenov University, Russia Giorgio Leonardi, Università del Piemonte Orientale, Italy Jerry Lin, Western Norway University of Applied Sciences, Norway Pei-Chun Lin, Feng Chia University, Taiwan Ivan Lukovic, University of Novi Sad, Serbia Alina Maria Cristea, University of Bucharest, Romania Christophe Marsala, Sorbonne Université, France Alfonso Mateos Caballero, Universidad Politécnica de Madrid, Spain Kazuya Mera, Hiroshima City University, Japan Lyudmila Mihaylova, University of Sheffield, UK Takafumi Mizuno, Meijo University, Japan Yasser Mohammad, Assiut University, Egypt Suneeta Mohanty, School of Computer Engineering, KIIT University, India Nikita Morgun, MEPhI National Nuclear Research University, Italy Mikhail Moshkov, King Abdullah University of Science and Technology, Saudi Arabia Sameera Mubarak, University of South Australia, Australia Maxim Muratov, Moscow Institute of Physics and Technology, Russian Federation Yoshiyuki Ninomiya, The Institute of Statistical Mathematics, Japan Shunei Norikumo, Osaka University of Commerce, Japan Marek Ogiela, AGH University of Science and Technology, Poland Mineaki Ohishi, Hiroshima University, Japan Takao Ohya, Kokushikan University, Japan Shih Yin Ooi, Multimedia University, Malaysia Jeng-Shyang Pan, Shandong University of Science and Technology, China Mrutyunjaya Panda, Utkal University, India Mario Pavone, University of Catania, Italy Isidoros Perikos, University of Patras, Greece Petra Perner, FutureLab Artificial Intelligence_IBaI_II, Germany Anitha Pillai, Hindustan Institute of Technology and Science, India Camelia Pintea, Technical University Cluj-Napoca, Romania Bhanu Prasad, Florida A&M University, USA Radu-Emil Precup, Politehnica University of Timisoara, Romania Jim Prentzas, Democritus University of Thrace, Greece Malgorzata Przybyla-Kasperek, University of Silesia in Katowice, Poland Marcos Quiles, Federal University of São Paulo—UNIFESP, Brazil Milos Radovanovic, University of Novi Sad, Serbia Sheela Ramanna, University of Winnipeg, Canada Ewa Ratajczak-Ropel, Gdynia Maritime University, Poland Ana Respício, University of Lisbon, Portugal Gerasimos Rigatos, Industrial Systems Institute, Greece
Alvaro Rocha, University of Lisbon, Portugal Anatoliy Sachenko, Ternopil National Economic University, Ukraine Tatsuhiro Sakai, Shimane University, Japan Wojciech Salabun, West Pomeranian University of Technology in Szczecin, Poland Hadi Saleh, HSE University, Russian Federation Mika Sato-Ilic, University of Tsukuba, Japan Milos Savic, University of Novi Sad, Serbia Md Shohel Sayeed, Multimedia University, Malaysia Rafal Scherer, Czestochowa University of Technology, Poland Hirosato Seki, Osaka University, Japan Andrii Shekhovtsov, West Pomeranian University of Technology in Szczecin, Poland Tamara Shikhnabieva, Plekhanov Russian University of Economics, Russian Federation Marek Sikora, Silesian University of Technology, Poland Milan Simic, Royal Melbourne Institute of Technology, Australia Aleksander Skakovski, Gdynia Maritime University, Poland Alexey Smagin, Ulyanovsk State University, Russian Federation Urszula Stanczyk, Silesian University of Technology, Poland Margarita Stankova, New Bulgarian University, Bulgaria Catalin Stoean, University of Craiova, Romania Ruxandra Stoean, University of Craiova, Romania Ahmad Taher Azar, Prince Sultan University, Kingdom of Saudi Arabia Keiichi Tamura, Hiroshima City University, Japan Dilhan Thilakarathne, VU University Amsterdam/ING Bank, The Netherlands Emer. Toshiro Minami, Kyushu Institute of Information Sciences, Japan Edmondo Trentin, University of Siena, Italy Eiji Uchino, Yamaguchi University, Japan Carl Vogel, Trinity College Dublin, Ireland Zeev Volkovich, ORT Braude College, Israel Mila Dimitrova Vulchanova, Norwegian University of Science and Technology, Norway Jerzy W. Grzymala-Busse, University of Kansas, USA Fen Wang, Central Washington University, USA Junzo Watada, Waseda University, Japan Jaroslaw Watrobski, University of Szczecin, Poland Yoshiyuki Yabuuchi, Shimonoseki City University, Japan Mariko Yamamura, Radiation Effects Research Foundation, Japan Hirokazu Yanagihara, Hiroshima University, Japan Kazuyoshi Yata, University of Tsukuba, Japan Hiroyuki Yoshida, Harvard Medical School, USA Dmitry Zaitsev, Odessa State Environmental University, Ukraine Marina Zanella, University of Brescia, Italy Beata Zielosko, University of Silesia, Katowice, Poland
Alfred Zimmermann, Reutlingen University, Germany Alexandr Zotin, Reshetnev Siberian State University of Science and Technology, Russian Federation Sergey Zykov, National Research University and MEPhI National Nuclear Research University, Russia
Preface
This volume contains the proceedings of the 13th International KES Conference on Intelligent Decision Technologies (KES-IDT 2021). The conference was held as a virtual conference on June 14–16, 2021. The 13th edition of KES-IDT was the second conference held under restrictions in response to the COVID-19 pandemic. KES-IDT is an annual international conference organized by KES International and a sub-series of the KES Conference series. KES-IDT is an interdisciplinary conference with opportunities for the presentation of new research results and discussion about them under the common title "Intelligent Decision Technologies." For years, the conference has provided a platform for knowledge transfer and the generation of new ideas. This edition, KES-IDT 2021, attracted a number of researchers and practitioners from all over the world. The papers have been allocated to the main track and eight special sessions. All received papers have been reviewed by 2–3 members of the International Program Committee and International Reviewer Board, and only a selection of them has been presented during the conference and included in the KES-IDT 2021 proceedings. We are very satisfied with the quality of the program and would like to thank the authors for choosing KES-IDT as the forum for the presentation of their work. We also gratefully acknowledge the hard work of the KES-IDT International Program Committee members and of the additional reviewers for taking the time to review the submitted papers and selecting the best among them for presentation at the conference and inclusion in its proceedings. Despite the difficult time of the pandemic, we hope that KES-IDT 2021 contributes significantly to academic excellence and leads to even greater successes of KES-IDT events in the future. Gdynia, Poland / Poole and Shoreham-by-Sea, UK / Shoreham-by-Sea, UK, June 2021
Ireneusz Czarnowski Robert J. Howlett Lakhmi C. Jain
Contents
Main Track
ArgVote: Which Party Argues Like Me? Exploring an Argument-Based Voting Advice Application (Markus Brenneis and Martin Mauve) . . . 3
Detecting Communities in Organizational Social Network Based on E-mail Communication (Dariusz Barbucha and Paweł Szyman) . . . 15
Impact of the Time Window Length on the Ship Trajectory Reconstruction Based on AIS Data Clustering (Marta Mieczyńska and Ireneusz Czarnowski) . . . 25
Improved Genetic Algorithm for Electric Vehicle Charging Station Placement (Mohamed Wajdi Ouertani, Ghaith Manita, and Ouajdi Korbaa) . . . 37
Solving a Many-Objective Crop Rotation Problem with Evolutionary Algorithms (Christian von Lücken, Angel Acosta, and Norma Rojas) . . . 59
The Utility of Neural Model in Predicting Tax Avoidance Behavior (Coita Ioana-Florina and Codruța Mare) . . . 71
Triple-Station System of Detecting Small Airborne Objects in Dense Urban Environment (Mikhail Sergeev, Anton Sentsov, Vadim Nenashev, and Evgeniy Grigoriev) . . . 83
Using Families of Extremal Quasi-Orthogonal Matrices in Communication Systems (Anton Vostrikov, Alexander Sergeev, and Yury Balonin) . . . 95
Variable Selection for Correlated High-Dimensional Data with Infrequent Categorical Variables: Based on Sparse Sample Regression and Anomaly Detection Technology . . . . . . . . . . . . . . . . . . . . . . . 109 Yuhei Kotsuka and Sumika Arima Verification of the Compromise Effect’s Suitability Based on Product Features of Automobiles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127 Takumi Kato Advances in Intelligent Data Processing and Its Applications Application of Granular Computing-Based Pre-processing in the Labelling of Phonemes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141 Negin Ashrafi and Sheela Ramanna Application of Implicit Grid-Characteristic Methods for Modeling Wave Processes in Linear Elastic Media . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151 Evgeniy Pesnya, Anton A. Kozhemyachenko, and Alena V. Favorskaya Combined Approach to Modeling of Acceleration Response Spectra in Areas of Active Shallow Seismicity . . . . . . . . . . . . . . . . . . . . . . . . 161 Vasiliy Mironov, Konstantin Simonov, Aleksandr Zotin, and Mikhail Kurako Methods of Interpretation of CT Images with COVID-19 for the Formation of Feature Atlas and Assessment of Pathological Changes in the Lungs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173 Aleksandr Zotin, Anzhelika Kents, Konstantin Simonov, and Yousif Hamad Multilevel Watermarking Scheme Based on Pseudo-barcoding for Handheld Mobile Devices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185 Margarita N. Favorskaya and Alexandr V. Proskurin Multi-quasi-periodic Cylindrical and Circular Images . . . . . . . . . . . . . . . . 197 Victor R. Krasheninnikov, Yuliya E. Kuvayskova, and Alexey U. Subbotin On the Importance of Capturing a Sufficient Diversity of Perspective for the Classification of Micro-PCBs . . . . . . . . . . . . . . . . . . . 209 Adam Byerly, Tatiana Kalganova, and Anthony J. Grichnik Robust Visual Vocabulary Based On Grid Clustering . . . . . . . . . . . . . . . . . 221 Achref Ouni, Eric Royer, Marc Chevaldonné, and Michel Dhome Study of Anisotropy of Seismic Response from Fractured Media . . . . . . . 231 Alena Favorskaya and Vasily Golubev Synchronization Correction Enforced by JPEG Compression in Image Watermarking Scheme for Handheld Mobile Devices . . . . . . . . . 241 Margarita N. Favorskaya and Vladimir V. Buryachenko Tracking of Objects in Video Sequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253 Nikita Andriyanov, Vitaly Dementiev, and Dmitry Kondratiev
Multi-Criteria Decision-Analysis Methods—Theory and Their Applications A New Approach to Identifying of the Optimal Preference Values in the MCDA Model: Cat Swarm Optimization Study Case . . . . . . . . . . . . 265 Jakub Wi˛eckowski, Andrii Shekhovtsov, and Jarosław W˛atróbski A Study of Different Distance Metrics in the TOPSIS Method . . . . . . . . . 275 Bartłomiej Kizielewicz, Jakub Wi˛eckowski, and Jarosław W˛atrobski Assessment and Improvement of Intelligent Technology in Architectural Design Satisfactory Development Advantages Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285 Vivien Yi-Chun Chen, Jerry Chao-Lee Lin, Zheng Wu, Hui-Pain Lien, Pei-Feng Yang, and Gwo-Hshiung Tzeng IT Support for the Optimization of the Epoxidation of Unsaturated Compounds on the Example of the TOPSIS Method . . . . . . . . . . . . . . . . . . 297 Aleksandra Radomska-Zalas and Anna Fajdek-Bieda Land Suitability Evaluation by Integrating Multi-criteria Decision-Making (MCDM), Geographic Information System (GIS) Method, and Augmented Reality-GIS . . . . . . . . . . . . . . . . . . . . . . . . . . 309 Hanhan Maulana and Hideaki Kanai Toward Reliability in the MCDA Rankings: Comparison of Distance-Based Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321 Andrii Shekhovtsov, Jakub Wi˛eckowski, and Jarosław W˛atróbski Knowledge Engineering in Large-Scale Systems Affection of Java Design Patterns to Cohesion Metrics . . . . . . . . . . . . . . . . 333 Sergey Zykov, Dmitry Alexandrov, Maqsudjon Ismoilov, Anton Savachenko, and Artem Kozlov Applicative-Frame Model of Medical Knowledge Representation . . . . . . 343 Georgy S. Lebedev, Alexey Losev, Eduard Fartushniy, Sergey Zykov, Irina Fomina, and Herman Klimenko Eolang: Toward a New Java-Based Object-Oriented Programming Language . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 355 Hadi Saleh, Sergey Zykov, and Alexander Legalov Mission-Critical Goals Impact onto Process Efficiency: Case of Aeroflot Group . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 365 Alexander Gromoff, Sergey Zykov, and Yaroslav Gorchakov
High-Dimensional Data Analysis, Knowledge Processing and Applications A Classification Method Based on Ensemble Learning of Deep Learning and Multidimensional Scaling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 379 Kazuya Miyazawa and Mika Sato-Ilic A Consistent Likelihood-Based Variable Selection Method in Normal Multivariate Linear Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . 391 Ryoya Oda and Hirokazu Yanagihara A Hybrid Method of Multi-class SVM and Classification Method Based on Reliability Score for Autocoding of the Family Income and Expenditure Survey . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 403 Yukako Toko and Mika Sato-Ilic Individual Difference Assessment Method Based on Cluster Scale Using a Data Reduction Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 415 Kazuki Nitta and Mika Sato-Ilic Spatial Data Analysis and Sparse Estimation Coordinate Descent Algorithm for Normal-Likelihood-Based Group Lasso in Multivariate Linear Regression . . . . . . . . . . . . . . . . . . . . . . 429 Hirokazu Yanagihara and Ryoya Oda Discriminant Analysis via Smoothly Varying Regularization . . . . . . . . . . . 441 Hisao Yoshida, Shuichi Kawano, and Yoshiyuki Ninomiya Optimizations for Categorizations of Explanatory Variables in Linear Regression via Generalized Fused Lasso . . . . . . . . . . . . . . . . . . . . 457 Mineaki Ohishi, Kensuke Okamura, Yoshimichi Itoh, and Hirokazu Yanagihara Robust Bayesian Changepoint Analysis in the Presence of Outliers . . . . . 469 Shonosuke Sugasawa and Shintaro Hashimoto Spatio-Temporal Adaptive Fused Lasso for Proportion Data . . . . . . . . . . . 479 Mariko Yamamura, Mineaki Ohishi, and Hirokazu Yanagihara Variable Fusion for Bayesian Linear Regression via Spike-and-slab Priors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 491 Shengyi Wu, Kaito Shimamura, Kohei Yoshikawa, Kazuaki Murayama, and Shuichi Kawano Intelligent Diagnosis and Monitoring of Systems: Methods, Tools, and Applications Diagnosis of Active Systems with Abstract Observability . . . . . . . . . . . . . . 505 Gianfranco Lamperti, Marina Zanella, and Xiangfu Zhao
Java2CSP—A Model-Based Diagnosis Tool Not Only for Software Debugging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 519 Franz Wotawa and Vlad Andrei Dumitru Meta-diagnosis via Preference Relaxation for State Trackability . . . . . . . 531 Xavier Pucel, Stéphanie Roussel, Louise Travé-Massuyès, and Valentin Bouziat Model-Based Diagnosis of Time Shift Failures in Discrete Event Systems: A (Max,+) Observer-Based Approach . . . . . . . . . . . . . . . . . . . . . . . 545 Claire Paya, Euriell Le Corronc, Yannick Pencolé, and Philippe Vialletelle Innovative Technologies and Applications in Computer Intelligence Adaptive Structural Deep Learning to Recognize Kinship Using Families in Wild Multimedia . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 559 Takumi Ichimura and Shin Kamada Detecting Adversarial Examples for Time Series Classification and Its Performance Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 569 Jun Teraoka and Keiichi Tamura Efficient Data Presentation Method for Building User Preference Model Using Interactive Evolutionary Computation . . . . . . . . . . . . . . . . . . 583 Akira Hara, Jun-ichi Kushida, Ryohei Yasuda, and Tetsuyuki Takahama Image-Based Early Detection of Alzheimer’s Disease by Using Adaptive Structural Deep Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 595 Shin Kamada, Takumi Ichimura, and Toshihide Harada Decision Making Theory for Economics Calculations of SPCM by Several Methods for MDAHP Including Hierarchical Criteria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 609 Takao Ohya Equilibria Between Two Sets of Pairwise Comparisons as Solutions of Decision-Making with Orthogonal Criteria . . . . . . . . . . . . . . . . . . . . . . . . 617 Takafumi Mizuno Fluctuations in Evaluations with Multi-branch Tree Method for Efficient Resource Allocation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 627 Natsumi Oyamaguchi Foundations for Model Building of Intelligent Pricing Methodology . . . . 639 Marina Kholod, Yury Lyandau, Valery Maslennikov, Irina Kalinina, and Ekaterina Borovik
Informatization of Life Insurance Companies and Organizational Decision Making (Case of Nippon Life Insurance Company) . . . . . . . . . . . 649 Shunei Norikumo Value Measurement and Taxation Metrics in the Model-Building Foundations for Intelligent Pricing Methodology . . . . . . . . . . . . . . . . . . . . . 659 Marina Kholod, Yury Lyandau, Elena Popova, Aleksei Semenov, and Ksenia Sadykova Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 669
About the Editors
Dr. Ireneusz Czarnowski is Professor at the Gdynia Maritime University. He holds B.Sc. and M.Sc. degrees in Electronics and Communication Systems from the same University. He gained the doctoral degree in the field of computer science in 2004 at Faculty of Computer Science and Management of Poznan University of Technology. In 2012, he earned a postdoctoral degree in the field of computer science in technical sciences at Wroclaw University of Science and Technology. His research interests include artificial intelligence, machine learning, evolutionary computations, multiagent systems, data mining and data science. He is Associate Editor of the Journal of Knowledge-Based and Intelligent Engineering Systems, published by the IOS Press, and a reviewer for several scientific journals. Dr. Robert Howlett is Executive Chair of KES International, a non-profit organization that facilitates knowledge transfer and the dissemination of research results in areas including intelligent systems, sustainability and knowledge transfer. He is Visiting Professor at Bournemouth University in the UK. His technical expertise is in the use of intelligent systems to solve industrial problems. He has been successful in applying artificial intelligence, machine learning and related technologies to sustainability and renewable energy systems; condition monitoring, diagnostic tools and systems; and automotive electronics and engine management systems. His current research work is focussed on the use of smart microgrids to achieve reduced energy costs and lower carbon emissions in areas such as housing and protected horticulture. Dr. Lakhmi C. Jain received his Ph.D., M.E., B.E. (Hons) from the University of Technology Sydney, Australia, and Liverpool Hope University, UK and is Fellow of Engineers Australia. Professor Jain serves the KES International for providing a professional community the opportunities for publications, knowledge exchange, cooperation and teaming. Involving around 5000 researchers drawn from universities and companies worldwide, KES facilitates international cooperation and generates synergy in teaching and research. KES regularly provides networking opportunities for professional community through one of the largest conferences of its kind in the area of KES. xxi
Main Track
ArgVote: Which Party Argues Like Me? Exploring an Argument-Based Voting Advice Application Markus Brenneis and Martin Mauve (Heinrich-Heine-Universität Düsseldorf, Germany)
Abstract A lot of people use voting advice applications (VAAs) as a decision-making tool to assist them in deciding which political party to vote for in an election. We think that arguments for/against political positions also play an important role in this decision process, but they are not considered in classical VAAs. Therefore, we introduce a new kind of VAA, ArgVote, which considers opinions on arguments when calculating voter–party similarity. We present the results of an empirical study comprising two groups who used ArgVote with and without arguments. Our results indicate that arguments improve the understanding of political issues and different opinions, and that people enjoy the interaction with arguments. On the other hand, the matching algorithm which considers arguments was not better, and user interface improvements are needed. The user profiles we collected are provided to assist further research. Keywords Argumentation · Data set · Voting advice applications
1 Introduction Many people [1, 2] around the world use voting advice applications (VAAs) like Vote Compass or the German Wahl-O-Mat. They inform themselves about positions of different parties concerning current political issues before general elections to receive help in deciding for whom to vote. In many applications, the similarities between voters and parties are calculated with a high-dimensional proximity model [3], based on proximity voting logic [4], where parties are matched with voters based on their opinions concerning a number of political positions. Classical VAAs, however, do not consider why parties and voters maintain certain views. Consider, for instance, Party A being against nuclear power because it thinks
nuclear power plants are dangerous, and Party B is against nuclear power because nuclear waste cannot be stored safely. If a voter thinks that nuclear power plants are safe, they are certainly closer to Party B than to Party A. But a classical VAA, which only asks whether the voter is for or against nuclear power, would not capture this information. Therefore, we assume that not only the opinions concerning political positions but also the arguments used to sustain these positions are relevant for the personal party preference. Hence, we have developed ArgVote, a new kind of VAA, which does not only consider the political positions, but also the arguments used to arrive at the given position. In an online survey comprising two groups, we tested the acceptance of our new application and whether its new matching algorithm performs better than that of a classical VAA. We also questioned whether people are more informed when arguments are presented, and if they can indicate their own political opinion more easily. In the next section, we explain why and how we developed an argument-based VAA. Then we present our methods and hypotheses. In Sect. 4, we show our results and subsequently discuss their consequences. Finally, we have a look at related work and summarize our findings.
2 Designing an Argument-Based VAA We now sum up the key motivations for developing an argument-based VAA and then present what our new application ArgVote looks like.
2.1 Limitations of Classical VAAs As described in the introduction, we think that the reasons why a party has certain attitudes are also important for providing sensible support for a voting decision. If, in our example, the problem with nuclear waste was solved, then Party B would be likely to change its attitude towards nuclear power, as would a voter who was against nuclear power for the same reason. This reinforces our stance that arguments are relevant. What is more, voters might not be familiar with the issues raised within a VAA, and they tend not to "look up additional information on the web and oftentimes 'just' provide a neutral no opinion answer" [5]. We conjecture that providing arguments for and against a position right within the VAA increases the informedness of voters, who can then better express their opinion and get more meaningful results, i.e., a more suitable voting advice.
Another advantage of arguments is making it harder for parties to “cheat” when the parties provide the answers to the questions in the VAA themselves. Sometimes, parties indicate to be neutral instead of taking an unpopular position to improve their results [2], which leads to inconsistencies between the official stance of a party and its reasons.
2.2 How ArgVote Works Based on the design of the German VAA Wahl-O-Mat, we have developed ArgVote, which additionally displays arguments for and against agreeing with a position (see Fig. 1). The arguments can be displayed before the voter indicates their opinion, but ArgVote also explicitly asks the voter to (optionally) choose their arguments after opinion input. If available, (counter)arguments for/against the arguments displayed can be navigated through. The arguments presented in ArgVote were provided by the parties beforehand. As in the Wahl-O-Mat, political issues, but also arguments, can be marked as important, giving them a higher weight in the matching algorithm. After the last question, a user can compare their arguments with the parties' arguments and sees a bar chart indicating how much they agree with the attitudes of the individual parties.
Fig. 1 Main user interface of ArgVote: The user is asked for their arguments after indicating their opinion on an issue, but they can also display the arguments beforehand. “More …” can be used to see (counter)arguments to arguments
In the classical matching algorithm used by the Wahl-O-Mat [6], party and voter have a distance of 0 for an issue if they have the same opinion, 0.5 if they are different and one is neutral, and 1 otherwise; the value is doubled for issues marked as important. ArgVote's matching algorithm is based on our pseudometric for weighted argumentation graphs [7], which also considers the opinions on arguments for/against the positions. agree, neutral, and disagree are translated to opinion values 0.5, 0, or −0.5, respectively, in this model. When a position or argument is marked as important, the corresponding edge in the argumentation tree gets a doubled weight. The relative importance of opinions on arguments and positions can be balanced with a parameter α of the used pseudometric (similar to PageRank's [8] damping factor). ArgVote uses α = 0.3, giving the opinions on positions a slightly higher influence than the arguments used. This choice is motivated by the results of an earlier empirical study of ours [9], which indicated that opinions on positions are considered more important by most people than opinions on arguments. From the same study, we also learned that the results of the chosen pseudometric match human intuition well, and thus are understandable, in many argumentative contexts.
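To make the preceding description more concrete, the following Python sketch shows how a classical Wahl-O-Mat-style distance could be computed and how opinions on arguments might be blended in with the weight α. This is an illustration only: the function names and data layout are our own, and the actual pseudometric of [7] operates on weighted argumentation graphs rather than the flat lists used here.

```python
# Illustrative sketch only: data layout and helper names are hypothetical,
# and the argument part is a simplification of the pseudometric from [7].

def issue_distance(voter_opinion, party_opinion, important=False):
    """Classical Wahl-O-Mat-style distance for one issue.

    Opinions are 'agree', 'neutral' or 'disagree'.
    Distance: 0 if equal, 0.5 if exactly one side is neutral, 1 otherwise.
    Issues marked as important count twice.
    """
    if voter_opinion == party_opinion:
        d = 0.0
    elif "neutral" in (voter_opinion, party_opinion):
        d = 0.5
    else:
        d = 1.0
    return 2 * d if important else d


def matching_score(voter, party, alpha=0.3):
    """Agreement in [0, 1]; alpha balances arguments vs. positions."""
    # positions: dict issue -> (opinion, important)
    pos_d = sum(issue_distance(voter["positions"][i][0], party["positions"][i][0],
                               voter["positions"][i][1])
                for i in voter["positions"])
    pos_max = sum(2 if voter["positions"][i][1] else 1
                  for i in voter["positions"]) or 1

    # arguments: dict argument -> (opinion, important), treated like issues here
    shared = [a for a in voter["arguments"] if a in party["arguments"]]
    arg_d = sum(issue_distance(voter["arguments"][a][0], party["arguments"][a][0],
                               voter["arguments"][a][1])
                for a in shared)
    arg_max = sum(2 if voter["arguments"][a][1] else 1 for a in shared) or 1

    distance = (1 - alpha) * (pos_d / pos_max) + alpha * (arg_d / arg_max)
    return 1 - distance
```

With α = 0.3 the position part carries weight 0.7, which is consistent with the statement above that positions receive slightly more influence than arguments.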
3 Hypotheses and Methods With ArgVote, we want to identify differences after using an argument-based VAA and a classical VAA. For our experiment, we recruited German participants from within our personal contacts (we had first planned to do on-campus recruiting of participants, but this was not possible due to the lockdown at that time) and let them use ArgVote in two different modes: Group 1 used ArgVote as described above, Group 2 (control group) used ArgVote without arguments displayed under the theses, i.e., it basically behaved like the Wahl-O-Mat. Before the participants used ArgVote, we asked them for their sympathy with the biggest German parties (in alphabetical order: AfD, CDU, Die Linke, FDP, SPD, and Grüne), which were included in ArgVote. The content for ArgVote was copied from the Wahl-O-Mat of the European Parliament Election 2019, which had been the last election where all Germans were allowed to vote and comprised 38 positions. We only used the first 15 positions in both groups to reduce the time needed for participation. The complete argumentation corpus contains 294 arguments for all political theses and 147 arguments for the first 15 issues. It was created by three annotators based on the justification statements the parties provided in the Wahl-O-Mat. All annotators independently annotated for each argument whether it is used by a party. The annotator agreement in terms of Krippendorff's alpha [10, p. 211 ff.] is 78%. In our experiment, we want to research the differences between both groups regarding subjective informedness, ease of indicating an opinion on a thesis, better matching results compared to own party preferences, and usability assessment of ArgVote. After using ArgVote, we asked participants what features of ArgVote they
used, how hard they were to use, and how well-informed they feel about policies. Moreover, we count how often users indicate no opinion. ArgVote also asks different questions about how much participants like their matching results (overall and concerning the top position) to get a subjective rating of how good the result is. We also checked how closely the calculated matching matches a participant's party sympathy rating using the rank-biased overlap (RBO) [11]; RBO compares two sorted lists, and differences in the top positions are penalized more heavily than differences in the bottom positions (a small sketch of this measure is given at the end of this section). We also compare the average rank of a user's party, as also done before for other VAAs by [3]. To wrap up, we have the following hypotheses:
1. Group 1 feels more informed after using ArgVote with arguments than Group 2 (control group without arguments).
2. It is easier for Group 1 to indicate an opinion for a political thesis.
3. Group 1 does not consider ArgVote harder to use.
4. Matching results of Group 1 better match participants' party preferences.
We want to clarify that we mainly focus on checking whether our general idea works well. If it does, a bigger study can follow, in which improvements to the user interface, the selection and formulation of the arguments, and a more representative sample can be considered.
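For reference, rank-biased overlap can be computed in a truncated form as follows. This is a hedged sketch based on the standard definition of RBO [11], not the exact evaluation code used in the study, and the example rankings are made up.

```python
def rbo_truncated(ranking_a, ranking_b, p=0.9):
    """Truncated rank-biased overlap of two ranked lists (e.g., party rankings).

    A_d is the overlap of the two top-d prefixes divided by d; deeper ranks are
    discounted geometrically by p, so disagreements near the top cost more.
    Note: the truncated sum understates the full measure, which extrapolates
    the tail beyond the evaluated depth.
    """
    depth = min(len(ranking_a), len(ranking_b))
    seen_a, seen_b = set(), set()
    score = 0.0
    for d in range(1, depth + 1):
        seen_a.add(ranking_a[d - 1])
        seen_b.add(ranking_b[d - 1])
        agreement = len(seen_a & seen_b) / d
        score += (p ** (d - 1)) * agreement
    return (1 - p) * score

# Example: a (hypothetical) voter sympathy ranking vs. the VAA's ranking.
voter_rank = ["Gruene", "SPD", "Die Linke", "FDP", "CDU", "AfD"]
vaa_rank = ["SPD", "Gruene", "Die Linke", "CDU", "FDP", "AfD"]
print(round(rbo_truncated(voter_rank, vaa_rank), 3))
```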
4 Results We now present our key findings, starting with the comparison of the experimental groups, and then checking our hypotheses presented in the previous section. The dataset containing the VAA questions and the argumentation corpus, as well as the collected user profiles, is provided online (https://github.com/hhucn/argvote-dataset).
4.1 General Information on Participants and Groups 60 participants successfully completed our survey (including two attention check questions). 30 were in Group 1 (with arguments), 30 in Group 2 (control group without arguments). 63% of the participants were male (German population: 49% [12]), the average age was 27 (German population average: 45 [12]), and more than 96% had at least a higher education entrance qualification (Hochschulreife; German average: 34% [12]).
4.2 Hypothesis 1: Informedness Looking at the subjective answers about informedness, which had been asked after using ArgVote and are presented in Table 1, we could not deduce that Group 1 got a higher awareness of political topics, nor that the differences between the parties became clearer. But we saw that Group 1 got a clearer picture of why there were different opinions, and that they understood political issues significantly better.
4.3 Hypothesis 2: Ease of Indicating Opinion There was no big difference between both groups regarding the number of neutral answers and skipped questions. On average, 28% of participants in Group 1 chose a neutral answer, whereas 30% of participants in the control group did so (no significant difference, p = 1 with a χ² test). The average skip rate (i.e., providing no opinion on an issue) was 1.1% in Group 1, and 1.3% in Group 2 (p = 0.58). On the other hand, the subjects in Group 1 considered indicating an opinion on a thesis slightly easier than those in Group 2 (cf. Table 2). We can also see that (dis)agreeing with arguments was not considered much more difficult than giving an opinion on a thesis, which means that this additional task was not too hard for VAA users. Group 1 also strongly agreed that seeing arguments next to the theses is useful (4.40 on a Likert scale from 1 to 5). A brief sketch of the statistical tests used here follows after Table 1.
Table 1 Subjective level of informedness on a Likert scale from do not agree at all (1) to fully agree (5), p-values according to a Mann–Whitney rank test (MW) [13]
Question | Group 1 | Group 2 | p (MW)
By using ArgVote I became aware of political issues
After using ArgVote, the difference between the parties is clearer to me
Using ArgVote helped me understand some political issues better
After using ArgVote, it is clearer to me why there are different opinions on certain theses
2.87 2.70
2.60 2.87
0.20 0.77
3.40
2.33
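The two tests reported above (a χ² test on the neutral-answer rates and Mann–Whitney rank tests on the Likert items) are standard; a hedged sketch of how such comparisons can be run with SciPy is shown below. The counts and scores are invented placeholders with the same shape as the data described in the text, not the study's raw data.

```python
from scipy.stats import chi2_contingency, mannwhitneyu

# Neutral-answer comparison (Hypothesis 2): 2x2 table of
# [neutral, non-neutral] answer counts per group; numbers are placeholders
# derived from 30 participants x 15 issues per group.
table = [[126, 324],   # Group 1: ~28% of 450 answers neutral
         [135, 315]]   # Group 2: ~30% of 450 answers neutral
chi2, p_value, dof, expected = chi2_contingency(table)
print("chi-squared p-value:", round(p_value, 3))

# Likert-item comparison (Hypothesis 1): Mann-Whitney rank test on the
# per-participant ratings of the two groups; the lists are placeholders.
group1_scores = [4, 3, 5, 3, 4, 2, 4, 3]
group2_scores = [2, 3, 2, 3, 1, 2, 3, 2]
stat, p_value = mannwhitneyu(group1_scores, group2_scores, alternative="two-sided")
print("Mann-Whitney p-value:", round(p_value, 3))
```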
$k_i^{out}$. In a strong community, each node has more connections within the community than with the rest of the graph. The subgraph $C$ is a community in a weak sense if $\sum_{i \in C} k_i^{in} > \sum_{i \in C} k_i^{out}$. In a weak community, the sum of all degrees within $C$ is larger than the sum of all degrees toward the rest of the network. It is easy to see that a community in a strong sense is also a community in a weak sense, whereas the converse is not true. The process of discovering the cohesive groups or clusters in the network is known as community detection [2]. It aims at dividing the vertices of a network into some number $k$ of groups, while maximizing the number of edges inside these groups and minimizing the number of edges established between vertices in different groups [7]. The problem of community detection has been studied by many authors who proposed various methods to solve it. Because the problem of finding communities in a network is in essence a data clustering problem, a natural way to solve it is to use traditional methods that divide the graph or cluster the nodes into groups. They include hierarchical clustering [12], partitional clustering [14], or spectral clustering [9]. Another group of approaches consists of divisive algorithms, of which the best-known and most popular is the one proposed by Girvan and Newman [11]. Two other important groups of approaches to community detection are based on optimization or on dynamics. The former group aims at finding the maximum (minimum) of a function indicating
the quality of a clustering over the space of all possible clusterings. Quality functions can express the goodness of a whole partition or of single clusters. The most popular quality function is the modularity by Newman and Girvan [18]. The latter group of methods focuses on identifying communities by running dynamical processes on the network, like diffusion [20], spin dynamics [23], or synchronization [4]. A detailed investigation of the community detection problem and of the methods used to solve it can be found in one of the comprehensive reviews of this problem, for example [10].
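The strong and weak conditions above translate directly into code. The following Python sketch is our own illustration (using a plain adjacency structure, not code from the paper) and checks both conditions for a candidate community C.

```python
def internal_external_degrees(adjacency, community):
    """For each node i in C, return (k_i_in, k_i_out) with respect to C."""
    community = set(community)
    degrees = {}
    for node in community:
        neighbours = adjacency[node]
        k_in = sum(1 for n in neighbours if n in community)
        k_out = len(neighbours) - k_in
        degrees[node] = (k_in, k_out)
    return degrees


def is_strong_community(adjacency, community):
    # Strong sense: every node has more links inside C than outside.
    degs = internal_external_degrees(adjacency, community).values()
    return all(k_in > k_out for k_in, k_out in degs)


def is_weak_community(adjacency, community):
    # Weak sense: the summed internal degree exceeds the summed external degree.
    degs = internal_external_degrees(adjacency, community).values()
    return sum(k_in for k_in, _ in degs) > sum(k_out for _, k_out in degs)


# Tiny example graph given as a dict of neighbour sets.
graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5}, 5: {4}}
print(is_strong_community(graph, {1, 2, 3}), is_weak_community(graph, {1, 2, 3}))
```

As the definitions state, any subset that passes the strong test also passes the weak one, while the converse need not hold.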
3 Social Networks Based on E-mail Communication Social network analysis (SNA) methodology offers an interesting approach to the analysis of organizational structures, where people or units of the organization are represented in the form of nodes in the network, and relationships or information flow between these people or organizations are represented in the form of edges between the nodes. Moreover, SNA provides a number of measures which allow one to analyze different properties of such a network. Thanks to the SNA, it is possible for example to determine the positions and the roles of selected persons in a given organization. It may also help to describe and visualize structural bottlenecks, critical connections, irregular communication patterns, and isolated actors. SNA is often seen as a potential tool for detection of emerging communities, by detecting distinct groups or subgroups inside hierarchies in the organization [5]. Last years one can observe a growing interest of researchers in social networks based on digital forms of communication (like e-mails) within the organization. Because of the fact that e-mails represent a major source of electronic communication in the organization, it is expected that e-mail-based network analysis will also bring a lot of interesting observations related to selected aspects of organization and its activity. Analysis of social networks based on e-mail communication has a broad set of applications in analytics, ranging from organizational design to operational efficiency and change management [5]. For example, Creamer et al. [8] aimed at extracting social hierarchies from electronic communication data, Michalski et al. [16] focused on matching organizational structure and social network based on e-mail communication, Kolli and Narayanaswamy [13] analyzed e-mail communication network for crisis detection in a large organization, and Merten and Gloor [15] aimed at recognizing a possible sources of stress caused by e-mail.
4 Study on Organizational Social Network Based on E-mail Communication in Public Organization This paper focuses on an organizational social network based on digital communication in one of the public organizations in Poland. The main goal of this study is to detect communities in this network based on the data extracted from the organization's e-mail server logs referring to digital communication between employees. The study has been divided into five steps:
1. Selecting periods of observation.
2. Getting the data referring to the selected periods from the organization's e-mail server logs and pre-processing them.
3. Building the social network based on the obtained data.
4. Applying dedicated algorithms to detect the communities.
5. Analyzing the results and assessing the communities detected in the network in the context of the structure of the organization.
It has been decided to build and study two social networks based on e-mail communication within the organization in two periods: April–June 2017 and April–June 2020. The first reason for selecting two periods, instead of a single one, was that the structure of the organization changed slightly between 2017 and 2020. The second reason was the assumption that changes in social life and work due to the first stage of the COVID-19 pandemic in 2020 (e.g., remote working) may change the patterns of communication between employees in comparison to 2017. It was expected that SNA tools would be able to recognize both changes based on e-mail communication. The next step aimed at getting the data from the organizational e-mail server logs. Due to the volume of the data, they have been restricted to those referring to a single selected unit of the organization. Table 1 presents the main characteristics of the data: the number of all messages exchanged between employees within the whole organization, the number of messages exchanged between employees within the selected single organizational unit, and the number of employees working in this unit in both periods. Analyzing the data presented in Table 1, one can observe that the number of employees working in the unit decreased by 16 people (15.0%) when comparing
Table 1 Main characteristics of the data referring to e-mail communication within the organization
Category | April–June 2017 | April–June 2020
Number of messages sent and received within the whole organization | 107,219 | 146,302
Number of messages sent and received within the selected organizational unit | 5949 | 6692
Number of employees working in the organizational unit | 107 | 91
both periods. At the same time, the number of e-mail messages exchanged between employees significantly increased in 2020 compared to 2017 (by 36.5% in the scale of the whole organization and by 12.5% when observed within the single unit). Assuming that both of these quite significant changes may influence the structure of the network and the results, it was decided to observe and analyze them in the next stages. The next step focused on preparing and cleaning the data received from the server to obtain the needed information, interesting from the point of view of the study. Because the server logs including data referring to e-mail communication from April 2017 to June 2017 had already been examined in the previous work of one of the authors [24], this study focused only on a new set of data (April 2020 to June 2020), which was obtained from the organization's server logs. Next, the collected data have been organized as a set of records, where a sender and receivers have been observed for a single e-mail. Additionally, from the point of view of the goal of the experiment, it was also necessary to record the number of messages exchanged by each sender and receiver. An important aspect, due to the sensitivity of data such as e-mail addresses, was to ensure the security of the data. Therefore, the gained data were anonymized, which means that each e-mail address was assigned a dedicated number, and each pair was saved as a separate record. An important activity in this step was also to update the e-mail address database in order to capture the differences between 2017 and 2020. The completed set of data allowed us to build a social network based on e-mail communication for further study. Each node of the network refers to a sender/receiver, and an edge between two nodes has been established when at least a single message has been exchanged between them during the observed period. The next step was to use a dedicated tool to detect communities in both constructed social networks. It has been decided to use the R software package with the iGraph library [25]. Although R is a general-purpose analytics tool used for statistical calculations and visualization of research results, it is supported by different packages, including iGraph, dedicated to social network analysis. Four algorithms have been selected to analyze the network created in the previous step and to detect the potential communities. They included:
• Edge Betweenness (GN) — cluster_edge_betweenness() [11],
• Fast Greedy (FG) — cluster_fast_greedy() [6],
• Walk Trap (WT) — cluster_walktrap() [20],
• Louvain (LV) — cluster_louvain() [3].
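The study itself calls the R functions listed above. Purely as an illustration, the same pipeline can be sketched with the python-igraph bindings, which expose the same algorithms under slightly different names. The anonymized edge list (pairs of numeric sender/receiver identifiers) is a hypothetical stand-in for the records described earlier, not the study's data.

```python
import igraph as ig

# Hypothetical anonymized records: (sender_id, receiver_id) per exchanged message.
records = [(1, 2), (1, 2), (2, 3), (3, 4), (4, 5), (5, 3), (1, 3)]

g = ig.Graph.TupleList(records, directed=False)
g.simplify()  # collapse parallel edges: an edge means "at least one message"

clusterings = {
    "GN (edge betweenness)": g.community_edge_betweenness().as_clustering(),
    "FG (fast greedy)": g.community_fastgreedy().as_clustering(),
    "WT (walktrap)": g.community_walktrap().as_clustering(),
    "LV (Louvain)": g.community_multilevel(),
}

for name, clustering in clusterings.items():
    print(name, "communities:", len(clustering),
          "modularity:", round(clustering.modularity, 4))
```

In R, the equivalent calls are the cluster_*() functions named above together with modularity() from the iGraph package [25].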
All algorithms use the modularity coefficient [18] as a measure of the quality of the detected communities. Modularity is a measure dedicated to assessing the structure of a network or a graph. Its value ranges from −1 to 1 and measures the density of connections within a community and compares it with the density of connections in another community. It can be assumed that for a real community consisting of strongly connected members this value is close to 1. The results of the experiment are presented in Table 2. The first column of the table gives the name of the algorithm with a reference to the paper describing it in detail. For each algorithm, the next columns include the number of communities
Table 2 Results obtained by four algorithms for community detection in organizational social network
Algorithm | Communities detected (2017 / 2020) | Nodes in communities (2017 / 2020) | Time (s) (2017 / 2020) | Modularity (2017 / 2020)
GN [11] | 6 / 5 | 2017: c1-15, c2-12, c3-13, c4-17, c5-25, c6-25; 2020: c1-15, c2-11, c3-13, c4-29, c5-23 | 0.04 / 0.04 | 0.6020 / 0.5961
FG [6] | 4 / 5 | 2017: c1-16, c2-26, c3-36, c4-29; 2020: c1-28, c2-23, c3-13, c4-12, c5-15 | 0.01 / 0.03 | 0.5411 / 0.4927
WT [20] | 6 / 5 | 2017: c1-18, c2-12, c3-12, c4-16, c5-25, c6-24; 2020: c1-23, c2-29, c3-15, c4-11, c5-13 | 0.01 / 0.03 | 0.5982 / 0.5461
LV [3] | 6 / 5 | 2017: c1-16, c2-12, c3-13, c4-16, c5-25, c6-25; 2020: c1-15, c2-11, c3-13, c4-29, c5-23 | 0.02 / 0.04 | 0.6017 / 0.5617
detected, the number of nodes forming each community, time (in seconds) needed by each algorithm to solve the problem, and the value of the modularity coefficient for the two analyzed periods. Analyzing the results presented in Table 2 and focusing on the modularity coefficient, one can conclude that the GN algorithm gives the best results for both analyzed periods (0.6020 in 2017 and 0.5961 in 2020) among all presented algorithms. The numbers of detected communities in 2017 and 2020 are equal to 6 and 5, respectively. The same numbers of communities have been detected by the WT and LV algorithms, whose modularity coefficients are at the same or a slightly worse level than observed for the GN algorithm. The worst value of the modularity coefficient has been obtained by the FG algorithm. Moreover, only the FG algorithm detected 4 and 5 communities in the two periods, respectively. Taking into account the execution time of each algorithm, the observed results are similar and they are not influenced by the chosen algorithm. An interesting observation, which stems from the comparison of the number of communities detected by three out of four algorithms (GN, WT, LV) in both periods, is that the number of communities detected decreased by 1 (from 6 to 5). This may reflect changes in the real structure of the organization. Indeed, two departments in the unit were merged into a single larger department in 2019. Taking into account that the algorithms correctly detected the changes in the organization and the real
22
number of communities, one can conclude that SNA can be helpful in identifying changes in the structure of the organization. By looking at the column presenting the structure of detected communities, one can observe that algorithms GN, WT, and LV detected exactly the same communities in 2020. In case of the community detected by FG algorithm, the difference between it and communities detected by other algorithms is seen only for a single person. Please note, that labels of communities presented in Table 2 detected by different algorithms may differ, for example, community c1 detected by GN algorithm is the same as community c3 detected by WT algorithm in 2020. In order to emphasize the structure of analyzed social networks and communities detected by the best algorithm—GN in 2017 and 2020, it has been decided to visualize them using iGraph. Results of the visualization are presented in Fig. 1. Labels of each community presented on the networks correspond to labels presented in Table 2.
Fig. 1 Communities detected by GN algorithm in a 2017 and b 2020
Detecting Communities in Organizational Social Network . . .
23
5 Conclusions The paper focused on the social network based on e-mail communication within a public organization and aimed at identifying possible communities within that network. The organizational e-mail network has been observed in two periods (2017 and 2020) which allowed one to identify main features of the network in each period and to detect changes between them. The obtained results of the experiment have been also referred to real structure of the organization. The computational experiment allowed one to observe and conclude that: • The number of e-mails sent and received by employees within the whole organization increased by 36.5% percent in 2020 when compare to 2017, and the number of e-mails sent and received by employees within the observed organizational unit increased by 12% per cent in 2020 when compare to 2017. In the same time, the number of employees employed in the unit decreased by 15%. One of the hypothetical reason of observed changes in volume of messages exchanged between employees may be related to general changes observed in social and organizational life after COVID-19 pandemic started at the beginning of 2020 (adaptation to a new situation, remote or hybrid working, distance learning, etc.). • In 2019, two departments in the unit were merged into one larger department in the organization. This fact has been correctly recognized by most algorithms engaged in the experiment. The number of communities detected by most algorithms in both periods has been decreased by 1 (from 6 to 5). • The structure of communities detected by all algorithms was almost the same. To sum up, the results presented in the paper confirm that SNA applied to the organizational network based on e-mail communication may help to better understand the structure of the organization. The future work will aim at further investigation of the organizational network based on e-mail communication and detecting central nodes in the network. It will allow one to compare the results with positions of elements (referring to found nodes) in real structure of the organization.
References 1. Albert, R., Jeong, H., Barabasi, A.-L.: Diameter of the World-Wide Web. Nature 401, 130–131 (1999) 2. Bedi, P., Sharma, Ch.: Community detection in social networks. WIREs Data Mining Knowl. Discov. 6, 115–135 (2016) 3. Blondel, D.V., Guillaume, J.-L., Lambiotte, R., Lefebvre, E.: Fast unfolding of communities in large networks. J. Stat. Mech.: Theory Experiment 10, 10008 (2008) 4. Boccaletti, S., Ivanchenko, M., Latora, V., Pluchino, A., Rapisarda, A.: Detecting complex network modularity by dynamical clustering. Phys. Rev. E 75(4) (2007) 5. Christidis, P., Losada, A.G.: Email based institutional network analysis: applications and risks. Soc. Sci. 8(306) (2019)
24
D. Barbucha and P. Szyman
6. Clauset, A., Newman, M.E.J., Moore, C.: Finding community structure in very large networks. Phys. Rev. E 70 (2004) 7. Coscia, M., Giannotti, F., Pedreschi, D.: A classification for community discovery methods in complex networks. Stat. Anal. Data Min. 4(5), 512–546 (2011) 8. Creamer, G., Rowe, R., Hershkop, S., Stolfo, S.J.: Segmentation and automated social hierarchy detection through email network analysis. In: Zhang, H., et al. (eds.) WebKDD/SNA-KDD 2007. LNCS, vol. 5439, pp. 40–58. Springer, Berlin, Heidelberg (2009) 9. Donath, W., Hoffman, A.: Lower bounds for the partitioning of graphs. IBM J. Res. Dev. 17(5), 420–425 (1973) 10. Fortunato, S., Hric, D.: Community detection in networks: a user guide. Phys. Rep. 659, 1–44 (2016) 11. Girvan, M., Newman, M.E.J.: Community structure in social and biological networks. PNAS 99, 7821–7826 (2002) 12. Hastie, T., Tibshirani, R., Friedman, J.H.: The Elements of Statistical Learning. Springer, Berlin (2001) 13. Kolli, N., Narayanaswamy, B.: Analysis of e-mail communication using a social network framework for crisis detection in an organization. Proc.-Soc. Behav. Sci. 100, 57–67 (2013) 14. MacQueen, J.B.: Some methods for classification and analysis of multivariate observations. In: Cam, L.M.L., Neyman, J. (eds.) Proceedings of the fifth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, pp. 281–297. University of California Press, Berkeley (1967) 15. Merten, F., Gloor, P.: Too much e-mail decreases job satisfaction. Proc. Soc. Behav. Sci. 2, 6457–65 (2010) 16. Michalski, R., Palus, S., Kazienko, P.: Matching organizational structure and social network extracted from email communication. In: Abramowicz, W. (eds.) Business Information Systems. BIS 2011. LNBIP, vol. 87, pp. 197–206. Springer, Berlin, Heidelberg (2011) 17. Newman, M.E.J.: The structure of scientific collaboration networks. PNAS 98(2), 404–409 (2001) 18. Newman, M.E.J., Girvan, M.: Finding and evaluating community structure in networks. Phys. Rev. E 69(2), (2004) 19. Pastor-Satorras, R., Vespignani, A.: Epidemic spreading in scale-free networks. Phys. Rev. Lett. 86, 3200–3203 (2001) 20. Pons, P., Latapy, M.: Computing communities in large networks using random walks. J. Graph Algor. Appl. 10(2), 191–218 (2006) 21. Radicchi, F., Castellano, C., Cecconi, F., Loreto, V., Parisi, D.: Defining and identifying communities in networks. PNAS 101(9) (2004) 22. Ravasz, E., Somera, A., Mongru, D.A., Oltvai, Z.N., Barabasi, A.-L.: Hierarchical organization of modularity in metabolic networks. Science 297(5586), 1551–1555 (2002) 23. Reichardt, J., Bornholdt, S.: Statistical mechanics of community detection. Phys. Rev. E 74(1) (2006) 24. Szyman, P.: Community identification in social network based on e-mail communication using modularity. [in Polish: Identyfikacja spolecznosci w sieci spolecznosciowej opartej na komunikacji e-mail przy u˙zyciu modularnosci.] In: Antonowicz, P., Beben, R., Ploska, R. (eds.) Spoleczne i niematerialne determinanty rozwoju przedsiebiorstw, Wydawnictwo Uniwersytetu Gdanskiego, Gdansk, pp. 47–62 (2019) 25. Csárdi, G.: https://cran.r-project.org/web/packages/igraph/igraph.pdf. Accessed 04 Feb 2021
Impact of the Time Window Length on the Ship Trajectory Reconstruction Based on AIS Data Clustering Marta Mieczynska ´
and Ireneusz Czarnowski
Abstract Automatic identification system allows ships to automatically exchange information about themselves, mainly to avoid collisions between them. Its terrestrial segment, in which the communication is synchronized, has a range of only 74 km (40 nautical miles). To broaden this range, a satellite segment of the system was introduced. However, due to the fact that one satellite operates on multiple internally synchronized areas, the communication within satellite’s field of view suffers from message collision issue when two or more messages are received by satellite at the same time. This paper presents the first step of the approach of reconstructing lost or damaged AIS messages due to collision. The computational experiment was to cluster data (one cluster = messages from one ship) so that they can be further analysed to find abnormal ones among them and correct them. The clustering was conducted in a streaming approach, using a specific time window to mimic the timechanging character of AIS data. Numerical results that measure the clustering quality have been collected and are presented in the paper. The computational experiment proves that clustering might indeed help reconstruct AIS data and shows the best time window length from a clustering point of view. Keywords AIS · AIS data analysis · Clustering · Trajectory reconstruction
1 Introduction Automatic identification system (AIS) is a system that provides automatic broadcasting of information about a vessel to other vessels and shore-based stations, as well as to aircraft [1]. It is mainly used in vessel identification and collision detection, however, AIS data can also be utilized in many other activities, like vessels’ route M. Mieczy´nska (B) · I. Czarnowski Gdynia Maritime University, Morska 81/87, 81-225, Gdynia, Poland e-mail: [email protected] I. Czarnowski e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 I. Czarnowski et al. (eds.), Intelligent Decision Technologies, Smart Innovation, Systems and Technologies 238, https://doi.org/10.1007/978-981-16-2765-1_3
25
26
M. Mieczy´nska and I. Czarnowski
planning [2] and seaport load optimization [3], detection of unnatural behaviour of the ship [4], or its location prediction [5]. For low-range purposes, so-called terrestrial AIS is used. Due to Earth’s curvature, the range of terrestrial AIS is about 74 km (40 nautical miles) from shore. To overcome such limitation, a satellite segment of AIS system (SAT-AIS) was implemented [6]. Thanks to the satellites’ altitude, the range of the system was greatly widened, however, its main drawback it that (opposed to terrestrial AIS), the transmission between a satellite and vessels from different areas still covered by that satellite is not synchronized. In other words, some messages from different vessels reach the satellite at the same time, thus, cannot be processed by it. This phenomena is called message collision. It causes the data collected by SAT-AIS to be partially damaged or lost. Therefore, systems using SAT-AIS data may produce false results or be unusable, if feeded with incorrect or incomplete data. The issue of overcoming message collision is still a lively area of research and a search for approaches dealing with this problem is still going on. Researchers try using different methods: based on statistics [7] or Viterbi algorithm [8]. An alternative approach is to use machine learning algorithms. There are several examples of an implementation of machine learning algorithms for vessel trajectory reconstruction or AIS data analysis. For example, in [9], clustering with feature extraction using convolutional auto-encoder is used. Paper [10] proposes event recognition from moving vessel trajectories (the proposed system that monitors activity of thousands of vessels can instantly recognize and classify events involving vessels). An approach proposed in [11] identifies the type of ship by the ship’s trajectory features obtained by radar when the ship does not send AIS data. An adaptive approach for marine lane extraction and refinement based on grid merging and filtering has been proposed in [12]. Some methods for missing data imputation using decision trees and fuzzy clustering (not directly AIS data) have been described in [13]. Although many of those research works try to predict the ships’ movements or trajectory, no one focuses on reconstructing AIS data itself. In this paper, the clustering approach to reconstruction of AIS data, when they consist of undefined or unrepresentable values, is discussed. Especially, the aim of the paper and the computational experiment was to present and validate the clusterbased approach for AIS data reconstruction, when the AIS data have been defined as streams. The main aim was to reconstruct the data through their analysis based on the defined time window—however, this paper focuses on the first stage, clustering, only. The computational experiment was to evaluate the impact on the size of the time window on quality of AIS data clustering. The paper is organized as follows. Section 2 contains problem formulation and discussion on the proposed approach. Section 3 provides details on the computational experiment setup and discusses experiment results. Conclusions and suggestions for future research are included in the final section.
Impact of the Time Window Length ..
27
2 An Approach to Ship Trajectory Reconstruction 2.1 Problem Formulation The AIS system produces streams of data. Each AIS datapoint (elementary AIS message transmitted by a vessel) includes a current vessel position, speed and course, navigational status, and other features (more about the AIS messages included in Sect. 3.2). Based on the AIS data, the trajectory of the vessel’s navigational operations can be drawn and analyzed, which provides opportunity to enhance the safety, efficiency, and security of marine transport systems [14]. In [15], a review of methods and tools for collecting, storing, distributing, as well as visualizing AIS data, with respect to different AIS-related applications, is included. A vessel’s trajectory can be defined as a finite sequence Ti = Tit1 , . . . Tit M , where i is the identifier of observed vessel in a particular area, t M is a time for which the trajectory is observed and Titm is a vector presenting a status of observed vessel i at time tm:m=1,...,M . Thus, the vector Titm represents a trajectory point (or data point) and can tm tm tm tm tm tm , where xi1 , xi2 , . . . xin , xi2 , . . . xin be expressed by the following vector: Titm = xi1 are the features derived from AIS message for vessel i at time tm and n is a number of features. Since N is the number of vessels for which the AIS data have been registered, T is a set of Ti:i=1,...,N and corresponds to a multidimensional vector of vessel’s trajectories. The message collision mentioned before may result in the data collected by SATAIS being partially damaged or lost, which can be expressed in the following way: ∃i ∃tm Titm , where Titm is undefined or unrepresentable (NaN). When AIS data are partially damaged or lost it is impossible to define a vessel’s trajectory as a finite sequence. Thus, the processes based on vessel’s trajectories analysis may not be carried out or such processes may not provide sufficient or reliable information for the superior system. Thus, in this paper, the vessel’s trajectory reconstruction is defined as a process of establishing or correction of the values of vectors Titm . In the next subsection, the problem of reconstructing lost or damaged AIS messages using a clustering approach is discussed.
2.2 The Proposed Approach The paper deals with the problem of AIS message reconstruction (used for plotting vessels’ trajectories and analysis of vessel navigation operations) using clustering techniques. Generally, the proposed approach is based on analysing registered AIS messages (as sets of packages), selecting those packets that seem to be somehow incorrect and, finally, predicting the possible true content of those. In the proposed approach, the vessel’s trajectory reconstruction process has been divided into several stages. In such approach, each stage can be solved using dedicated
28
M. Mieczy´nska and I. Czarnowski
procedures and algorithms. However, a general framework of the proposed approach includes: • At first, clustering of the dataset T (which includes registered AIS messages from a particular area and particular time frame), so that (ideally) messages from one ship are grouped into one cluster, thus, distinguished from messages from other vessels. • Next, separately for each cluster, analysing of messages to find abnormal or outlying ones. Those, together with messages from unnaturally small clusters, could be potential outliers—messages that might possibly be damaged and require correction. • At the last stage, reconstructing of messages including outliers based on the observed trends in correct data. This paper focuses on the first stage—clustering, carried out with respect to the time-changing aspect of AIS data.
2.3 Clustering and Time Frame Approach The aim of the clustering is to divide the dataset in a way that points belonging to one cluster are more similar to each other than to those assigned to any other cluster. Thus, in case of a considered problem of AIS data clustering, if the clustering is carried out in the right way (as an optimizing process with achieving an optimum of the objective function), all messages from each vessel are grouped into one (and only one) cluster and do not belong to another cluster. In general, clustering problems belong to NP-hard class [16], so the AIS data clustering may be time-consuming and may be at risk of finding the locally optimal solution, especially when the dataset is large. The idea of clustering of AIS data is not new in the literature. For example, it is used to extract vessels’ route patterns by grouping similar trajectories [9] or defining specific points of trajectories (e.g. where ships usually turn, speed up, etc.) called waypoints [17]. In this paper, we assume that the clustering does not have to be carried out on the entire dataset, i.e. based on all AIS messages coming from the vessel in the area operated by the AIS system. Since each vessel transmits a lot of AIS messages in a specific time regime, AIS dataset can be relatively big. On the other hand, the AIS data can be considered as a stream of data, flowing at specified intervals. What is more, by recording AIS messages from some specific area, it can be observed that some ships either appear within it or disappear. Some vessels may stay in the specific area for a shorter period of time, so the amount of this data will be different from that of another vessel. So, there are several reasons why AIS data can be characterized as stream data. Assuming the stream character of AIS data, the following questions have been formulated: do all messages collected from a particular vessel AIS are necessary to
Impact of the Time Window Length ..
29
reconstruct the lost ones? Is it necessary to remember all AIS messages or only more current parts of the data stream? To answer the questions, we assumed that relatively new (to some extent) packets carry enough information useful for vessels’ movement analysis and trajectory reconstruction when the data are undefined or unrepresentable. So, in the discussed approach to the clustering process, not entire dataset is used, but its part. The part of the AIS message used in clustering is specified by a so-called time frame (or time window). Such an approach should also be favourable from the computational point of view by eliminating the need for clustering of a big data set. It is worth mentioning that the whole reconstruction of AIS message should be carried out in a real-time system: any latency could potentially result in message not being processed and two ships colliding. Therefore, any methods for lowering the computational complexity of a system, such as proposed time frame approach, are welcomed. From the formal point of view, the time window is related to the value of M, where M is a time parameter that has been discussed in 2.1. M can be also called as a time window size or time frame length. Based on the above statements, an open question is the size of the time window. The answer to the question is formulated in the next section.
3 Computational Experiment 3.1 Aim of the Research This section contains results of the computational experiment carried out to evaluate the discussed approach, i.e. clustering of AIS data through the prism of the defined time windows. That is to say, a specific time window has been used to mimic the impact of the flow of time on the dataset—after a particular period of time, some messages either appear in the dataset or become out-of-date. The main research question was whether or not the proposed approach may help in the AIS message reconstruction as well as how the size of the time window influences the performance of the proposed approach.
3.2 AIS Dataset The dataset used in the experiment that represents vessels’ trajectories consists of AIS messages of three types: 1–3 called position reports (among all 27 message types in AIS). Each message of those types is 168-bit-long (excluding the CRC field) [18]. Its bits can be grouped and decoded from binary form to decimal form (some of them with a specific scale) to create the input for clustering algorithm, which finally includes values of the following AIS features: longitude, latitude, navigational
30
M. Mieczy´nska and I. Czarnowski
status, speed over ground, course over ground, true heading, special manoeuvre indicator, ship identifier (part of MMSI identifier) and country identifier (also part of MMSI) that gives 115 features in total. Fields that present identifiers were additionally encoded using one-hot encoding. All values have been also standardized to take values from −1 to 1. In this experiment, three different dataset have been utilized. The first one consists of 805 messages recorded from 22 vessels in the area of Gulf of Gdansk, the second: 19,999 messages from 387 ships around Baltic Sea, and the last one: 19,999 messages from 524 ships from Gibraltar.
3.3 Methodology The computational experiment was an iterative process, carried out in MATLAB environment. At the beginning, the first dataset was examined. The time-width of the dataset (i.e. the difference in time between the oldest and the newest message in a dataset) was computed. Then, the time window was applied—at first, 5-min-long: only messages that are no older than 5 min than the newest messages were taken into account, thus, creating a subset. A clustering algorithm was then run to divide the subset. In the computational experiment, two clustering algorithms have been used: • k-means [19]—for this algorithm the number of clusters has been set a priori and was equal to the number of vessels respectively to considered data set. As a similarity measure, the Euclidean distance has been used. • DBSCAN [20]—for this algorithm, it is not needed to set the number of clusters since it is established automatically during computations. However, for this algorithm, the minimum number of points considered to forming a cluster has been set to 1 (the parameter was set to 1 because it is not impossible that during a time window, there was a ship that sent only one message) and a neighbour searching radius has been set to 10 according to the results of tuning phase. Numerical results that have been recorded during the computational experiment are two coefficients that measure the quality of the clustering, including: • Silhouette [21] that shows, based on the average value, to which extent a single datapoint has been assigned to the right cluster. • Correctness coefficient (CC), authors’ original coefficient, created to distinguish the quality of assigning messages from one particular vessel (and only those) in one cluster, ranging from 0 (worst clustering) to 1 (best). It is a harmonic mean of two values: a weighted average of percentages of messages from each cluster that originates from only one modal vessel (with the volume of each cluster being the weight) and a weighted average of percentages of messages from each vessel that were assigned to only one modal cluster (with the number of messages originating from that ship being the weight).
Impact of the Time Window Length ..
31
If the results seemed numerically correct (i.e. were real numbers, not NaNs), they were saved and the counter which stores the number of slides that the time window made across the dataset was incremented. After that the time frame moved, covering the messages 5–10 min old, and the process of calculating the results of clustering of the newly formed data subset was repeated. This process lasted until the time window reached the end of the dataset, in other words, there were no messages older than the newest message that the time window could embrace. If so, the mean of gathered silhouette and correctness coefficient was calculated, the length of the time frame was increased and the act of sliding the time window across the dataset, clustering (then bigger) sub-datasets and collecting quality measures happened again. The time window lengths that were examined were 5-, 10-, 15-, 20-, 30-, 60-, 120-, 180-, 360and 720-min-long, respectively. After our first dataset was examined using the longest time window, the same experiment was conducted with artificially damaged data to see how the algorithm works when some of the messages are incorrect. 5, 10 and 20% of data points were randomly chosen, and 2 its bits (randomly chosen as well) were changed to opposite. Then another dataset was loaded and the whole process repeated. Finally, when testing all datasets finished, a weighted mean of clustering quality metrics from all datasets were calculated with the number of slides across each dataset being the weight.
3.4 Computational Experiment Results In Table 1, the mean values of the silhouette and the correctness coefficient obtained using the proposed approach are shown. To compare, these tables contain results for k-means and DBSCAN as well as for the different level of incorrectness within AIS messages. Impact of time window length on silhouette and correctness coefficient is also presented in Fig. 1. Plotted curves suggest that for k-means, if the dataset is small (short time frame), it is easier to correctly cluster all messages—both silhouette and correctness coefficient decreases with the increase of the time frame length. Also, better clustering results were achieved with the undamaged data. When it comes to clustering with a DBSCAN algorithm, the conclusions mentioned above relate only to silhouette (decrease with the increase of time frame length and number of damaged messages). However, it can be noticed that correctness coefficient (after initial drop) slowly rises with the increase of the time frame length. This phenomenon can be explained as follows—DBSCAN is able to conduct a more accurate analysis of the dataset when it is bigger or more complex. The analysis of the shapes of the plotted curves may lead to a conclusion that the relationship between clustering quality coefficients and time frame length is somehow logarithmic. In the next step of the experiment, the approximation of those relationships has been obtained. It could be sometimes helpful to estimate the error rate of the dataset knowing the obtained silhouette and CC. We tried to fit the polyno-
5 10 15 20 30 60 120 180 360 720 5 10 15 20 30 60 120 180 360 720
k-means
DBSACAN
Time window length (min)
Algorithm
0.9966 0.9902 0.9842 0.9788 0.9725 0.9648 0.9532 0.9442 0.9238 0.8925 0.9973 0.9932 0.991 0.9895 0.9869 0.9838 0.9795 0.974 0.9634 0.951
0.9996 0.9974 0.9943 0.9912 0.9883 0.9845 0.9782 0.9759 0.9673 0.9495 0.9992 0.9992 0.9992 0.9991 0.9989 0.9989 0.9986 0.9987 0.9992 0.9995
0.9932 0.9828 0.9753 0.971 0.9637 0.9552 0.9464 0.9338 0.9099 0.8835 0.9972 0.993 0.9907 0.989 0.9865 0.9834 0.9791 0.9738 0.9646 0.9568
0.9971 0.9929 0.9892 0.9871 0.9839 0.9806 0.976 0.9719 0.963 0.9431 0.9977 0.9961 0.9957 0.9955 0.9953 0.9953 0.9948 0.9948 0.9953 0.9958
CC
Silhouette
Silhouette
CC
5% damaged messages
No damaged messages
Table 1 Results obtained for the k-means algorithm and for different time window lengths
0.9882 0.9718 0.9625 0.9564 0.9473 0.9393 0.926 0.9118 0.88 0.8339 0.9972 0.9926 0.9902 0.9885 0.9859 0.9827 0.9782 0.9728 0.9635 0.9539
Silhouette 0.9937 0.9856 0.9819 0.9788 0.9759 0.9736 0.969 0.9628 0.9449 0.9198 0.9953 0.9918 0.9909 0.9905 0.9899 0.9899 0.9895 0.9893 0.9897 0.9906
CC
10% damaged messages
0.9817 0.9551 0.9429 0.9329 0.9206 0.9105 0.893 0.8737 0.8263 0.7874 0.997 0.9921 0.9895 0.9878 0.9851 0.9816 0.9761 0.9697 0.9553 0.9394
Silhouette
0.9895 0.9759 0.97 0.9664 0.9618 0.9605 0.9546 0.9471 0.9278 0.9032 0.9926 0.9855 0.9834 0.9824 0.981 0.9802 0.9795 0.9787 0.9788 0.9802
CC
20% damaged messages
32 M. Mieczy´nska and I. Czarnowski
Impact of the Time Window Length ..
33
Fig. 1 Impact of time window length on silhouette and correctness coefficient Table 2 Approximation functions for the relationship between clustering quality coefficients and time frame length with their corresponding fitting error, presented for different percentage-based levels of damaged messages (where x is a time window size) Algorithm Coefficient % Approximation (10−3 ) Error k-means
Silhouette
CC
DBSCAN Silhouette
CC
0 5 10 20 0 5 10 20 0 5 10 20 0 5 10 20
988.67 + 7.95 log(x) − 3, 315 log2 (x) 992.25 + 2.185 log(x) − 2.727 log2 (x) 975.09 + 9.459 log(x) − 4.468 log2 (x) 992.39 − 7.991 log(x) − 3.117 log2 (x) 999.45 + 1.631 log(x) − 1.274 log2 (x) 996.3 + 1.087 log(x) − 1.191 log2 (x) 976.98 + 10.908 log(x) − 2.875 log2 (x) 988.02 − 1.027 log(x) − 1.5291 log2 (x) 992.25 + 4.284 log(x) − 1.564 log2 (x) 997.51 + 0.580 log(x) − 1.019 log2 (x) 996.57 + 1.105 log(x) − 1.131 log2 (x) 987.77 + 7.264 log(x) − 2.17 log2 (x) 1000.42 − 0.765 log(x) + 0.092 log2 (x) 1000.98 − 2.65 log(x) + 0.285 log2 (x) 1001.57 − 5.247 log(x) + 0.55 log2 (x) 1005.69 − 10.745 log(x) + 1.056 log2 (x)
0.005017 0.005022 0.009912 0.009860 0.001908 0.002741 0.006711 0.006514 0.002173 0.001525 0.001679 0.002992 0.000222 0.000264 0.000590 0.001105
34
M. Mieczy´nska and I. Czarnowski
Fig. 2 Approximation of impact of time window length on the silhouette with k-means clustering (left) and CC with DBSCAN clustering (right)
mial curve: y = a + b · log(x) + c · log2 (x) (where x is a time frame length, y is a analysed relationship silh(x) or cc(x) for each algorithm and percentage of damaged messages), that is to say, find the values of a, b, c to best reflect the original curves. Obtained formulas with the corresponding fitting error and their plots are presented in Table 2 and in Fig. 2.
4 Conclusions The experiment proved that the clustering using machine learning techniques could be a first stage of reconstruction of incorrect AIS messages, either damaged or lost due to the message collision. Even when 20% of data was artificially damaged, the obtained clustering quality coefficients were promising (the lowest value of silhouette was 0.7874, and correctness coefficient—0.9032). Furthermore, the overall conclusion that can be drawn from trying different time window lengths while clustering was that time window length does have an impact of clustering results. From a strict clustering point of view, the shorter the time frame, the better clustering. However, a space for additional experiments appears here, since it should be examined how the time frame length impacts the second stage—anomaly detection (it possibly requires longer time window). Further research will probably answer the question of choosing the time frame length that satisfies the clustering— anomaly detection trade-off. Acknowledgements Our special thanks to Mr Marcin Waraksa and Prof. Jakub Montewka from Gdynia Maritime University for sharing the raw data that we used in our experiment.
Impact of the Time Window Length ..
35
References 1. AIS transponders. https://www.imo.org/en/OurWork/Safety/Pages/AIS.aspx 2. He, Y.K., Zhang, D., Zhang, J.F., Zhang, M.Y.: Ship route planning using historical trajectories derived from AIS data. TransNav 13(1), 69–76 (2019). https://doi.org/10.12716/1001.13.01. 06 3. Millefiori, L.M., Zissis, D., Cazzanti, L., Arcieri, G.: A Distributed approach to estimating sea port operational regions from lots of AIS data. In: 2016 IEEE International Conference on Big Data (Big Data), pp. 1627–1632. IEEE Press, New York (2016). https://doi.org/10.1109/ BigData.2016.7840774 4. Lane, R.O., Nevell, D.A., Hayward, S.D., Beaney, T.W.: Maritime Anomaly Detection and Threat Assessment. In: 2010 13th Conference on IEEE Information Fusion (FUSION). IEEE Press, New York (2010). https://doi.org/10.1109/ICIF.2010.5711998 5. Liang, M., Liu, R.W., Zhong, Q., Liu, J., Zhang, J.: Neural network-based automatic reconstruction of missing vessel trajectory data. In: 2019 IEEE 4th International Conference on Big Data Analytics (ICBDA), pp.426–430. IEEE Press, New York (2019). https://doi.org/10.1109/ ICBDA.2019.8713215 6. Satellite — Automatic Identification System (SAT-AIS) Overview. https://artes.esa.int/sat-ais/ overview 7. Seta, T., Matsukura, H., Aratani, T., Tamura, K.: An estimation method of message receiving probability for a satellite automatic identification system using a binomial distribution model. Sci. J. Marit. Univ. Szczec. 46(118), 101–107 (2016). https://doi.org/10.17402/125 8. Prevost, R., Coulon, M., Bonacci, D., LeMaitre, J., Millerioux, J., Tourneret, J.: Extended constrained Viterbi algorithm for AIS signals received by satellite. In: 2012 IEEE First AESS European Conference on Satellite Telecommunications (ESTEL). IEEE Press, New York (2012). https://doi.org/10.1109/ESTEL.2012.6400111 9. Wang, T., Ye, C., Zhou, H., Ou, M., Cheng, B.: AIS ship trajectory clustering based on convolutional auto-encoder. In: Arai, K., Kapoor, S., Bhatia, R. (eds): Intelligent Systems and Applications. Proceedings of the 2020 Intelligent Systems Conference (IntelliSys), vol. 2, pp. 529–546. Springer, Cham (2020) 10. Patroumpas, K., Alevizos, E., Artikis, A., Vodas, M., Pelekis, N., Theodoridis, Y.: Online event recognition from moving vessel trajectories. Geoinformatica 21, 389–427 (2017). https://doi. org/10.1007/s10707-016-0266-x 11. Zhang, T., Zhao, S., Chen, J.: Research on ship classification based on trajectory association. In: Christos Douligeris, C., Karagiannis, D., Apostolou, D. (eds.) Knowledge Science, Engineering and Management. 12th International Conference, KSEM 2019 Athens, Greece, August 28–30, 2019 Proceedings, Part I, pp. 327–340. Springer, Cham (2019) 12. Wang, G., Meng, J., Li, Z., Hesenius, M., Ding, W., Han, Y., Gruhn, V.: Extraction and refinement of marine lanes from crowdsourced trajectory data. Netw. Appl. 25, 1392–1404 (2020). https://doi.org/10.1007/s11036-019-01454-w 13. Nikfalazar, S., Yeh, C.-H., Bedingfield, S., Hadi, A., Khorshidi, H.A.: Missing data imputation using decision trees and fuzzy clustering with iterative learning. Knowl. Inf. Syst. 62, 2419– 2437 (2020). https://doi.org/10.1007/s10115-019-01427-1 14. U.S. CMTS. Enhancing Accessibility and Usability of Automatic Identification Systems (AIS) Data: Across theFederal Government and for the Benefit of Public Stakeholders; U.S. Committee on the Marine TransportationSystem: Washington, DC, USA (2019) 15. 
Lee, E., Mokashi, A.J., Moon, S.Y., Kim, G.: The maturity of automatic identification systems (AIS) and its implications for innovation. J. Mar. Sci. Eng. 7(9), 279 (2019). https://doi.org/ 10.3390/jmse7090287 16. Hochbaum, D.: Algorithms and complexity of range clustering. Networks 73, 170–186 (2019) 17. Vespe, M., Visentini, I., Bryan, K., Braca, P.: Unsupervised learning of maritime traffic patterns for anomaly detection. In: DF&TT 2012: Algorithms Applications (2012) 18. Recommendation ITU-R M.1371-5. https://www.itu.int/dms_pubrec/itu-r/rec/m/R-REC-M. 1371-5-201402-I!!PDF-E.pdf
36
M. Mieczy´nska and I. Czarnowski
19. MacQueen, J.: Some methods for classification and analysis of multivariate observations. In: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, vol 1, pp. 281–297. University of California Press, Berkeley, California (1967) 20. Ester, M., Kriegel, H.-P., Sander, J., Xu, X.: A density-based algorithm for discovering clusters in large spatial databases with noise. In: Simoudis, E., Han, J., Fayyad, U.M. (eds.). Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96), pp. 226–231. AAAI Press (1996). 10.1.1.121.9220 21. Rousseeuw, P.J.: Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math. 20, 53–65 (1987). https://doi.org/10.1016/0377-0427(87)901257
Improved Genetic Algorithm for Electric Vehicle Charging Station Placement Mohamed Wajdi Ouertani, Ghaith Manita, and Ouajdi Korbaa
Abstract In this paper, we present a new approach to solve the Electric Vehicle (EV) Charging Station (CS) Placement problem. It aims to find the most suitable sites with the adequate type of CS. Using a meta-heuristic approach to tackle this issue seems appropriate since it is defined as an NP-hard problem. Therefore, we have introduced an improved genetic algorithm adapted to our problem by developing a new heuristic to generate initial solutions. Moreover, we have proposed modified crossover and mutation operators to enhance generated solutions. Throughout this work, we have aimed to minimize the total costs consisting of: travel, investment, and maintenance costs. However, we have to respect two major constraints: budget limitation and charging station capacity. The results provided by experimentation show that the proposed algorithm provides better results compared to the most efficient algorithms in the literature. Keywords Genetic algorithm · Charging station placement · Electric vehicles · Initialization heuristic · Parallel diversity
1 Introduction The electric vehicle remains a topical technology despite years of existence. It is inevitably the successor to combustion vehicles. Indeed, the number of EVs will increase from 5 million in 2018 to 41 million in 2040 [1] as a result of the climate measures and commitments signed by the member countries of the Paris Agreement. M. W. Ouertani (B) · G. Manita · O. Korbaa Laboratory MARS, LR17ES05, ISITCom, University of Sousse, Sousse, Tunisia M. W. Ouertani ENSI, University of Manouba, Manouba, Tunisia G. Manita ESEN, University of Manouba, Manouba, Tunisia O. Korbaa ISITCom, University of Sousse, Sousse, Tunisia © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 I. Czarnowski et al. (eds.), Intelligent Decision Technologies, Smart Innovation, Systems and Technologies 238, https://doi.org/10.1007/978-981-16-2765-1_4
37
38
M. W. Ouertani et al.
Many countries moved forward adopting this technology because of the benefits it offers. First of all, EVs are clean and ecological, the CSs we apply the Algorithm do not emit CO2 and therefore do not feed the phenomenon of global warming [2]. Moreover, EVs do not spread fine particles in space and promote the improvement of air quality in cities, which has a major impact on the health of residents and reduces the risk of catching respiratory diseases that are widespread in urban areas. Secondly, EVs are an economical mean of moving relative to the combustion vehicle because the energy consumed by the electric motor is much lower than that consumed in fuels for combustion engines, but also because of the simplicity of electric motors, which have fewer components and do not require regular maintenance. Finally, in order to promote this technology, many countries are encouraging the development of green electricity to power EVs, which reduces fossil fuel purchases for countries that do not have natural resources. In this paper, we will propose a solution that allows us to search for the best CS locations and types in order to optimize the distances traveled by EVs in search of a terminal and consequently save the energy spent and reduce waiting time. Therefore, we will introduce an improved genetic algorithm to optimize the sum of the travel costs of EVs, the investment cost, and the maintenance cost of CSs. The rest of this work is organized as follows: the first part is devoted to the state of the art where we will cite the main existing works in the literature, the second part is dedicated to the formulation of our problem, in the third part we will propose our approach for solving the problem of CSs placement followed by a section where we will present the results of the experiments and lastly we will finish with the conclusion.
2 Related Works The search for the best CS locations is a combinatorial problem where exact and approximate methods are applied to solve it. Meta-heuristics are the most used and defined as natural evolution techniques. They are inspired by the behaviors of living beings or simulate physical phenomena, easy to implement, and provide results close to the optimal. In this work, we decided to use meta-heuristics because it seems to be the best method to solve these problems. In the literature, most of the works used the genetic algorithm [3] and particle swarm optimization [4] to tackle this problem which, due to their effectiveness, provide satisfactory results within a limited time period. Hence, we provide the main works concerning CSs’ placements. In [5] the authors take as case study the city of Cologne. First, they divide the studied area into clusters, then assess the number of CSs needed to serve the EVs present in this area in peak hours by dividing the total demand by the minimum capacity of a CS, finally use an improved genetic algorithm called OLoCs to choose the best locations by manipulating an objective function based on the investment cost and transport cost. In [6] the authors use an adaptive particle swarm optimization algorithm by modifying the inertia factor to minimize the sum of annual average construction, annual average operating cost, and charging costs. In [7] the authors
Improved Genetic Algorithm for Electric Vehicle …
39
adopt the genetic algorithm to a model that aims to minimize the cost of building CSs and the total access cost due to the coverage of CSs that is lower than customer demand in China. In [8] the authors use a multi-objective approach to minimize two charges using an association between non-dominated sorting genetic algorithm-II (NSGA-II) with linear programming and neighborhood search. The first goal is to minimize total cost which includes construction and equipment costs and the second is to minimize the average waiting time to recharge the EV. In [9] the authors try to find the best locations for CSs in Beijing using a muti-objective approach. By using MO-PSO, the authors seek to minimize the total cost composed of capital expenditure and the charging cost on the one hand and maximization of coverage on the other. However, previous works do not take into account the types and the capacity of the CSs to be created. In order to solve this problem, our approach bases on a dynamic structure that allows us to generate various combinations of solutions with different number of CSs for each one. Moreover, it supports different types of CSs that can be installed.
3 Problem Formulation In this work, we aim to find the most suitable locations as well as the appropriate types of solution sets. To that end, we will use an objective function that aims to minimize the travel cost, installation cost, and maintenance cost. However, two constraints must be taken into account. First, we must take into account the capacity constraint because the number of EVs must not exceed the sum of the stations’ capacities. Then, the total cost of installing CSs must not exceed the allocated budget. Furthermore, we have to note that the investment and maintenance costs are closely connected with the types of CSs installed. As a result, our model will be built from the following equation: (1) MinTotalCost = (CostT r + Cost I n + Cost Ma ) it is imperative to note that there are two constraints to respect: 1. The budget constraint: the installation cost must not exceed a total allocated budget. nbcs Ci type j ≤ Budget (2) j=1
2. The capacity constraint: the number of EVs must not exceed the total capacity of the CSs. nbcs nbev ≤ Cap j (3) i=1
40
M. W. Ouertani et al.
Table 1 Notation overview (x, y) j Budget cap j type Cins j
jth CS’s coordinate Total budget allocated for the construction of CSs Capacity of the jth CS Installation cost of jth CS with type type
Cost I n Cost Ma CostT r discs j
Sum of installation cost Sum of maintenance cost Sum of travel cost Distance from jth CS to the nearest CS
disiev i j maxcs nj nbev nbcs type j uct ucm t ype j
Distance from ith EV to the nearest free CS ith EV, i ∈ 1, . . . , nbev jth CS, j ∈ 1, . . . , nbcs Maximum number of CSs Number of EVs supported by the jth CS Number of EVs Number of CSs jth CS’s type, type j ∈ 1, 2, 3 Unit cost of travelling Unit maintenance cost for the jth CS with type t
As previously mentioned, the different costs involved in the model are: the travel cost, the installation cost, and the maintenance cost. 1. The travel cost: it is the sum of the distances travelled by the EVs to the nearest free CS. So, it can be formulated as follow: CostT r =
nbev
disiev × uct
(4)
i=1
2. The installation cost: it is based on the type of CSs installed. We can have different capacities and charging times depending on the type of CSs. In total, we can identify four recharge modes. Those modes are: • Slow mode: it is the most widespread solution, the power is 3 kw (230V AC/ 16A) which allows a complete recharge between 6 and 8 h. • Normal mode: specific solution for condominiums. The charging power is 7 kw (230V AC/32 A, single-phase) or 11 kw (400V AC/16 A, three-phase), it allows the EV to be charged between 2 and 4 h. • Accelerated mode: this mode is reserved for drivers who travel long distances, the power is 22 kw (400V AC/ 32 A, three-phase), these stations are generally installed in public or hypermarket car parks and company garages. The charging time is between 1 and 2 h.
Improved Genetic Algorithm for Electric Vehicle …
41
• Fast mode: it is the solution for unexpected and long-distance travel. There are three standards for this mode: Combined Charging System (CCS) standard: charging power up to 170 kW (850 V DC/ 200 A, 350 kW from 2017), CHAdeMO standard: 50 kW charging power (500 V DC/ 100 A), and the last with 43 kW charging power (400V AC/63 A). Charging takes place between 20 and 30 min. We are interested in the next part of this work in the normal, accelerated, and fast mode which will be referred to as types 1, 2, and 3 respectively. Therefore, the installation costs denoted Cost I n is calculated as follows: Cost I n =
nbcs
type j
Cost I n
(5)
j=1
3. The maintenance cost: it depends on the number of EVs served by the concerned CS and for each type of CS a unit maintenance cost is used. The maintenance charges is: nbcs Cost Ma = n j × ucm t ype j (6) j=1
4 Proposed Approach : IGACSP In this paper, we adopt a meta-heuristic resolution technique to find the best CSs locations. Therefore, we chose to use the genetic algorithm known for its simplicity and efficiency to produce good results within a correct time frame to solve this problem. In this section, we present an enhanced version of the genetic algorithm named Improved Genetic Algorithms for Charging Station Placement (IGACSP) adapted to our problem. In fact, we propose an initialization heuristic, an improved crossover operator, and a suitable mutation operator to deal with the CSs’ placement problem.
4.1 Initialization The initialization phase for population-based meta-heuristics directly influences the result obtained. In fact, an initial population that lacks diversity causes premature convergence. For this reason, we propose an initialization heuristic based on parallel diversification (PDI). This heuristic gives an unbiased random distribution in the decision space. During initialization, each chromosome that belongs to the population contains the coordinates and types of CSs as shown in Fig. 1. Each chromosome contains a number of CSs whose number and capacity respect the constraints men-
42
M. W. Ouertani et al.
Fig. 1 Chromosome structure
tioned in Eqs. 1 and 2. Note that the chromosomes have dynamic sizes and depend on the number of CSs installed. Indeed, the values (0, 0, −1) will be assigned to the unused genes. In the initialization phase, we will try to randomly distribute the CSs set in space and we will place the CSs in such a way as to cover the entire geographical area. For that, we try to divide the studied area using a two steps procedure. First, we divide it into angular sectors number (na) starting from prefixed vertex denoted o and each sector is designed by Seci . Then, we divide it into rectangular parts number (nr) to obtain corridors of equal surfaces. The intersection of na sectors and nr parts generates subdivisions that constitute potential locations for CSs. It should be noted that each subdivision can contain only one CS. Let Geo be the rectangular geographical area to be studied and o as Geo symmetry center, to install randomly nbcs CSs we apply the Algorithm 1. Algorithm 1 Population 1: [na, nr, k] ← Find(nbcs ) 2: Divide Geo into na sector Seci of equal angles θ from o, θ ← 3: if k = 0 then 4: for i=1 to na do 5: Divide Seci into nr rectangles 6: end for 7: else 8: Rand Seci ← Random(Sec, k) 9: for i=1 to na do 10: if i = Rand Seci then 11: nri ← nr + 1 12: Divide Rand Seci i into nri rectangles 13: else 14: Divide Seci into nr rectangles 15: end if 16: end for 17: end if
2π na
and i ∈ {1 . . . na}
For the rectangle division, the concept is illustrated in Fig. 2. Let S1, S2, S3 be the areas of the respective surface rectangles a × b, c × d and e × f . Assuming that we must have equal corridors in terms of surface area we will have the following equations: S1 − S2 = S2 − S3 = S3
(7)
a − c = b − d and c − e = d − f
(8)
Improved Genetic Algorithm for Electric Vehicle …
43
Fig. 2 Rectangular division
Fig. 3 CSs initial positions
To conclude with an example of initialization we consider nbcs = 13 CSs to install the results are na = 3, nr = 4 and k = 1. Figure 3 gives an idea of the possible positions of the CSs modelled by black points. We can see that Sec1 and Sec2 contain 4 CSs, while Sec3 contains 5 CSs.
44
M. W. Ouertani et al.
Fig. 4 Maxdistance calculation
4.2 Fitness Evaluation in this work we used a normalized objective function in order to give the same weight to the different components of our function which are travel, installation, and the maintenance cost, this results the following equations: Nor Nor Nor MinCostNor T ot = (CostT r + Cost I n + Cost Ma )
(9)
nbev CostNor Tr
=
disiev − Mindistance Maxdistance − Mindistance i=1
(10)
where Maxdistance and Mindistance represent respectively the longest distance that an EV must travel to be served in the worst case and the shortest distance in the best case which is 0. For example in Fig. 4, d2 represents the maximum distance that the EV can travel in the worst case to reach a CS. nbcs CostNor In
j=1
=
(Budget −
nbcs CostNor Ma
=
type j
Cost I n
j=1
t ype
− Min(Cins j
))
type Min(Cins j ))
nb j × uctj − nbev × Min(ucm tj )
nbev × Max(ucm tj ) − nbev × Min(ucm tj )
(11)
(12)
where Min(ucm tj ), Max(ucm tj ) are the minimum and maximum unit maintenance costs, respectively.
Improved Genetic Algorithm for Electric Vehicle …
45
4.3 Crossover The purpose of this step is to generate two offspring ( f 1, f 2) from the alternate crossing of two-parent chromosomes ( p1, p2) selected by a Roulette wheel procedure. The main idea behind this operator, that each offspring is more related to one of the two parents. In other words, for each C Si in p1, we have to find the nearest C S j in p2 with the same type. Then, the new coordinates of this C Si are modified by the median value between C S j and C Si, if C S j exists. Otherwise, the coordinates of C Si in f 1 is the same as in p1. Algorithm 2 explains the work done by this operator. Algorithm 2 Crossover 1: [ p1, p2] ← Roulette wheel Selection Population 2: for C Si ∈ p1 do 3: ti ← t ypei 4: disics ← N ear estC S( p2, t ypei ) 5: if disics = 0 then f1 6: xC Si ← (xC Si + xC S j )/2 f1 7: yC Si ← (yC Si + yC S j )/2 f1 8: tC Si ← t ypei 9: else f1 10: xC Si ← xC Si f1 11: yC Si ← yC Si f1 12: tC Si ← t ypei 13: end if f1 14: f 1 ← C Si 15: end for 16: for C S j ∈ p2 do 17: t j ← t ype j 18: dis cs j ← N ear estC S( p1, t ype j ) 19: if dis cs j = 0 then f2
20:
xC S j ← (xC S j + xC Si )/2
21:
yC S j ← (yC S j + yC Si )/2
f2
f2
22: 23: 24:
tC S j ← t ype j else f2 xC S j ← xC S j
25:
yC S j ← yC S j
f2
f2
26: tC S j ← t ype j 27: end if f2 28: f 2 ← CSj 29: end forreturn f 1, f 2
46
M. W. Ouertani et al.
4.4 Mutation Mutation is a divergence operation that aims to escape from a local optimum and to discover better solutions. The proposed operator is a two steps procedure. First, we search to find better positions(coordinates) for placed CS. To do it, we start by identifying the serving station for each EV. Then, we generate clusters for each CS that determine the set of served EVs. After that, we change the coordinates of each station by the centroid of the related cluster. Secondly, we seek to change randomly the type of one selected CS in order to find better solution by increasing or decreasing the capacity of the proposed solution. However, some exceptions can be generated by exceeding the tolerated values for each type. therefore, if the type of the selected CS is higher than 3 then we have to randomly place a new CS type 1. On the other hand, if the type of the selected CS is lower than 1 then we have to remove this station. Algorithm 3 provides an explanation of the work done by this operator. Algorithm 3 Mutation 1: ch ← Random(Pop) 2: cluster s ← Clustering(E V, ch, N bcs ) 3: for C S j ∈ p2 do C Si (x, y) ← centr oid(Cluseteri ) 4: end for 5: C S ← Random(n j ) 6: if rand ≤ 0.5 then 7: if t ypech < 3 then 8: C S(t ype) ← C S(t ype) + 1 9: else 10: add N ewC S(ch, random(x), random(y), 1) 11: end if 12: else 13: if t ypech > 1 then 14: C S(t ype) ← C S(t ype) − 1 15: else 16: RemoveC S(ch, C S) 17: end if 18: end if
5 Experimental Results In this section, we present the results obtained during the experiments. Therefore, we will compare the performance of IGACSP compared to well-known and recent algorithms:
Improved Genetic Algorithm for Electric Vehicle …
47
• Genetic Algorithm (GA), [3]: Inspired by the theory of natural evolution, it is based on the selection of fittest individuals for the reproduction of offspring for the next generation. this algorithm is based on genetic operators such as crossover and mutation. • Particle Swarm Optimization (PSO), [4]: Optimization technique inspired by bird flocking or fish schooling. each particle adjusts its position based on its bestobtained position and the best-obtained position in the swarm. • Grey Wolf Algorithm (GWO), [10]: Imitates the management hierarchy and hunting mechanism of grey wolves. Four types of grey wolves are distinguished: alpha, beta, delta, and omega for the simulation the leadership hierarchy. To perform optimization, searching for prey, encircling prey, and attacking prey are implemented. • Whale Optimization Algorithm (WOA), [11]: Based on the humpback whale’s behavior and hunting method. These whales use a hunting technique called bubblenet feeding which is mathematically modeled for optimization process. • Atom search optimization (ASO), [12]: Inspired by physical phenomena that control the behavior of atoms influenced by interaction force and the geometric constrain. During the simulations, we generated 1000 problems, each one is evaluated during 100 repetitions to ensure the relevance of the results obtained. However, we defined two conditions for stopping the execution of heuristics, the first one consists of not exceeding Maxit = 1000 iterations and the second one consists of stopping the execution if we cumulate Noimp = 50 iterations without improvement. We have defined for each input the set of values it can take, the initialization of these parameters is as follows: • • • • • •
Budget ∈ [200, 1000] units nbev ∈ [100, 700] map size 500 ∗ 500 unit (x, y) ∈ [(0, 0); (500, 500)] Cap j ∈ ([5, 10, 30], [20, 40, 70]) t ype Cins j ∈ ([10, 15, 20], [20, 40, 60])
Through testing, the parameters adopted in the execution of different algorithms are tuned and presented in Table 2. Next, we will test the performance of our PDI, IGACSP, and present at the end a case study.
5.1 PDI Evaluation A first study consists in showing the impact of the PDI heuristic by comparing it to a random generation one as presented in Figs. 5 and 6 respectively. In this example, we have defined as input: Budget = 600, nbev = 350, Cap j = [10, 30, 50] and t ype Cins j = [10, 30, 50].
48
M. W. Ouertani et al.
Table 2 Parameters setting for different algorithms Algorithm Parameter IGACSP
GA
PSO
GWO
WOA
ASO
Population size Crossover type Mutation type Maxit Noimp Population size Crossover type Crossover rate Mutation rate Mutation type Selection strategy Maxit Noimp Particle number Inertia weight Cognitive constant C1 Social constant C2 Maxit Noimp Number of wolfs Maxit Noimp Number of search agents Maxit Noimp Number of atoms Depth weight Multiplier weight Maxit Noimp
Value 100 Based on Type of CS Based on capacity of CS 1000 50 100 One point crossover 0.8 0.1 Random Roulette wheel selection 1000 50 100 0.9 2.05 2.05 1000 50 100 1000 50 100 1000 50 100 50 0.2 1000 50
To prove the importance of the PDI without the use of crossover and mutation operators, we use the initial values of the fitness cost as a basis of comparison. For this example, the best initial solution founded using PDI (Fitness Cost = 1.44403) is better than the best one proposed using random initialization (Fitness Cost = 1.44558). Overall, after the execution of 1000 problems, it is clear that adopting PDI allows us to obtain better results than a random initialization according to the Figs. 7 and 8. In addition to that, the efficiency of the algorithm depends on the density, which is the number of EVs per unit of measure, in the studied area. In fact, due to the size of our map which is 500 ∗ 500 units, the ar ea is therefore equal to 25, 000 units 2 ,
Fig. 5 Random CSs
Fig. 6 PDI heuristic
the minimum density is then equal to (minimum number of EVs)/area, and the maximum density is (maximum number of EVs)/area. Therefore, this density varies between 4 × 10−4 and 28 × 10−4. PDI generates significantly improved results for a density higher than 2 × 10−3.
Fig. 7 Comparison IGACSP / IGACSP-random Init
Fig. 8 Comparison GA-PDI / GA
5.2 IGACSP Evaluation In this section, we evaluate IGACSP's performance. We start by studying the execution time of the different algorithms. Figure 9 shows that IGACSP is a serious competitor for PSO, which is reputed for its limited execution time: in 21% of the problems, IGACSP managed to outperform PSO in terms of speed, while GWO, WOA, and ASO do not exceed 10%. Note that GA failed to distinguish itself in any of the 1000 problems. Figure 10 gives an idea of the average execution time of each algorithm.
Fig. 9 Best execution time
Fig. 10 Average execution time
Figure 11 shows that in 44% of cases the improved version of the genetic algorithm (IGACSP) provides better results than the other algorithms. To study the impact of our two operators, we made a comparison between IGACSP on the one side and GA with PDI on the other side. Figure 12 shows the efficiency of the two enhanced operators, crossover and mutation, in IGACSP. In fact, in 74% of the problems, IGACSP provides better results than GA with the PDI heuristic for initialization.
Fig. 11 Best fitness
Fig. 12 Comparison IGACSP / GA-PDI
To demonstrate the efficiency of our technique, we randomly selected 20 problems. Table 3 presents the inputs of these problems, while Table 4 presents the results obtained, namely the average and standard deviation of the results obtained by the different algorithms for each problem. IGACSP reaches the best minimum in 9 of the 20 problems. Nevertheless, its performance is good, as it has the lowest standard deviation in 10 problems. Finally, after showing the superiority of IGACSP compared to the other algorithms, we present a case study to examine the presented results more closely.
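As an illustration of how such summary statistics can be produced, the sketch below computes the average and standard deviation of the fitness cost over repeated runs and counts, per algorithm, on how many problems it reaches the best minimum. The numbers in `fitness_runs` are invented for the example and are not the paper's data.

```python
import statistics

# fitness_runs[algorithm][problem] -> fitness costs over the repetitions (illustrative values)
fitness_runs = {
    "IGACSP": {1: [1.249, 1.252, 1.247], 2: [1.210, 1.213, 1.209]},
    "GA":     {1: [1.270, 1.268, 1.275], 2: [1.226, 1.224, 1.229]},
}

# Average and standard deviation per algorithm and problem (Table 4-style summary).
summary = {alg: {p: (statistics.mean(v), statistics.stdev(v)) for p, v in runs.items()}
           for alg, runs in fitness_runs.items()}

# Count, per algorithm, the problems on which it reaches the best minimum.
problems = sorted(next(iter(fitness_runs.values())).keys())
best_counts = {alg: 0 for alg in fitness_runs}
for p in problems:
    minima = {alg: min(runs[p]) for alg, runs in fitness_runs.items()}
    best_counts[min(minima, key=minima.get)] += 1

print(summary)
print(best_counts)
```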
5.3 Case Study In this section, we present a case study based on the previously mentioned input data, which are: Budget = 600, nbev = 350, Cap_j = [10, 30, 50] and Cins_j^type = [10, 30, 50]. Figure 13 shows that at the first iteration IGACSP has a certain amount of advance in terms of fitness cost compared to the other algorithms, due to the adoption of the PDI
Table 3  Input information

Problem  nbev  Budget  Cost[1]  Cost[2]  Cost[3]  Cap[1]  Cap[2]  Cap[3]
1        446   602     17       22       47       14      22      56
2        411   792     15       25       41       16      34      64
3        284   571     18       25       40       11      21      60
4        399   718     16       25       37       12      18      59
5        615   888     7        36       47       6       33      67
6        298   807     17       23       42       9       34      60
7        199   610     20       21       49       13      17      57
8        404   880     14       21       45       10      26      51
9        548   435     15       33       53       11      20      59
10       265   547     19       37       47       10      29      64
11       383   793     15       23       45       10      36      57
12       408   576     14       20       55       11      22      53
13       241   492     18       33       39       17      31      60
14       486   603     12       24       51       14      26      55
15       247   478     17       35       56       6       30      70
16       351   412     9        25       45       12      32      55
17       564   707     13       35       43       9       31      63
18       344   596     17       30       49       7       30      65
19       292   922     20       35       56       8       33      60
20       285   584     13       23       50       17      25      60
heuristic. Through the newly proposed crossover and mutation operators, IGACSP continues to obtain the best fitness costs over the iterations. Moreover, the box diagram constructed from the data obtained in 100 runs, presented in Fig. 14, proves that the IGACSP results are more than satisfactory compared to the other algorithms, because the values obtained between the first and third quartiles are close to the minimum solution obtained over the simulations, with a low interquartile deviation. It is also noted that the median obtained by IGACSP is by far the best among all algorithms. As mentioned before, Fig. 5 presents an example of the best initial solution with random initialization, where 10 CSs of type-1, 3 CSs of type-2, and 1 CS of type-3 have been randomly generated. After the execution of GA, we notice in Fig. 15 that the CSs of type-3 have disappeared and the new solution is composed of only 9 CSs of type-2 and 9 CSs of type-1. However, Fig. 6 presents a PDI for the same problem, and the result of the application of IGACSP is shown in Fig. 16. In this case, all CSs of type-3 are eliminated, and the final best solution is composed of only 11 CSs of type-1 and 8 CSs of type-2.
Table 4  Results of the 20 selected problems: average fitness cost obtained by each algorithm, with the standard deviation in parentheses

Problem  IGACSP               GA                   PSO                  GWO                  WOA                  ASO
1        1.2491 (2.9264E−03)  1.2697 (2.6246E−03)  1.2452 (5.0331E−03)  1.2628 (2.2055E−03)  1.2550 (4.5009E−03)  1.2580 (3.7343E−03)
2        1.2096 (3.0027E−03)  1.2253 (2.6428E−03)  1.2125 (5.6083E−03)  1.2087 (3.4028E−03)  1.2170 (3.7433E−03)  1.2157 (5.2067E−03)
3        0.8146 (6.6436E−03)  0.8168 (3.4901E−03)  0.8173 (2.2003E−03)  0.8163 (2.6076E−03)  0.8115 (3.8397E−03)  0.8155 (3.2231E−03)
4        1.1198 (8.6230E−03)  1.1258 (4.2371E−03)  1.1227 (1.8210E−02)  1.1203 (4.3584E−03)  1.1206 (5.2044E−03)  1.1257 (6.3406E−03)
5        1.3339 (1.0773E−03)  1.3559 (5.8828E−04)  1.3459 (1.9380E−03)  1.3399 (8.000E−04)   1.3415 (7.8566E−04)  1.3356 (1.1018E−03)
6        0.9965 (5.6226E−03)  0.9967 (8.5355E−03)  0.9894 (1.7137E−02)  0.9903 (6.4020E−02)  0.9921 (1.1005E−02)  0.9934 (4.3874E−03)
7        0.7897 (2.9835E−02)  0.8028 (2.8611E−02)  0.7907 (2.7076E−02)  0.7903 (2.6915E−02)  0.7955 (2.3102E−02)  0.7730 (3.1638E−02)
8        1.1016 (4.8186E−03)  1.1140 (1.0002E−04)  1.1050 (2.5920E−03)  1.1076 (1.8200E−03)  1.1102 (1.0601E−02)  1.1040 (1.1001E−03)
9        1.4303 (3.1148E−03)  1.4430 (2.9016E−03)  1.4306 (3.1176E−03)  1.4305 (1.9102E−03)  1.4354 (3.3004E−03)  1.4368 (2.8006E−03)
10       0.8860 (4.7870E−03)  0.8973 (3.4480E−03)  0.8887 (5.3574E−03)  0.8851 (4.1001E−03)  0.8914 (4.5188E−03)  0.8895 (3.7003E−03)
11       0.9471 (3.6201E−03)  0.9621 (4.6135E−03)  0.9501 (5.6835E−03)  0.9455 (4.9430E−03)  0.9487 (3.8065E−03)  0.9509 (3.1030E−03)
12       0.9554 (4.9868E−03)  0.9683 (4.4185E−03)  0.9596 (3.9151E−03)  0.9585 (4.7553E−03)  0.9621 (5.1089E−03)  0.9606 (4.9562E−03)
13       0.8427 (4.0209E−02)  0.8516 (3.1510E−04)  0.8452 (1.8021E−03)  0.8445 (8.1055E−04)  0.8425 (6.0498E−04)  0.8472 (1.2350E−03)
14       1.4171 (2.2023E−02)  1.4357 (2.8069E−03)  1.4102 (4.1039E−03)  1.4196 (3.7000E−03)  1.4245 (2.8739E−03)  1.4188 (3.1136E−03)
15       0.7772 (2.9533E−02)  0.7837 (1.8328E−02)  0.7781 (1.4906E−02)  0.7810 (1.3300E−02)  0.7858 (1.8511E−02)  0.7812 (1.2901E−02)
16       1.4058 (5.9670E−03)  1.4344 (5.1178E−03)  1.4097 (7.5018E−03)  1.4021 (4.1013E−03)  1.4111 (5.3132E−03)  1.4086 (5.7002E−03)
17       0.9537 (2.7647E−02)  0.9592 (1.1000E−03)  0.9497 (1.3091E−03)  0.9565 (8.0935E−04)  0.9560 (1.1899E−03)  0.9575 (9.0168E−04)
18       1.1834 (3.3157E−03)  1.1994 (2.5001E−03)  1.2049 (6.2121E−03)  1.1912 (2.7240E−03)  1.1892 (3.7004E−03)  1.1852 (2.9002E−03)
19       0.9584 (8.6853E−03)  1.1625 (5.6393E−03)  0.9550 (1.4487E−02)  0.9558 (5.8001E−03)  0.9635 (6.8103E−03)  0.9600 (5.4589E−03)
20       0.8607 (1.3958E−02)  0.8712 (3.6000E−03)  0.8656 (7.2135E−03)  0.8687 (3.4418E−03)  0.8682 (4.2012E−03)  0.8717 (3.2121E−03)
Fig. 13 Fitness cost
Fig. 14 Fitness cost
Finally, it should be noted that the solution generated by IGACSP has an excellent dispersion in the studied area compared to GA, because the distance covered by the EVs is the lowest. In addition, the solution proposed by IGACSP is the most economical because it maintains a reduced number of CSs in terms of capacity and budget constraints. This is confirmed by the values obtained by the calculation of the normalized fitness costs, which are 1.393226 for IGACSP and 1.401 for GA.
Fig. 15 Optimized CSs—GA
Fig. 16 Optimized CSs—IGACSP
6 Conclusion In this work, we presented an improved version of GA to solve CS placement problem for EVs. The modification of some operators made it possible to adapt our heuristics to the problem of CS location. Moreover, PDI has proved its efficiency and has made it possible to obtain better results, especially beyond a certain density of EVs. Results of the experiments show that IGACSP delivers much better results than its competitors within a reasonable time. A better distribution of CSs saves costs
and facilitates driver travel and consequently contributes to the well-being of the community. In future work, we will apply a multi-objective approach in order to obtain a global view of the problem and to support decision-making while taking into consideration the trade-offs to be faced.
References 1. MacDonald, J.: Electric vehicles to be 35% of global new car sales by 2040. Bloomberg New Energy Finan. 25 (2016) 2. Schmid, A.: An analysis of the environmental impact of electric vehicles. S&T’s Peer Peer 1(2), 2 (2017) 3. Holland, J.H.: Adaptation in natural and artificial systems: an introductory analysis with applications to biology, control, and artificial intelligence (1975) 4. Kennedy, J., Eberhart, R.C.: Particle swarm optimization. In: Proceedings of IEEE International Conference on Neural Networks, volume 4, pp. 1942–1948. IEEE Computer Society (1995) 5. Mehar, S., Senouci, S.M.: An optimization location scheme for electric charging stations. In: IEEE SaCoNet, pages 1–5 (2013) 6. Liu, Z.-f., Zhang, W., Ji, X., Li, K.: Optimal planning of charging station for electric vehicle based on particle swarm optimization. In: Innovative Smart Grid Technologies-Asia (ISGT Asia), 2012 IEEE, pages 1–5. IEEE (2012) 7. Zhu, Z.-H., Gao, Z.-Y., Zheng, J.-F., Hao-Ming, D.: Charging station location problem of plug-in electric vehicles. J. Transp. Geogr. 52, 11–22 (2016) 8. Bai, X., Chin, K.-S., Zhou, Z.: A bi-objective model for location planning of electric vehicle charging stations with gps trajectory data. Comput. Indus. Eng. (2019) 9. Zhang, Y., Zhang, Q., Farnoosh, A., Chen, S., Li, Y.: Gis-based multi-objective particle swarm optimization of charging stations for electric vehicles. Energy 169, 844–853 (2019) 10. Mirjalili, S., Mirjalili, S.M., Lewis, A.: Grey wolf optimizer. Adv. Eng. Softw. 69, 46–61 (2014) 11. Mirjalili, S., Lewis, A.: The whale optimization algorithm. Adv. Eng. Softw. 95, 51–67 (2016) 12. Zhao, W., Wang, L., Zhang, Z.: Atom search optimization and its application to solve a hydrogeologic parameter estimation problem. Knowl.-Based Syst. 163, 283–304 (2019)
Solving a Many-Objective Crop Rotation Problem with Evolutionary Algorithms Christian von Lücken , Angel Acosta , and Norma Rojas
Abstract Crop rotation consists of alternating the types of plants grown in the same place in a planned sequence to obtain improved profits and accomplish environmental outcomes. Determining optimal crop rotations is a relevant decision-making problem for agricultural farms. This work presents a seven-objective crop rotation problem considering economic, social, and environmental factors, and its solution using evolutionary algorithms; to this aim, an initialization procedure and genetic operators are proposed. Five multi- and many-objective evolutionary algorithms were implemented for a given problem instance, and their results were compared. The comparison shows that the methods can be used as a tool for improving decision-making in crop rotations. Also, among the compared algorithms, RVEA obtains the best values for the evaluated metrics on the studied instance. Keywords Crop rotation problems · Evolutionary algorithms · Multiobjective optimization
1 Introduction The choice of crops and their land allocation is a central activity for any agricultural farm. A crop rotation plan describes the chronological sequence of different crops grown on a given land year after year. Nowadays, crop rotation achieves an increasing interest from the environmental perspective, and it is considered a sustainable solution for farms. Moreover, a well-planned growing sequence must lead to several agronomic and socioeconomic positive effects from a given period of crop growing to the next one, such as breaking the life cycle of pests, improving soil characteristics, reduction in the use of mineral fertilizers, erosion and nutrient leaching prevention, improving soil characteristics, reduction in the use of pesticides, biodi-
C. von Lücken (B) · A. Acosta · N. Rojas Universidad Nacional de Asunción, San Lorenzo, Paraguay e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 I. Czarnowski et al. (eds.), Intelligent Decision Technologies, Smart Innovation, Systems and Technologies 238, https://doi.org/10.1007/978-981-16-2765-1_5
versity enhancement, improved distribution of workload, increasing the crop yields, farm income, and margins [13, 15, 19]. Considering a number of alternative crops that can feasibly be grown on the land, the crop rotation problem tries to determine what crops the farmer should select, as well as when and where they should be planted to satisfy a given set of optimization goals [6]. Being a relevant and challenging problem, several researchers have proposed different methods to deal with crop rotations. In general, in the case of linear programming methods, they consider maximizing profit as the single objective to be optimized [12, 20]. However, besides profit, solving a crop rotation problem may require to consider criteria such as minimizing the distance of neighboring plots with the same botanical family for pest control [2], reducing variability in prices and yields [10] or minimizing the total space area required for covering seasonal crop demands [1], among others. Linear models considering multiple objectives, such as the presented in [3], typically optimize a weighted sum of objective functions for annual net profit and environmental outcomes or other criteria, subject to a set of constraints. The main issue with these formulations is how to normalize the different value types to combine them adequately; thus, as indicated in [9], multiobjective evolutionary algorithms surge as an alternative since they are best suited for expressing solutions in a multiobjective problem context [5, 13, 18]. In fact, in [18] multiobjective evolutionary algorithms (MOEAs) serve to solve a 5-objective optimization problem. In that work, SPEA2 (Strength Pareto Evolutionary Algorithms 2) [21] and NSGA II (Non-Dominated Sorting Genetic Algorithm II) [7] showed their ability to find valuable solutions in order to aid decision support for the considered problem. Several works explain the arising difficulties when traditional MOEAs deal with more than four objectives simultaneously, i.e., many-objective problems [17]. To deal with these difficulties, new algorithms such as NSGA3 (Reference Point-Based Non-dominated Sorting Genetic Algorithm III) [8], MOEA/D (Multiobjective Evolutionary Algorithm with Decomposition) [16], and RVEA (Reference Vector Guided Evolutionary Algorithm) [4] were developed. In recent work, however, it was also reported that in several cases, the performance of NSGA2 outperforms the newer algorithms with 8 and 10 objectives [14]. Consequently, it may become useful to compare different generation MOEAs when applied the technique to a new problem. This work presents a seven objectives crop rotation problem considering: (1) maximizing net profit, (2) minimizing investment cost, (3) minimizing the risk of investment, (4) maximizing the positive nutrient balance, (5) minimizing cultivation of same family crops in neighbor parcels, (6) minimizing the number of months for same family crops in neighbor parcels, (7) minimizing the number of months with lands fallow. Using a set of performance metrics, SPEA2 and NSGAII are compared against newer methods as NSGA3, RVEA, and MOEA/D, considering a problem instance with a set of 49 parcels and 20 crops. This paper’s organization is as follows: Sect. 2 describes the proposed crop rotation problem. Section 3 presents the evolutionary algorithms used in the analysis. Section 4 shows the results. The work ends with a brief conclusion.
2 Model Description This paper considers a multiobjective crop rotation problem taking into account seven economic, agronomic, and environmental objectives. The following subsections introduce the representation of solutions for solving the problem using evolutionary algorithms and the objectives considered in this work, respectively.
2.1 Solution Representation A set of indexes [1, . . . , n] serves to identify the n crop options and to access the corresponding characteristics stored in tables and needed to evaluate the objectives. Table 1, for example, shows the time arrangement and cycle length for some of the possible crops. Crops having two or more sowing seasons or usages are considered as different crops for each season. Besides crop information, it is necessary to store some characteristics of each of the m plots available for cultivation, such as their size, location, and soil analysis results. These parcels are also specified using indices [1, . . . , m]. Considering m plots, a vector P of size m represents a solution. Each element P_i of P is a list of indexes indicating the sequence of crops to be sown in the corresponding plot. Then, P_i^t ∈ [1, . . . , n] represents one of the n possible crops to be grown at position t of the sequence, t ∈ [1, . . . , |P_i|]. As an example, Fig. 1 shows a possible cropping plan solution considering five plots and a planning horizon of 3 years.
Table 1  Cropping cycle for the set of possible crops with the cycle length

Index  Crop                         Cycle length  Start     End
1      Soybeans                     4             October   November
2      Sunflower                    4             June      October
...    ...                          ...           ...       ...
19     Forage sorghum               8             October   December
20     Sunflower (as green manure)  2             February  April
Fig. 1 Solution representation for five plots and three years planning horizon
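A minimal sketch of this representation is given below, assuming a hypothetical excerpt of the crop table that stores only an index, a cycle length, and the allowed sowing months; the real implementation stores further attributes and constraints.

```python
import random

# Hypothetical excerpt of the crop table: index -> (name, cycle length in months,
# allowed sowing months). These entries are illustrative, not the full dataset.
CROPS = {
    1:  ("Soybeans", 4, {10}),
    2:  ("Sunflower", 4, {6, 7, 8, 9, 10}),
    20: ("Sunflower (green manure)", 2, {2, 3, 4}),
}

def random_rotation(horizon_months, start_month=10):
    """Build one plot's rotation: a list of crop indexes whose cycles fit the horizon."""
    rotation, month, elapsed = [], start_month, 0
    while elapsed < horizon_months:
        candidates = [idx for idx, (_, _, sowing) in CROPS.items() if month in sowing]
        if not candidates:              # nothing can be sown this month: skip it (fallow)
            month = month % 12 + 1
            elapsed += 1
            continue
        idx = random.choice(candidates)
        rotation.append(idx)
        cycle = CROPS[idx][1]
        month = (month + cycle - 1) % 12 + 1
        elapsed += cycle
    return rotation

# A solution P for m = 5 plots over a 3-year (36-month) horizon.
P = [random_rotation(36) for _ in range(5)]
print(P)
```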
2.2 Problem Objectives As previously stated, this work considers a seven-objective crop rotation problem. The first four objectives are calculated as in [18]; thus, their detailed descriptions are not included here for space reasons. However, two different objectives address crop diversification in this work: maximizing crop diversification in consecutive seasons on the same plot and maximizing diversification in adjacent plots in the same season. Additionally, minimizing the time in which the soils are fallow is considered as an objective. The total cost of investment for a given plan, C^P_total, is calculated by adding up the fixed cost C^P_fixed and the variable costs C^P_var. The fixed costs of growing a given product j are those not influenced by the soils' chemical characteristics or the predecessor crops, and their value depends only on the extension of the plot and the fixed cost of the given crop. Variable costs depend on each crop's needs and soil characteristics, such as fertilization costs and soil acidity correction. The sequence of previous crops influences these costs in the same plot because these affect the soil's chemical characteristics. So, to obtain this value, the following data is needed: fixed costs and nutritional demands of each crop; soil treatment costs for crops according to soil characteristics per hectare; results of soil analysis and size of each parcel; and information on nutrients absorbed and extracted by crops (to estimate soil conditions after a season). The return on investment and the economic risks are uncertain due to the uncertainty of the crops' prices and the future yield of crops. In [18], a scenario generator with historical data provides a set S of price and crop yield scenarios to calculate the rotations' expected income. Besides, the decision-maker may introduce the scenarios he is interested in. Thus, by evaluating the rotation strategy P over S, its average net profit NP^P_S is calculated, a value that is expected to be maximized. Also, the standard deviation of the expected returns, ρ(R^P_S), serves as a measure of economic risk over the scenario set, which has to be minimized. Also, in [18] the sum of the differences between the levels of absorption and extraction of nutrients from each crop in the rotation schedule serves to evaluate the nutrients supplied to the soil, B^P_total. Crop rotation practices do not recommend that crops of the same botanical family coexist sequentially on the same plot or in adjacent plots simultaneously, to avoid the spread of weeds and pests. Thus, the work considers a penalty function that counts the number of sequences with consecutive crops of the same family in the same plot, as follows:

Q^P = \sum_{i=1}^{m} \sum_{j=1}^{n-1} pen(P_{i,j}, P_{i,j+1})    (1)
where pen(x, y) = 1 if the crops x and y belong to the same botanical family, and 0 otherwise.
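The profit and risk objectives described above are evaluated over the scenario set S. The following sketch only illustrates that idea under simplifying assumptions: `profit_of` is a hypothetical stand-in for the full income model of [18], and the scenario values are invented for the example.

```python
import statistics

def profit_of(plan, scenario):
    """Hypothetical profit of a rotation plan under one price/yield scenario."""
    return sum(scenario["price"][c] * scenario["yield"][c]
               for plot in plan for c in plot)

def profit_objectives(plan, scenarios):
    """Average net profit (to maximize) and its standard deviation as economic risk (to minimize)."""
    returns = [profit_of(plan, s) for s in scenarios]
    return statistics.mean(returns), statistics.stdev(returns)

# Toy example with two crops (1 and 2) and two scenarios.
scenarios = [
    {"price": {1: 320.0, 2: 280.0}, "yield": {1: 2.9, 2: 1.8}},
    {"price": {1: 300.0, 2: 310.0}, "yield": {1: 3.1, 2: 1.6}},
]
plan = [[1, 2], [2, 1]]
print(profit_objectives(plan, scenarios))
```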
Another function counting the number of months in which there are crops of the same family in adjacent plots in the same period must be minimized and is defined by:

M^P = \sum_{i=1}^{m-1} \sum_{j=i+1}^{m} a(P_i, P_j)    (2)
where a(x, y) = 0 if the parcels x and y are not adjacent, and a(x, y) = p(x, y) if the parcels x and y are adjacent;
also:

p(x, y) = \sum_{month=1}^{d} pen(C(x, month), C(y, month))
where C(P_i, month) is the crop in P_i for a given month and d is the planning scope. Finally, prolonged periods in which a plot is fallow are not recommended as this favors the growth of weeds and contributes to soil degradation due to erosion and other effects occurring when the soil is completely exposed. For this reason, we include among the crop options green manures or cover crops that serve to cover the ground, and we also add a penalty function, to be minimized, that counts the number of months in which nothing is grown in the plots:

N^P = \sum_{i=1}^{m} \sum_{month=1}^{d} V(C(P_i, month))    (3)
where V(x) = 0 if x ≠ ∅, and V(x) = 1 if x = ∅.

2.2.1 Problem Definition
Considering the previously mentioned objectives, given a set of m plots and a set of crops that can be selected for the rotations, the many-objective crop rotation problem in this work searches for the rotation strategies P = (P_1, P_2, ..., P_m) that, while satisfying sowing season restrictions for each type of crop, in a given planning horizon optimize:

y = F(P) = (f_1(P), f_2(P), f_3(P), f_4(P), f_5(P), f_6(P), f_7(P))    (4)
• f_1(P): maximizes the average net profit NP^P_S
• f_2(P): minimizes the total cost of investment C^P_total
• f_3(P): minimizes the economic risk ρ(NP^P_S)
• f_4(P): maximizes the amount of nutrients supplied to the soil B^P_total
• f_5(P): minimizes the penalty for contiguous crops of the same family Q^P
• f_6(P): minimizes the number of months in which crops of the same family exist in adjacent plots in the same period of time M^P
• f_7(P): minimizes the number of months the soil remains fallow N^P.
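A compact sketch of the three penalty objectives f_5 to f_7 (Eqs. 1-3) is shown below; the botanical-family table, the `adjacent` predicate and the `crop_at` calendar helper are hypothetical stand-ins for the corresponding problem data, not the authors' code.

```python
# Hypothetical lookup: botanical family per crop index (None entries denote fallow).
FAMILY = {1: "Fabaceae", 2: "Asteraceae", 20: "Asteraceae"}

def pen(x, y):
    """1 if crops x and y belong to the same botanical family, 0 otherwise (helper of Eq. 1)."""
    return int(x is not None and y is not None and FAMILY[x] == FAMILY[y])

def q_penalty(P):
    """Eq. (1): consecutive same-family crops on the same plot."""
    return sum(pen(plot[j], plot[j + 1]) for plot in P for j in range(len(plot) - 1))

def m_penalty(P, adjacent, crop_at, d):
    """Eq. (2): months with same-family crops on adjacent plots in the same period.
    `adjacent(i, j)` and `crop_at(P, i, t)` are hypothetical problem-data helpers."""
    total = 0
    for i in range(len(P) - 1):
        for j in range(i + 1, len(P)):
            if adjacent(i, j):
                total += sum(pen(crop_at(P, i, t), crop_at(P, j, t)) for t in range(d))
    return total

def n_penalty(P, crop_at, d):
    """Eq. (3): number of plot-months left fallow."""
    return sum(crop_at(P, i, t) is None for i in range(len(P)) for t in range(d))

# Minimal usage of q_penalty alone (None = fallow position):
P = [[1, 2, None], [2, 20, 1]]
print(q_penalty(P))   # -> 1, from the Sunflower / Sunflower (green manure) succession on plot 2
```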
3 Evolutionary Algorithms for Crop Rotation Planning An approximation of the solutions set can help the decision-maker selecting a given solution and gaining insight into the problem at hand before the preference definition. Multi- and many-objective evolutionary algorithms are useful solvers for multiobjective problems with up three or more objectives, respectively. Thus, this paper analyzes the behavior of five methods for solving the crop rotation problem described in the previous section. Two of these are traditional Pareto-based (SPEA2 and NSGA2), and the other three are newer algorithms based on decomposition (MOEA/D, RVEA, NSGA3). A brief description of algorithms is presented; however, the details will not be given here due to space constraints. SPEA2 uses a fine-grained fitness assignment strategy based on counting the number of solutions the considered solution dominates and a density measurement based on the distance to the kth nearest neighbor. The NSGA2 uses an elitist procedure in which parent and offspring populations join, and from the combined set, selection of the individuals for the next evolutionary population occurs. This selection occurs by classifying the joined population by non-dominance and, then, taking elements of each front from the better until the population reaches its size capacity. The last front’s elements in the next population are selected based on the average distance of its neighboring solutions, called the crowding distance. The RVEA [4] and the NSGA3 [8] are mainly based on the elitist procedure of the NSGA2. These algorithms mainly differ in the method used to determine the next population elements to be selected. NSGA3 determines the elements to include from the last front using a reference set and associating to the elements of the population their closest reference point. Meanwhile, RVEA determines the elements to include in the next evolutionary population by a reference vector guided selection procedure that analyzes both the convergence and diversity properties of each solution in the different populations. The MOEA/D [16] divides the original problem into a number of subproblems equals to the population size. Each population element has a unique weight vector corresponding to a subproblem that serves to evaluate its objective functions. A neighborhood of elements is defined regarding the Euclidean distance of weight vectors, and a recombination operator is applied considering only neighboring solutions.
To implement the aforementioned algorithms in the proposed crop rotation problems, besides the objective function calculation, an initialization procedure, as well as a crossover and mutation operator, are proposed as follows • Initialization: The initialization process is carried out by randomly determining possible rotations for each plot. Each crop has a given sowing and harvest time. The first crop to include in a rotation plan is randomly selected among the crops that to sown in the month defined as the initial sowing month; then, the procedure randomly selects crops among the those that can be sown in the month after the previous crop ends until fulfilling the rotation length. • Crossover: The crossover operator receives two parent solutions containing the planned rotations for the m plots. Then, to maintain a minimum sequence of cultivation, the rotation plan for each parent is decomposed into a number of sublists with three elements. Then, an arrangement of 12 elements is created (one for each month), and the sublists that begin the sowing of their first crop in month i are loaded in each position i of the arrangement. Subsequently, a procedure similar to initialization is followed to generate a child rotation. The first sublist is selected from among the elements in the month defined as the initial planting month; then, the next sublists are randomly selected from those elements in months after the end of the harvest of the last crop of the previous sublist. If it arrives in a month in which there is no available sublist, an element representing the option of not cultivating is added, and the procedure continues with the next month, and so on until reaching a month in which there are available sublists or until the defined rotation duration is reached. The child solution is made up of the child rotations for each of the m plots. • Mutation: After receiving a population element with planned rotations, with a probability of p1 , the operator produces a random permutation of its plans exchanging rotations for a plot with another. Alternatively, with probability p2 , the procedure may randomly modify any of the m plots’ rotation plans by selecting a random subsequence of the rotation and modifying its value following a similar procedure as the initialization.
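The crossover operator described above can be sketched as follows, assuming hypothetical helpers `start_month_of` and `duration_of` that expose the sowing month and the number of months occupied by a three-crop sublist; this is an illustration of the idea, not the authors' implementation.

```python
import random

def split_into_sublists(rotation, size=3):
    """Decompose one plot's rotation into consecutive sublists of (up to) three crops."""
    return [rotation[k:k + size] for k in range(0, len(rotation), size)]

def crossover_plot(rot_a, rot_b, start_month_of, duration_of, initial_month, horizon):
    """Rebuild a child rotation from three-crop sublists drawn from both parents.

    `start_month_of(sub)` and `duration_of(sub)` are hypothetical helpers giving the
    sowing month of the sublist's first crop and the months the sublist occupies."""
    buckets = {m: [] for m in range(1, 13)}          # sublists indexed by sowing month
    for sub in split_into_sublists(rot_a) + split_into_sublists(rot_b):
        buckets[start_month_of(sub)].append(sub)

    child, month, elapsed = [], initial_month, 0
    while elapsed < horizon:
        if buckets[month]:
            sub = random.choice(buckets[month])
            child.extend(sub)
            step = duration_of(sub)
        else:
            child.append(None)                       # no sublist can start this month: fallow
            step = 1
        elapsed += step
        month = (month + step - 1) % 12 + 1
    return child
```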
4 Computational Experiments The model and the optimization algorithms were developed within the MOEA Framework [11] using the Java language. The database was implemented in PostgreSQL 13. The soil dataset used in testing was the same as in [18]. In this work, a set of 20 crops, as indicated in Sect. 2.1, is used. Updated information about prices and yields of the crops used in this work are available upon request. Algorithms execute 30 times with randomly initialized populations with 300 individuals. The number of generations is 10,000; the crossover probability is 0.9, and the mutation probability is 0.2 for p1 and p2 . Maximization objectives were all transformed to their equivalent minimization counterpart to simplify implementation.
4.1 Comparison Metrics The operators used for initialization, crossover, and mutation ensure the validity of the crop rotations proposed by the MOEAs. Also, experts' opinions and a comparison with the results obtained in [18] served to verify that the results are realistic, while the inverted generational distance (IGD) and contribution metrics evaluate the algorithms' performance regarding the extent of the provided Pareto optimal set, the closeness of the solutions to the best available Pareto front, and the solutions' distribution. The inverted generational distance (IGD) is one of the most used metrics to evaluate the performance of EMO algorithms in evolutionary many-objective optimization studies. IGD measures the average distance from each point in a reference set (an approximated Pareto front in this case) to its nearest element in the evaluated set. A small value of the IGD indicator indicates both a good convergence of the solution set and a good distribution over the entire Pareto front. The contribution metric is the ratio of the number of elements in the evaluated solution set to the size of the approximated Pareto front.
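For reference, a minimal implementation of the two metrics as described could look like the sketch below; the contribution function follows one common reading (the fraction of the reference front covered by the evaluated set), which may differ in detail from the definition used in the authors' tooling.

```python
import math

def igd(reference_front, approximation):
    """Inverted generational distance: average distance from each reference point
    to its nearest point in the evaluated set (smaller is better)."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return sum(min(dist(r, s) for s in approximation)
               for r in reference_front) / len(reference_front)

def contribution(reference_front, approximation):
    """Share of the reference (approximated Pareto) front contributed by the evaluated set."""
    ref = set(map(tuple, reference_front))
    return sum(tuple(s) in ref for s in approximation) / len(reference_front)

# Toy 2-objective example.
ref = [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0)]
approx = [(0.1, 0.9), (0.5, 0.5)]
print(igd(ref, approx), contribution(ref, approx))
```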
4.2 Experimental Results Metric values, for each of the considered algorithms, are plotted in Figs. 2 and 3 every 500 generations up to generation 10,000. In each case, the graphics report the average, the minimum, and the maximum value obtained by each algorithm on the proposed crop rotation problem over 30 independent runs. The considered approximation set contains 12,390 solutions. Figure 2 shows that in the initial iterations, the IGD values of NSGA2, NSGA3, and SPEA2 are better than those of RVEA and MOEA/D. However, as the number of iterations increases, at iteration 1500, RVEA overtakes the other alternatives. In the following iterations, the other two newer algorithms surpass the traditional ones. In the case of MOEA/D, the algorithm shows a performance similar to RVEA. The ranking of the last generations' metric results, from better to worse, is as follows: RVEA, MOEA/D, NSGA3, SPEA2, and NSGA2. Figure 3 shows that for the contribution value, RVEA significantly outperforms the other algorithms, followed by MOEA/D. NSGA2 occupies the third position. The worst values for this metric at the end of the run are obtained by SPEA2 and NSGA3, ranking fourth and fifth, respectively. Analyzing the solutions of NSGA3, we found that the proposed solutions are near the Pareto set, but none of them belongs to the final approximation set, whereas more variation exists for NSGA2; i.e., it proposes solutions both in the final approximation set and solutions with larger distances to the reference set.
Fig. 2 Inverted generational distance metric values
Fig. 3 Contribution metric values
5 Conclusion This paper expands the work presented in [18] by incorporating three new objectives in the problem formulation, proposing new evolutionary operators, and evaluating newer algorithms while including more crops in the evaluated instance. The new objectives address sustainable practices for dealing with the spread of weeds and pests. The proposed methods for initializing the population and the crossover and mutation operators are based on the crop cycles' characteristics (sowing months and duration) to guarantee reliable solutions. Five MOEAs were implemented and compared for a given instance of the problem considering 20 crops and 49 parcels. By evaluating the algorithms' performance every 500 iterations, it is possible to note that the ranks of some of the algorithms switch between the first and the last iterations. For the evaluated case, and considering the last iteration results for the IGD and contribution metrics, when the compared algorithms execute a large number of iterations the best choice among them appears to be RVEA, followed by MOEA/D. Further research includes combining the different algorithms evaluated in this work and designing a specific problem decomposition strategy for the crop rotation problem to improve the obtained results, as well as considering a larger number of parcels and other objectives. Acknowledgement This work was partially supported by CONACYT Project PINV18-949.
References 1. Alfandari, L., Plateau, A., Schepler, X.: A branch-and-price-and-cut approach for sustainable crop rotation planning. Eur. J. Oper. Res. 241(3), 872–879 (2015) 2. Aliano Filho, A., de Oliveira Florentino, H., Pato, M.V., Poltroniere, S.C., da Silva Costa, J.F.: Exact and heuristic methods to solve a bi-objective problem of sustainable cultivation. Ann. Oper. Res. 1–30 (2019) 3. Annetts, J., Audsley, E.: Multiple objective linear programming for environmental farm planning. J. Oper. Res. Soc. 53(9), 933–943 (2002) 4. Cheng, R., Jin, Y., Olhofer, M., Sendhoff, B.: A reference vector guided evolutionary algorithm for many-objective optimization. IEEE Trans. Evol. Comput. 20(5), 773–791 (2016) 5. Chetty, S., Adewumi, A.O.: Comparison study of swarm intelligence techniques for the annual crop planning problem. IEEE Trans. Evol. Comput. 18(2), 258–268 (2014) 6. Clarke, H.R.: Combinatorial aspects of cropping pattern selection in agriculture. Eur. J. Oper. Res. 40(1), 70–77 (1989) 7. Deb, K., Pratap, A., Agarwal, S., Meyarivan, T.: A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans. Evol. Comput. 6(2), 182–197 (2002) 8. Deb, K., Jain, H.: An evolutionary many-objective optimization algorithm using referencepoint-based nondominated sorting approach, Part I: solving problems with box constraints. IEEE Trans. Evol. Comput. 18, 577–601 (2014) 9. Dury, J., Schaller, N., Garcia, F., Reynaud, A., Bergez, J.E.: Models to support cropping plan and crop rotation decisions. A review. Agron. Sustain. Dev. 32(2), 567–580 (2012) 10. Filippi, C., Mansini, R., Stevanato, E.: Mixed integer linear programming models for optimal crop selection. Comput. Oper. Res. 81, 26–39 (2017) 11. Hadka, D.: MOEA framework: a free and open source Java framework for multiobjective optimization (2012)
12. Haneveld, W.K., Stegeman, A.W.: Crop succession requirements in agricultural production planning. Eur. J. Oper. Res. 166(2), 406–429 (2005) 13. Herman, M.R., et al.: Optimization of bioenergy crop selection and placement based on a stream health indicator using an evolutionary algorithm. J. Environ. Manage. 181, 413–424 (2016) 14. Ishibuchi, H., Matsumoto, T., Masuyama, N., Nojima, Y.: Many-objective problems are not always difficult for pareto dominance-based evolutionary algorithms. In: Proceedings of the 24th European Conference on Artificial Intelligence (ECAI), pp. 291–298 (2020) 15. Kollas, C., et al.: Crop rotation modelling—A European model intercomparison. Eur. Jo. Agron. 70, 98–111 (2015) 16. Li, H., Zhang, Q.: Multiobjective optimization problems with complicated pareto sets, MOEA/D and NSGA-II. IEEE Trans. Evol. Comput. 13(2), 284–302 (2008) 17. Lücken von, C., Barán, B., Brizuela, C.: A survey on multi-objective evolutionary algorithms for many-objective problems. Comput. Optim. Appl. 58(3), 707–756 (2014) 18. Pavón, R., Brunelli, R., von Lücken, C.: Determining optimal crop rotations by using multiobjective evolutionary algorithms. In: International Conference on Knowledge-Based and Intelligent Information and Engineering Systems, pp. 147–154. Springer (2009) 19. Reddy, P.P.: Sustainable Crop Protection Under Protected Cultivation. Springer (2016) 20. You, P.S., Hsieh, Y.C.: A computational approach for crop production of organic vegetables. Comput. Electron. Agric. 134, 33–42 (2017) 21. Zitzler, E., Laumanns, M., Thiele, L.: SPEA2: Improving the strength Pareto evolutionary algorithm. Technical Report 103. Swiss Federal Institute of Technology, Zurich (2001)
The Utility of Neural Model in Predicting Tax Avoidance Behavior Coita Ioana-Florina
and Codruța Mare
Abstract The phenomenon of tax evasion and the phenomenon of tax avoidance are two facets of similar behavior. Both affect societies across the globe. It is a well-known fact that tax revenues account for the majority of public revenues, and therefore the phenomenon is of great interest for public authorities as it is for other stakeholders. Designing effective tax policies aiming at maximizing public revenues can only be done through a deep and complete understanding of taxpayer behavior and its internal motivations. The present study aims at developing a model for detecting the risk of tax fraud in taxpayer behavior by trying to predict the propensity of an individual to show the intention of evading taxes. The purpose of this study is to model fiscal behavior through artificial intelligence, using an MLP network model that was trained and tested on a real data set comprising behavioral or biometric data. Results were compared to those of a binary logistic regression. The empirical model tested was based on qualitative attributes, which relate to behavioral elements such as the attitudes and perceptions of taxpayers and also internal motivations, morality or personal values. The prediction efficiency of the MLP model is 70%, which shows performance similar to comparable research. Keywords Tax compliance · Data mining · Neural networks
Abbreviations

MLP     Multilevel perceptron neural network
AI      Artificial intelligence
SVM     Support vector machine
PSO     Particle swarm optimization
C. Ioana-Florina (B) Department of Finance and Accounting, Faculty of Economics, University of Oradea, Oradea, Romania C. Mare Department of Statistics-Forecasting-Mathematics, Faculty of Economics and Business Administration, University of Babe¸s-Bolyai, Cluj-Napoca, Romania © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 I. Czarnowski et al. (eds.), Intelligent Decision Technologies, Smart Innovation, Systems and Technologies 238, https://doi.org/10.1007/978-981-16-2765-1_6
ROC     Receiver operating characteristic curve
AUROC   Area under receiver operating characteristic curve
Variables

P       Is the probability for an individual to trust the fiscal system
X_j     Are the factors considered to influence this probability
1 Introduction Tax avoidance and tax evasion are two facets of a subject that has complex implications in tax litigation as well as in economic and social field. State exercises its coercive force through institutions empowered to collect public funds and is supported by the judicial bodies when identifying non-compliance behavior that falls under criminal law. In this sense, investigation of tax evasion is a complex process, which requires knowledge from various fields such as legal, economic or sociopsychological. Public fiscal measures for preventing this type of violation of legal norms could be based on identifying the most appropriate variables in the social and economic context that trigger the desired response in the behavior of the taxpayer. This in turn could help decision-makers build more efficient fiscal policies. In the context of current technological development, preventing and combating fraudulent tax behavior could be more efficiently tackled with the help of AI algorithms. Preventing and combating the phenomenon of tax evasion is a present concern of the national governments both because of the magnitude that this phenomenon presents and because of the increasingly sophisticated techniques used by the authors in carrying out tax frauds. The evolution of the tax evasion phenomenon at international level has acquired a profound technological character due to the increasingly elaborate methods, which involve the state-of-the-art technologies for collecting and manipulating the information present in the virtual environment used by criminals in developing increasingly complex system fraud mechanism tax. On the other hand, national governments and implicitly the investigative bodies must keep up with this complexity of the evasion phenomenon in order to exercise control over it and more in preventing and combating fraud. The hypothesis of current research takes into consideration the fact that there should be an explanation of the fact that in some countries around the globe tax evasion is at a low level, while in others, it is a wide spread phenomenon across society as a whole. Going further with the reasoning, findings show that in countries with low rates of tax evasion, people trust public authority. This study aims at identifying those variables that impact taxpayers’ decision whether or not to evade or avoid taxes based on its trust in state.
The structure of the article is as follows: an introductory part presents a short overview of the current research, followed by a theoretical framework section that details the existing background on the subject and identifies the gap where the current research could fit. The second section describes the methodological approach used in the study, the third part presents the results obtained, and the fourth part provides a short discussion of the results. The conclusions present the most relevant findings and future research possibilities.
2 Theoretical Framework Behavioral models were used in the study of tax compliance since the 1970s [1], and the following studies have brought new elements from sociology, psychology, legal studies, finance, agent-based models, AI models or data mining tools. The first behavioral model of tax compliance referred to the taxpayer’s decision regarding the level of revenue he could declare to the fiscal authorities, considering the possibility of being discovered and fined [1]. There are a large number of studies that take into consideration the internal motivation of people to pay taxes and that could be based on trust in the states’ policies and services [2]. On the other hand, there has been revealed a negative factor that impairs motivation to pay taxes and that is generated by the fear of punishment along with a low trust in public authorities’ services [3, 4]. Legitimacy of tax authority and a fiscal legislation based on fair principles and also welfare oriented could support tax-compliant taxpayer behavior [3, 5, 6]. Tax morale is another important aspect as it is defined as an obligation to pay taxes in order to build up to the welfare of society as a whole and it is an internal motivation [7–9]. Tax behavior forecasting is a difficult process due to the nonlinear nature existing in data collected from real-life situations. Reaching this purpose needs supporting tools that process nonlinear relationships and therefore AI models constructed by SVM using PSO showed higher predictive power in forecasting taxpayer behavior, based on public financial data [10–14]. In a simulated environment, one study trying to predict taxpayer behavior by making use of multi-agent systems and Markov reinforced learning networks showed that individuals’ behavior regarding paying taxes is influenced by external and internal factors [15, 16]. In another study, hybrid data mining methods were being used for detecting fraudulent behavior from analyzing qualitative and quantitative data on psychological and material traits of people [9, 17]. In the last twenty years, AI networks have been widely applied in various business and financial fields, having thus extensive applicability in studying and predicting financial default rates, bankruptcy risk, stock price forecasting or serving as decision tools [18–20]. The multilayer perceptron algorithm is part of the family of feedforward networks and outperforms other techniques in the field of tax fraud, proving to be an appropriate neural model chosen for testing the model on the collected data in our study [21].
3 Methodology A MLP neural network model, a logistic regression and a neural model in R along with other machine learning algorithms were applied on the database in order to compare their predictive power on the biometric data used. A MLP model and binary logistic were tested using IBM SPSS Statistics Premium Academic Pack V 26–2020, the supervised neural network was tested using main package neuralnet from RStudio and R version 3.6.1., and the other machine learning models were tested using the libraries integrated in Visual Studio 2019 Pro. The data matrix is made up of 10,400 instances, and the 19 columns represent dependent variables that 520 respondents were asked to assign a value ranging from 1 to 5, where 1 is less important and 5 is very important. The 20th column represented the independent variable that comprised YES_NO answers of the 520 persons to the question: “Do you trust the current fiscal system?” Respondents were aged ranging between 22 and 26 years old, coming from various cultural backgrounds. Data was collected through an online survey. In respect to the age range, this was chosen due to the fact it is the period when a person is being educated and public measures could have a greater impact on longer term. Data matrix values were normalized, and no data was missing. The data matrix was formed by asking respondents to assign a certain value from a scale from 1 to 5 to various affirmations regarding the fiscal system. People were asked to assign a value to each of the dependent variables that were considered to make up an efficient fiscal system, having to choose from: transparency in public spending (TRANSP), value for money in public services (VAL_M), economic efficiency in public spending (ECON), trust in state and public authorities (TRUST), social welfare creation (SOC_WF), sustaining work place creation (WORK_P), socio-economic development (SOC_EC), legal and political stability (STABIL) and legal predictability (PREDICT). They were also asked to assign a value to each of the variables that were considered to encourage tax compliance behavior, having to choose from the following: tax inspections at shorter intervals (F_INSP), higher fines for non-payment of tax obligations (FINES), clearer laws for taxpayers (predictability) (CLEAR), fiscal facilities for paying taxes in advance (FACIL_F), tax subsidies for employers who create jobs (SUBV), tax deductions for company investments (T_INV), providing good quality public services (SVP_Q), building trust between citizen and state (TRUST_C), transparency on public money is spent (SPEND_B) and realization of public investment for socio-economic development (P_INV). Control variables were included. At last, they were asked to answer with YES or NO related to whether they trust or not the fiscal system they operate in. This last variable was modeled as the categorical-dependent variable in the database. The input and output units in the MLP neural network are generated in the network based on the dimensions of the data set, while the hidden units were being 6 for the 1 hidden layer of the input and of 2 units for the output. The activation function used was hyperbolic tangent for the units in the hidden layer and softmax function for the units in the output layer [22, 23].
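As a rough stand-in for this setup (the authors used SPSS and R), a comparable MLP with one hidden layer of six tanh units can be configured in scikit-learn as sketched below; the solver differs from the scaled conjugate gradient available in SPSS, and the survey answers here are synthetic placeholders rather than the paper's data.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X = rng.integers(1, 6, size=(520, 19)).astype(float)   # synthetic 1-5 Likert answers, 19 items
y = rng.integers(0, 2, size=520)                        # synthetic YES/NO trust labels

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.337, random_state=0)

mlp = MLPClassifier(hidden_layer_sizes=(6,),   # one hidden layer with 6 units
                    activation="tanh",         # hyperbolic tangent in the hidden layer
                    max_iter=2000, random_state=0)
mlp.fit(X_train, y_train)
print("test AUC:", roc_auc_score(y_test, mlp.predict_proba(X_test)[:, 1]))
```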
The second model tested was the binary logistic, which attempts to construct a score function and determine the probability (odds ratio) for a person to avoid paying taxes and commit fraud. The two methodologies were tested using the same database. In the case of the binary logistic model, the following equation is estimated:

\log \frac{P}{1 - P} = \alpha + \sum_{j} \beta_j X_j + \varepsilon    (1)
where P is the probability for an individual to trust the fiscal system and X j are the factors or variables considered to influence this probability. The model was constructed in the forward type, in three stages. All the validation procedures were run, pre- and post-estimation. To evaluate the performance of the binary logistic regression, we have constructed the ROC curve based on the predicted probabilities in the model. For comparison purposes, we also estimated a supervised neural network using main package neuralnet (Training of Neural Networks)—version 1.44.2 (February 7, 2019), using RStudio and R version 3.6.1. The parameters used were: 10 hidden layers, 0.1 threshold, 105 Rmax, 3 repetitions and 0.65 rounding. We tested several scenarios, and the most relevant results are presented. At last, we tested several machine learning models using Visual Studio 2019 Pro libraries, model builder ML.NET. The data was split between training and testing as 80 to 20%. The most performant models were detailed.
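For illustration, Eq. (1) can be estimated with statsmodels as in the sketch below, where the exponentiated coefficients are the odds ratios discussed in the results (for instance, a value of 1.261 would correspond to a 26.1% increase in the odds from group to group). The data used here are synthetic placeholders, not the survey responses.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
X = rng.integers(1, 6, size=(520, 3)).astype(float)   # e.g. three predictors such as ECON, F_INSP, P_INV
y = rng.integers(0, 2, size=520)                       # YES/NO trust in the fiscal system

model = sm.Logit(y, sm.add_constant(X)).fit(disp=False)
print(model.summary())
print("odds ratios:", np.exp(model.params))            # exp(beta_j): multiplicative change in the odds
```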
4 Results The purpose of testing several neural network models, machine learning and binary logistic models was to predict the propensity of any individual showing intention for evading taxes based on their trust in the state authority. Intention lying behind fraudulent tax behavior is intrinsic related to trust in authorities and tax morale [2, 5]. The neural models are used as dependent variable, a dichotomous one taking the value of 0 if tested agent shows intention to commit fraud due to lack of trust in public authority and value 1 if agent shows trust in state, so therefore no fraud intention is detected. All the dependent variables are perceived through the “eyes of taxpayer,” so therefore perception and trust are internally connected into the human behavior [8]. Moreover, trust is a feeling that shows how an individual perceives objective reality and is a good predictor of an existing latent intention of tax fraud. Existing research shows that where there is a high level of trust in authorities correlated with a positive perception of public services there are low levels of non-compliance behavior in taxation field [2, 5, 8]. The MLP uses random number generation for initialization of synaptic weights in order for the perceptron to learn a randomly selected training sample, and simulated annealing algorithm was used in weight initialization [24]. A random sample is taken
from the entire data set and split into training (66.3%) and testing samples (33.7%) with the holdout partition being assigned by default as the training, the testing and validation steps [13]. The neural network was trained using the training data in order to adjust the parameters of the neural network, and the validation set was used to know when to stop training and the test set for testing the trained neural network for measuring the result steps [13]. The neural network was built using batch training, and scaled conjugate gradient was used as the optimization algorithm. There were no missing values in the input or output variables. It can be seen that the model can be evaluated as being correctly classified due to the fact that its highest predicted probabilities are approximately close to the observed categories for each particular case. There was performed a sensitivity analysis, as shown in Fig. 1, that measured the importance of each predictor in determining the results of the tested neural network. As the results show, the variables CLEAR, FINES and SPEND_B followed by SOC_EC and PREDICT prove to have greater impact in the prediction realized. This implies that transparency in public spending and clearer tax law followed by legal predictability and socio-economic development are independent variables that non-compliant behavior is the most sensitive. These results show that public authorities could pay greater attention in controlling these factors in order to prevent fraudulent behavior and build taxpayers’ trust in public authorities. ROC curve displays the model’s performance measured through the area under curve; the higher the area, the greater the prediction power of the model. In our case, the area under the curve for the dependent variable is 0.691. Simulations were conducted with different clustering algorithms that lead to models with different performance levels. Taking into account results obtained, variables such as clarity in
Fig. 1 Neural network model tested in R of behavioral model
fiscal legislation (CLEAR), risk of receiving a fine (FINES), efficient public spending of tax collected money (SPEND_B) and predictable tax law (PREDICT) could significantly impact taxpayer’s behavior. These results could offer tax authorities new insights into taxpayer behavior for taking better decisions regarding effective action plans for combating non-compliance behavior. Variables such as efficient economic measures (ECON) and realization of public investment for socio-economic development (P_INV) are the most sensitive to influencing probability to trust the fiscal system. Results show that if these variables are not properly influenced by public measures this can in turn generate lack of trust. The factor referring to tax inspections at shorter intervals (F_INSP) has a strong positive effect upon trust in the fiscal system. The higher the perceived importance of the fiscal inspections at shorter intervals, the higher the probability for an individual to trust the fiscal system. The probability would increase 1.261 times from group to group. Results obtained through running a binary logistic regression show some constraints on applying it to behavioral data and prove once again that MLP neural network model shows more accurate results when correlating statistics with reality of economic context analyzed. Performance of the binary logistic model was evaluated using the ROC curve attached to the model, based on the predicted probabilities. For this, we have obtained a significant area under the curve (p-value = 0.000) of 0.592. Both the significance value and the AUROC confirm that the model is a good estimator of the probability to trust the fiscal system. However, the AUROC is just a little bit above the 0.5 threshold, value that shows that significant improvement can be brought to the model. We estimated a supervised neural network using main package neuralnet (Training of Neural Networks)—version 1.44.2 (February 7, 2019), using RStudio and R version 3.6.1. Results of neural network performances, as shown in Fig. 1, reveal comparable results with MLP model and reveal similar results with that of other similar studies [25]. At last, we performed several tests on various machine learning models integrated in the Visual Studio 2019 Pro libraries, model builder ML.NET. Results showed that value prediction models, such as SdcaRegression, FastTreeRegression, FastForestRegression, OlsRegression and LbfgsPoissonRegression, show negative results in testing the model proving that these are not suitable for this type of biometric data. Machine learning models used for text classification, such as AveragedPerceptronOva, LightGbmMulti, SdcaMaximumEntropyMulti and SymbolicSgdLogisticRegressionOva, have performances of 50% that is lower than that of a MLP neural model. The 70% efficacy of the MLP model selected is greater than that of the binary logistic regression model of 60%. The sensitivity analysis conducted using ROC curve shows the greater capacity of the MLP neural network model to better predict human behavior from this respect. The MLP neural network model does better in discriminating between fraudulent intention of taxpayers and the absence of it. Performance of the MLP neural model is evaluated in the context of the type of analyzed data. Current research analyzes behavioral or biometric data referring to unique and individual characteristics of people taking the online survey. Behavioral data pertains complex features that make it difficult to analyze, measure and predict. 
In this sense,
the MLP model showed valuable performance in order to be taken into account for further research.
5 Discussion of Results Results of the tested neural models show that MLP model has an accuracy rate of 70% proving an optimal performance compared to results from other studies [17, 21, 25]. Findings of current research are in accordance with the scientific literature showing that neural network models usually have better predictive accuracy over linear or nonlinear models due to the fact that they provide more accurate estimations when tested on behavioral data [25]. This reason lies in the fact that neural networks use classification of each individual whether trusting or not the fiscal system, showing intention to tax fraud or not. Comparing our results with the ones of another similar study [25] revealed that MLP model employed for estimation propensity of an individual to evade taxes has proved to have similar performance. Researchers [25] used different data for analysis, meaning tax forms filled by individuals on their personal income tax. They evaluated what their tax forms comprised and what they actually should declare so the mechanism for detecting fraud was fairly straightforward. Which person evaded taxes, this would have been revealed by the differences between declared and estimated by the model. There are a few risks in the mentioned research and in any of this type, and this relies in identifying and mitigating the false-positive or the true-negative rates. In this sense, our research focus is on predicting trust in public authority from a set of preferences revealed by the respondent. Our model predicts which taxpayer having a set of personal beliefs and preferences would trust or not the fiscal system in a country. Based on one’s personal emotions like trust, public authorities could predict which taxpayer is inclined to commit tax avoidance or tax evasion. For more accurate predictions in the future, in order to eliminate false-positive rates, we will apply the survey on people who we already know for certain that they committed a form of fraudulent behavior. The survey collected the answers of people whom we do not have proof that they committed a certain form tax avoidance or tax evasion, although they could have but we do not know it. On the other hand, the binary logistic regression conducted emphasizes the fact that probability to trust the fiscal system is significantly conditioned by economic measures taken by the state to ensure welfare and work-places creation (ECON), tax inspections at shorter intervals (FINSP) and realization of public investment for socio-economic development (PINV). Trust in public authority is built when taxpayers receive value for money, meaning good quality public services and economic leverage for the taxes they pay. Also, ensuring fairness in taxation policy and punishing fraudulent behavior protects social values building up trust and discouraging non-compliance behavior. Results of applied MLP model to behavioral data revealed interesting aspects of taxpayers’ behavior toward paying taxes. Current research analyzes behavioral data
referring to individual characteristics of people such as attitudes, perceptions, beliefs or moral values. Behavioral data has complex features that make it difficult to analyze, measure and predict. In this sense, the MLP model showed valuable performance and deserves to be taken into account in further research. The results presented in this study reflect a variety of possibilities to improve tax-compliant behavior based on the predictive tools shown in this paper.
6 Conclusions
The current research tries to take a snapshot of the complex interaction of the social, financial, psychological and legal aspects of taxpayer behavior in order to portray the state of affairs that sustains tax-compliant behavior. The hypothesis of the current research was that there are external or internal factors that, in some countries, keep tax evasion at a low level. Results show certain common patterns that drive tax compliance, and these could be tackled by authorities in order to control the tax evasion phenomenon. This study aimed at identifying those variables that impact taxpayers' decision whether or not to evade or avoid taxes and which could be tackled and modeled accordingly in order for tax authorities to obtain a diminishing effect. Tax avoidance and tax evasion cost not only governments but also societies, due to the social cost that this phenomenon bears: if one person evades paying taxes, then another has to pay double because the state's budget should always be balanced. The purpose of the neural network model tested was to predict the propensity of any individual to show intention of evading taxes, based on a multifactor sensitivity analysis. We also compared the predictive performance of the MLP neural network with that of a binary logistic regression model. The tested MLP neural network model showed a performance rate of 70%, higher than the 60% of the binary logistic regression model, proving to be a reliable model for predicting tax compliance behavior. This is a useful aspect for public authorities regarding the prevention and combating of tax evasion. Results obtained by testing the MLP neural network implied that transparency in public spending, clearer tax law, legal predictability, and socio-economic development sustained by public spending are the independent variables to which non-compliance behavior is most sensitive. These results show that tax authorities could pay greater attention to controlling these factors in order to prevent fraudulent behavior. The relevance of the results obtained is consistent with current research trends and shows that one of the most important elements in voluntary tax compliance is the internal motivation of the taxpayer to pay taxes; it can only be consolidated by trust in public authorities that offer good quality public services, ensure economic growth and social welfare, and also have in place a sound system of constraint that punishes illegal behavior. Data was collected through an online survey gathering participants aged between 22 and 26 years old, because this is the period when a person is being educated and public measures could have a greater impact on combating illegal behavior in tax compliance. Results of the MLP neural
network model could be improved either by testing it with various numbers of hidden layers or by choosing an unsupervised neural model, which could be the objective for further research. Current research results are important in light of seeing which factors have the most impact on modeling taxpayer’s behavior so that tax authorities could apply efficient measures in combating fraudulent behavior. Acknowledgement This paper is part of the project COST CA19130 FinAI - Fintech and Artificial Intelligence in Finance - Towards a Transparent Financial Industry.
References 1. Allingham, M.G., Sandmo, A.: Income tax evasion: a theoretical analysis. J. Public Econ. 1, 323–338 (1972) 2. Kirchler, E., Hoelzl, E., Wahl, I.: Enforced versus voluntary tax compliance: the “slippery slope” framework. J. Econ. Psychol. 29, 210–225, (2008). Available online at www.sciencedi rect.com 3. Hofmann, E., et al.: Enhancing tax compliance through coercive and legitimate power of tax authorities by concurrently diminishing or facilitating trust in tax authorities. Law & Policy 36(3), (2014). ISSN: 0265-8240 4. Iancu (Nechita), E.-A., Popovici-Coita, I.F.: Rethinking economics-of-crime model of tax compliance from behavioral perspective applied to Romanian case. Ann. Univ. Oradea Econ. Sci. Ser. 25(2), 372–381, (2019) 5. Kirchler, E.: Reactance to taxation: employers’ attitudes toward taxes. J Socio-Econ. (1999) 6. Picciotto, S.: Constructing Compliance: Game-Playing, Tax Law and the Regulatory State? Centre for Tax System Integrity (Working Paper No. 81). Australian National University Canberra. (2005). 7. Dell’Anno, R.: Tax evasion, tax morale and policy maker’s effectiveness. J. Socio-Econ. 38(6), 988–997 (2009). ISSN: 1053-5357 8. Kirchler, E., Maciejovsky, B., Schneider, F.: Everyday representations of tax avoidance, tax evasion, and tax flight: do legal differences matter? J. Econ. Psychol. 24, 535–553 (2003) 9. Stankeviciusa, E., Leonas, L.: Hybrid approach model for prevention of tax evasion and fraud. In: 20th International Scientific Conference Economics and Management-2015 (ICEM-2015). Procedia Social and Behavioral Sciences, vol. 213, pp. 383–389 (2015) 10. Liu, L.-X., Zhuang, Y.-Q., Liu, X.-Y.: Tax forecasting theory and model based on SVM optimized by PSO. Expert Syst. Appl. 38(1), 116–120 (2011) 11. Lismont, J., et al.: Predicting tax avoidance by means of social network analytics. Decis. Support Syst. 108, 13–24 (2018). ISSN: 0167-9236 12. Wu, R.-S., et al.: Using data mining technique to enhance tax evasion detection performance. Expert Syst. Appl. 39(10), 8769–8777 (2012) 13. Sharma, A.: A review of financial accounting fraud detection based on data mining techniques. Int. J. Comput. Appl. (0975–8887), 39(1), (2012) 14. Zakaryazad, A., Duman, E.: A profit-driven artificial neural network (ANN) with applications to fraud detection and direct marketing. Neurocomputing 175, 121–131 (2016). Part A 15. Yahyaoui, F., Tkiouat, M.: Partially observable Markov methods in an agent-based simulation: a tax evasion case study. Procedia Comput. Sci. 127, 256–263 (2018) 16. Goumagias, N.D., Hristu-Varsakelis, D., Assael, Y.M.: Using deep Q-learning to understand the tax evasion behavior of risk-averse firms. Expert Syst. Appl. 101, 258–270 (2018) 17. Ryman-Tubb, N.F., Krause, P., Garn, W.: How artificial intelligence and machine learning research impacts payment card fraud detection: a survey and industry benchmark. Eng. Appl. Artif. Intell. 76, 130–157 (2018)
18. Hosaka, T.: Bankruptcy prediction using imaged financial ratios and convolutional neural networks. Expert Syst. Appl. 117, 287–299 (2019) 19. Serrano, W.: Fintech model: the random neural network with genetic algorithm. Procedia Comput. Sci. 126, 537–546 (2018). ISSN: 1877-0509 20. Tkáˇc, M., Verner, R.: Artificial neural networks in business: two decades of research. Appl. Soft Comput. 38, 788–804 (2016). ISSN: 1568-4946 21. Rahimikia, E., et al.: Detecting corporate tax evasion using a hybrid intelligent system: a case study of Iran. Int. J. Account. Inf. Syst. 25, 1–17 (2017) 22. Bishop, M.C.: Neural networks: a pattern recognition perspective. NCRG 96/001 (1996) 23. Bishop, M.C.: Pattern Recognition and Machine Learning, Library of Congress Control Number: 2006922522. Springer Science+Business Media LLC (2006). 24. Fine, T.L.: Feedforward Neural Network Methodology. Springer-Verlag 1999, Library of Congress ISBN 0-387-98745-2, (1999) 25. Pérez López, C., Jesús Delgado Rodríguez, M., de Lucas Santos, M.: Tax fraud detection through neural networks: an application using a sample of personal income taxpayers. Future Internet 11(4), 86 (2019). Special Issue Future Intelligent Systems and Networks 2019 26. Alameer, Z., et al.: Forecasting gold price fluctuations using improved multilayer perceptron neural network and whale optimization algorithm. Resour. Policy, 61, 250–260 (2019). ISSN: 0301-4207 27. Amakdouf, H., et al.: Classification and recognition of 3D image of Charlier moments using a multilayer perceptron architecture. In: The First International Conference on Intelligent Computing in Data Sciences. Procedia Computer Science, vol. 127, pp. 226–235 (2018)
Triple-Station System of Detecting Small Airborne Objects in Dense Urban Environment Mikhail Sergeev , Anton Sentsov , Vadim Nenashev , and Evgeniy Grigoriev
Abstract The article discusses the expediency of using distributed radars for higher accuracy of small airborne object trajectory measurement in an urban environment. For AO detection, we propose a way of using three radar stations built according to the technology of cascaded active-phased waveguide-slot antenna array: two sector-surveillance radar stations and a circular surveillance one. The article discusses a simulation model that processes the angular and range data obtained from real experiments. The structural scheme of the simulation stages is proposed. The simulation model is designed for choosing the parameters of the systems to be developed and for polishing the algorithms of matching the data from three standalone radar stations with an overlap zone, creating a shared information space. The feature of the proposed system is that it allows you to determine the trajectory coordinates of a UAV-like object in a dense urban environment and to specify and simulate the characteristics of such systems at the development stage. Keywords Coordinate measurement · Object detection · Spatially distributed system · UAV · Triple-station system · Radar station · Integrated information processing · Simulation model
1 Introduction The purpose of the work is to create a model which would simulate the radar environment of a three-station system with ground radar stations to choose parameters for the development of such systems. The model is necessary for testing the algorithms of matching the data from three standalone radars with an overlap zone, providing a shared information space for the detection and location of UAV based on angular and range data obtained from real experiments. The specificity of the problem posed here is that we have to detect UAV with high maneuverability and small effective radar cross-section (RCS). This calls for M. Sergeev · A. Sentsov · V. Nenashev (B) · E. Grigoriev Saint-Petersburg State University of Aerospace Instrumentation, 67 Bolshaya Morskaya Street, Saint-Petersburg 190000, Russian Federation © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 I. Czarnowski et al. (eds.), Intelligent Decision Technologies, Smart Innovation, Systems and Technologies 238, https://doi.org/10.1007/978-981-16-2765-1_7
the development of new radar data integrated processing algorithms and models for testing these algorithms in adverse weather conditions, including the conditions of dense urban environment. A literature review shows that radar systems are used for a wide range of problems associated with detecting, locating, and auto tracking of objects (AO) by radar stations, and subsequent classification of objects, various in types and characteristics, such as ground, surface, or airborne [1–6]. The development of data processing technology imposes new requirements to radars in terms of both detection speed and trajectory measurement accuracy. A system of two or more spatially distributed radars has significant advantages as compared to a single-station system, because it can determine the full speed vector of the objects observed, as well as their angular and range coordinates. The accuracy of angular coordinate measurement is much higher in a distributed radar system [7–18]. Multi-station systems are called spatially distributed. Apart from radars, they can include other data sources (optical, infrared, acoustic) [8]. When designing such systems, you need to search for new methods and models of real-time integrated information processing, involving high-speed computers capable of dealing with large volumes of data. The scientific problem to be solved is the development of a model for integrated processing of data from three radars in their matching point and for the control over their operation to build up an integrated trajectory of the object movement. The purpose of the model under development is to verify the possibility of effectively providing the radar environment in a spatially distributed system of standalone radar stations for tracking airborne objects, including UAV. Section 2 proposes a structural scheme and features of modeling of distributed location systems. We continue in Sect. 3, where echo signal model is considered. Section 4 discusses the methods of experimental data registration and features of radar station layout scheme. Section 5 presents the obtained simulation results.
2 Structural Scheme and Features of the Simulation Algorithm
Domain analysis [4–8] shows that the integration of radar data from spatially distributed sources into a shared information space is a specific problem that needs to be studied. The model of the radar environment and its evaluation for a triple-station system of standalone radars operating in ground/air mode includes plenty of objects, namely:
• reflections from the underlying surface;
• reflections from a natural or artificial AO;
• reflections from static objects in the surveillance area, etc.
Besides, each object in the model has its own features. For example, each AO has its speed and acceleration vectors, RCS, etc. The simulation model of a triple-station system of standalone radars for our research consists of several software blocks performing the following tasks: • feeding the input data (antenna pattern width for the radar stations 1, 2 and 3, air surveillance zone coordinates, angular and range resolution required, initial AO position in the system of global coordinates, heading angle of the AO, pulse repetition period of the sounding signal modulated [19–21] by a special code, RCS of the AO, surveillance zone width, ground speed and acceleration of the AO, etc.); • modeling of relative positions of the distributed radar system and the AO observed; • generation of the echo signal modulated by a special code for radar stations 1, 2, and 3 (adding noises); • detecting the AO on noisy background at each radar station; • estimating the trajectory coordinates at each radar station; • exchanging data between the three stations; • integration of data in the distributed system; • joint tracking and building the trajectories of AO movement. The structural scheme of the model is shown in Fig. 1. In this scheme, we use methods and algorithms of signal processing developed for modern spatially distributed systems of aerial monitoring in ground/air mode. In the blocks of this scheme, we take into account the following: sophisticated routes of AO
movement; position and shape of the antenna patterns responding to AO features; influence of the weather conditions, etc.
Fig. 1 Structural scheme of joint AO locating and tracking in simulation
The simulation is conducted for coherent and non-coherent sequences of pulses when the signal is fluctuating. Each software block is simulated separately, taking into account the functional features and statistical data obtained from practice. With this approach, the algorithms can be properly debugged, making sure that the obtained results are reliable and the integration algorithms are of high quality.
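To make the block structure concrete, here is a minimal skeleton (a sketch in Python rather than the MATLAB environment actually used; the function names, the constant-velocity route and the station coordinates are illustrative assumptions, not the authors' implementation) showing how the blocks of Fig. 1 can be chained: parameter input, AO positioning, noisy echo/measurement generation, and accumulation of the data that would be passed on to detection, fusion, and tracking.

```python
# Illustrative skeleton of the simulation blocks from Fig. 1 (hypothetical names).
import numpy as np

def input_parameters():
    # Block 1: antenna pattern widths, surveillance zone, initial AO state, etc.
    return {"dt": 0.1, "steps": 50, "v": np.array([40.0, 10.0, 2.0]),
            "p0": np.array([1500.0, 0.0, 100.0]), "noise_m": 5.0}

def ao_position(par, k):
    # Block 2: relative positions of the distributed system and the observed AO
    # (a straight constant-velocity route as a trivial stand-in).
    return par["p0"] + k * par["dt"] * par["v"]

def echo_measurement(p_true, station, noise_m, rng):
    # Block 3: echo generation reduced to a noisy range/bearing measurement.
    d = p_true - station
    rng_m = np.linalg.norm(d) + rng.normal(0.0, noise_m)
    az = np.arctan2(d[1], d[0])
    el = np.arcsin(d[2] / np.linalg.norm(d))
    return rng_m, az, el

def run_simulation():
    par, rng = input_parameters(), np.random.default_rng(0)
    stations = [np.zeros(3), np.array([800.0, 0.0, 0.0]),
                np.array([0.0, 800.0, 0.0])]
    track = []
    for k in range(par["steps"]):
        p = ao_position(par, k)
        meas = [echo_measurement(p, s, par["noise_m"], rng) for s in stations]
        # measurements to be passed on to detection, fusion, tracking (blocks 4-8)
        track.append((p, meas))
    return track

print(len(run_simulation()), "simulated scan intervals")
```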
3 Representing the Radar Data in the Echo Signal Model Let us consider in detail how an echo signal is generated for simulation; in other words, how the reflections from AO are formed in the overlap zone of the three radar stations on the base of the surveillance algorithms. The reflection models, together with the functioning of blocks 2 and 3 (see Fig. 1), determine the cubic structure of the radar information formed by echo signals from the three stations. After that, three signals are generated for their functioning modes. In the data transfer channels, during the reception of the information, the signal of interest is mixed with interference that should be taken into account. Its sources are the internal noises of the devices; also there can be external noise of natural or artificial origin, including the weather conditions (block 3.3). The trajectory signal is simulated for a cluster of sounding signals. The received signals are stored in the format of a data cube, such as the one presented in Fig. 2. The processing along the three cube axes correspond to the estimation of range, doppler frequency, and angular position of the AO. The echo signal generation block forms data cubes which are the reflections of the echo signals for the standalone distributed radar stations 1, 2, and 3. They form the digital signal data and feed them to the central processing block. Fig. 2 Radar data cube
The research problem is to combine the received locating data of various types as data cubes in a shared information space. The problem assumes that we have to develop models, methods, and processing algorithms for large volumes of information. As the generation of a signal can take from several minutes up to tens of minutes, the signal is simulated on a separate high-end workstation based on NVIDIA Tesla K20X, for mathematical calculation in computer simulation package MATLAB 2020a. The data exchange with the block of integrated processing and radar station control is implemented via a high-speed interface. The signal is fed into the algorithms of the above-mentioned processing and integration blocks by means of a special software module. The operation modes of the radar stations are simulated by a set of programs. One of them (see Fig. 1, blocks 1 and 2) is responsible for the input of the initial data and the layout of the stations and AO observed. The second one (see Fig. 1, block 3) directly generates the echo signal from the AO. Having the spherical coordinates of the radar stations and AO, at each iteration, we can record the echo signal characteristics as a cubic structure, as shown above. This helps us to avoid the procedure errors possible when modeling an echo signal envelope in Cartesian coordinate system. Besides, the cubic representation of the echo signal makes it a convenient mathematical object for computerized machine processing and for the transformation (compression and masking) of large volumes of data in high-speed channels of distributed systems.
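For readers less familiar with the data-cube convention, the following generic numpy sketch (not the MATLAB code described above; the cube dimensions and the injected target are arbitrary assumptions) shows how echo samples are commonly organized as fast-time samples × pulses × receive channels and how processing along the pulse axis yields a Doppler estimate, with the channel axis reserved for angle estimation.

```python
# Generic illustration of a radar data cube (fast time x pulses x channels)
# and of Doppler processing along the pulse axis.
import numpy as np

n_fast, n_pulses, n_chan = 256, 64, 4          # arbitrary cube dimensions
rng = np.random.default_rng(0)

# Noise-only cube, then one target at range bin 80 whose phase rotates from
# pulse to pulse (a Doppler shift); assume range compression is already done.
cube = (rng.normal(size=(n_fast, n_pulses, n_chan)) +
        1j * rng.normal(size=(n_fast, n_pulses, n_chan))) * 0.1
rng_bin, dopp_cycles = 80, 0.2                 # hypothetical target parameters
cube[rng_bin] += np.exp(2j * np.pi * dopp_cycles *
                        np.arange(n_pulses))[:, None]

# Doppler estimation: FFT along the pulse axis; angle information would come
# from processing along the channel axis (here simply summed).
range_doppler = np.fft.fft(cube, axis=1)
power = np.abs(range_doppler).sum(axis=2)

r_hat, d_hat = np.unravel_index(np.argmax(power), power.shape)
print("detected range bin:", r_hat, "Doppler bin:", d_hat,
      "(true: 80 /", round(dopp_cycles * n_pulses), ")")
```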
4 Specialized Radar Station Layout and Experimental Data Registration
The simulation model for displaying the air environment allows you to load registered experimental data formed by means of radar stations built by the cascaded active-phased waveguide-slot antenna array technology. The range of mobile radar stations built by this technology includes 2D radars (Mars-1M, Mars-B) and a 3D all-round radar Mars-VTI. The mobile all-round radar is designed for the following tasks:
• to equip monitoring complexes of external tracking of airborne, ground or surface objects, including those with small RCS (about 0.02 m²);
• to perform radar detection of small UAV, including low-altitude and stationary ones.
The application of the all-round radar Mars-VTI together with sector-surveillance radars improves the accuracy of determining the angular coordinates of an object due to joint processing of the radar data. When using several radar stations, each of them provides data about the object position in the system of polar coordinates; after that, all the information is united at
the operator’s workplace. This workplace can be situated either at the position of one of the radars, or separately from them. For data exchange, wireless communication lines are used [2]. After the registered radar data are processed, we have the information about the range and angular position of the object in regard to each radar in real time [3]. At each radar, the data are tied to the unified time system and become synchronized, providing a single trajectory of the object. Radar stations support the operation modes which provide detecting any stationary or moving objects with speed filtering. The operation mode and parameters are specified by software at the operator’s workplace, which also performs secondary processing of the data obtained. The problems of information processing and displaying are integrated into the computing facilities of the control unit, and the data are displayed on a monitor. Figure 3 shows a specialized layout of radar stations in a system with an overlap zone [12, 13]. It should be noted that radar stations build by cascaded active-phased waveguideslot antenna array technology are designed for detecting airborne objects, including small UAV [4] in dense urban environment. Based on the results of the processing of joint information registered using a system of three such stations with an overlap zone, the data on three coordinates of the AO can be calculated: the range in regards to the reference point (radar station 3 in this layout), the altitude (calculated by the deviation in the elevation plane), and the lateral deviation QZ (in the azimuth plane). In the course of work, UAV databases were developed [22–24]. We analyzed those UAV which could carry working load with the required characteristics such as RCS, speed, and time of the flight. Based on the results of this analysis, a hexacopter-like
Fig. 3 Layout of three radar stations with an overlap zone in urban environment (radars 1, 2 and 3 with a joint surveillance area; detection range R = 28 km for a moving object with RCS ≈ 1 m²)
carrier was designed and fabricated on the base of DJI S900 frame. Its working load was a corner reflector. Typical hexacopter-based air carrier object (a), and aircrafttype UAV (b) are shown on Fig. 4. As a detection object with RCS about 1 m2 , an air target imitator can be used, consisting of a carrier (UAV) and a radiocontrast marker as its working load. To achieve the required RCS, a corner radar reflector was constructed, aimed at the wavelength used in the radars [13–18]. Structurally, it consists of four trigonal reflectors, each connected to the others by two common planes. See it in Fig. 5. This construction ensures the same average value of RCS at various view angles within a hemisphere when the airborne object is detected from the ground. One of the trajectories implemented in our experiments was one peculiar for UAV, which gets into the overlap zone of the three radars: it started at the range of 1500 m from the radar station 1, moved to 3500 m away from it, climbing up to 750 m (with
some lateral displacement caused by the wind), turned back, returned to the starting point and landed in manual mode. When the layout is serial, the radar 1 is responsible for the surveillance in a narrow sector of the AO trajectory, the radar 2 provides lateral surveillance of the AO trajectory, and the radar 3 adds circular surveillance of the space available from radars 1 and 2, including the overlap zone. The conducted experiments have given us the necessary statistical data for the implementation of a distributed system model.
Fig. 4 Typical hexacopter-based air carrier object (a), and aircraft-type UAV (b)
Fig. 5 Corner radar reflector
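The joint use of range and angular data tied to a common reference point can be illustrated with the following sketch; the station coordinates, noise levels and the simple averaging used for fusion are assumptions for illustration and do not reproduce the processing of the real Mars-type stations.

```python
# Generic illustration: converting each station's polar measurement of the AO
# (range, azimuth, elevation) into a common Cartesian frame and fusing them.
import numpy as np

def polar_to_cartesian(r, az, el, station_xyz):
    """Local (r, az, el) measurement -> coordinates in the common frame."""
    local = np.array([r * np.cos(el) * np.cos(az),
                      r * np.cos(el) * np.sin(az),
                      r * np.sin(el)])
    return station_xyz + local

# Hypothetical station positions (radar 3 taken as the reference point).
stations = [np.array([-1000.0,  500.0, 0.0]),
            np.array([-1000.0, -500.0, 0.0]),
            np.array([    0.0,    0.0, 0.0])]

true_ao = np.array([2500.0, 300.0, 750.0])
rng = np.random.default_rng(1)

estimates = []
for s in stations:
    d = true_ao - s
    r = np.linalg.norm(d) + rng.normal(0.0, 10.0)        # noisy range
    az = np.arctan2(d[1], d[0]) + rng.normal(0.0, 0.01)  # noisy azimuth (rad)
    el = np.arcsin(d[2] / np.linalg.norm(d)) + rng.normal(0.0, 0.01)
    estimates.append(polar_to_cartesian(r, az, el, s))

fused = np.mean(estimates, axis=0)   # simple fusion by averaging
print("fused estimate:", fused.round(1), "true:", true_ao)
```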
5 MATLAB Simulation Results To simulate a triple-station spatially distributed system, a software package was developed in MATLAB computer simulation system. The package consists of three modules: module of parameter setup for the radar stations and objects observed; module of AO flight trajectory generation; module of determining the mutual disposition and measuring the trajectory coordinates. The mutual disposition of the AO and the radar stations is visualized on a 2D chart showing you the joint surveillance zone (see Fig. 6). The model allows you to obtain the object coordinates either in the universal Cartesian coordinate system or in the polar local coordinate system of each radar station [12, 18]. A general example is given in Fig. 7 where you can see the moving AO and three radar stations tracing it. Fig. 6 Joint surveillance zone of a triple-station radar
Fig. 7 3D chart of moving airborne object surveillance
The software model ensures the setting of trajectories and making a decision about further AO tracing by the distributed radar. The modules can operate either independently or as parts of a complex, implementing the entire system. The parameter setup module can deal with each radar station separately, bringing them closer to really existing samples. Simulation of various scenarios allows you to speed up the deployment of the system, shortening the stage of seminatural and natural tests. The model is close to real operating conditions because it has fully formed databases with the characteristics of unmanned airborne systems of airplane, helicopter, or multicopter type [22–24]. Another important feature of the model is taking into account the weather which can affect the propagation of radio waves of the used lengths in natural atmospheric conditions. The model contains a procedure for precipitation-caused failure prediction and can estimate long-term statistics of radio wave attenuation in rain.
6 Conclusion Tactical characteristics of radar systems can be improved without their hardware modernization, due to a more sophisticated layout and algorithms for radar information processing. The developed model allows you to simulate the movement of airborne objects (including UAV-like objects with small RCS) both in open space and in dense urban areas which can help in monitoring socially significant objects of urban infrastructure for example. Model can determine their position in space (angular coordinates) and perform joint data processing in a system of three radar stations, with possible measurement errors.
The model can be applied to the search for the best technical characteristics of radar systems and their software at the stage of their development. It allows you to combine the data about airborne object detection from three standalone radar stations with a joint surveillance zone into a united information space in real time. Acknowledgements The reported study was funded by RFBR, project number 19-29-06029.
References 1. Mahafza, B.R.: Radar Systems Analysis and Design Using MATLAB, vol. 3, p. 743. Chapman and Hall/CRC (2016). Available from: https://doi.org/10.1201/b14904 2. Shishanov, S.V., Myakinkov, A.V.: The system of the circular review for vehicles based on ultra-wideband sensors. J. Russ. Univ. Radioelectron. 2, 55–61 (2015). (In Russ.) 3. Gimignani, M., Paparo, M., Rossi, D., Scaccianoce, S.: RF design and technology supporting active safety in automotive applications. In: 2013 IEEE 10th International Conference on ASIC, pp. 1–4. IEEE; (2013). Available from: https://doi.org/10.1109/asicon.2013.6811875 4. Verba, V.S., Merkulov, V.I. (eds.). Estimation of Range and Speed in Radar Systems, p. 3. M. Radiotechnik (2010). (In Russ.) 5. Ji, Z., Prokhorov, D.: Radar-vision fusion for object classification. In: 2008 11th International Conference on Information Fusion, pp. 1–7 (2008) 6. Melvin, W.L., Scheer, J.A.: Principles of Modern Radar vol. II: Advanced Techniques, vol. 2, pp. 2300. Scitech Publishing (2013) 7. Zaitsev, D.V.: Multi-position radar systems. Methods and algorithms for processing information under interference conditions. Radio Engineering, Moscow (2007). (In Russ.) 8. Raol J.R. Multi-Sensor Data Fusion with MATLAB, p. 534. CRC, (2009). Available from: https://doi.org/10.1201/9781439800058 9. Nenashev, V.A., Sentsov, A.A., Shepeta, A.P., Formation of radar image the earth’s surface in the front zone review two-position systems airborne radar. In: 2019 Wave Electronics and Its Application in Information and Telecommunication Systems (WECONF), pp. 1-5. SaintPetersburg, Russia (2019). https://doi.org/10.1109/weconf.2019.8840641 10. Willow, V.S., Tatarsky, B.G. (eds.): Radar systems for aerospace monitoring of the earth’s surface and airspace, p. 576. Monograph M., Radiotekhnika (2014). (In Russ.) 11. Nenashev, V.A., Shepeta, A.P.: Accuracy characteristics of object location in a two-position system of small onboard radars. Inf. Control Syst. 2, 31–36 (2020). Available from: http:// www.i-us.ru/index.php/ius/article/view/4981 12. Nenashev, V.A., Kryachko, A.F., Shepeta, A.P., Burylev, D.A.: Features of Information Processing in the Onboard Two-Position Small-Sized Radar based on UAVs, pp. 111970X1–111970X-7. SPIE Future Sensing Technologies, Tokyo, Japan (2019) 13. Nenashev, V.A., Sentsov, A.A., Shepeta, A.P.: The problem of determination of coordinates of unmanned aerial vehicles using a two-position system ground radar. In: 2018 Wave Electronics and its Application in Information and Telecommunication Systems (WECONF), p. 5. IEEE, (2018). Available from: https://doi.org/10.1109/weconf.2018.8604329 14. Wang, R., Deng, Y.: Bistatic InSAR. Bistatic SAR System and Signal Processing Technology, pp. 235–275. Springer, Singapore (2017). Available from: https://doi.org/10.1007/978-981-103078-9_8 15. Shepeta A.P., Nenashev V.A.: Modeling algorithm for SAR. In: Proceedings of SPIE Remote Sensing, vol. 9642, pp. 96420X-1–9642OX-8. Toulouse, France (2015). https://doi.org/10. 1117/12.2194569 16. Toro, G.F., Tsourdos, A.: UAV Sensors for Environmental Monitoring, p. 661. MDPI, Belgrade (2018). Available from: https://doi.org/10.3390/books978-3-03842-754-4
17. Klemm, R. (ed.): Novel Radar Techniques and Applications. Vol 1: Real Aperture Array Radar, Imaging Radar, and Passive and Multistatic Radar, p. 1. Scitech Publishing, London (2017). Available from: https://doi.org/10.1049/sbra512f_pti 18. Klemm, R. (ed.): Novel Radar Techniques and Applications. Waveform Diversity and Cognitive Radar, and Target Tracking and Data Fusion, p. 2. Scitech Publishing, London (2017) 19. Sergeev, M.B., Nenashev, V.A., Sergeev, A.M.: Nested code sequences Barker—Mersenne— Raghavarao. Informatsionnoupravliaiushchie sistemy [Information and Control Systems], no. 3, pp. 63–73 (2019). (In Russian). https://doi.org/10.31799/1684-8853-2019-3-63-73 20. Nenashev, V.A., Sergeev, A.M., Kapranova, E.A.: Research and analysis of autocorrelation functions of code sequences formed on the basis of monocyclic quasi-orthogonal matrices. Informatsionno-upravliaiushchie sistemy [Information and Control Systems], (4), 9–14 (2018). https://doi.org/10.31799/1684-8853-2018-4-9-14 21. Sergeev, A.M., Nenashev, V.A., Vostrikov, A.A., Shepeta, A.P., Kurtyanik, D.V.: Discovering and analyzing binary codes based on monocyclic quasi-orthogonal matrices. In: Czarnowski, I., Howlett, R., Jain, L. (eds.) Intelligent Decision Technologies 2019. Smart Innovation, Systems and Technologies, vol 143, pp 113–123. Springer, Singapore (2019). https://doi.org/10.1007/ 978-981-13-8303-8_10 22. Sergeev, M.B., Nenashev, V.A., et al.: Baza dannyh harakteristik bespilotnyh letatel’nyh sistem vertoletnogo tipa [Database of characteristics of unmanned aerial systems of helicopter type]. Sertificate of state registration no. 2020621680 (2020) 23. Nenashev, V.A., et al.: Baza dannyh harakteristik bespilotnyh letatel’nyh sistem samoletnogo tipa [Database of characteristics of unmanned aerial systems of aircraft type]. Sertificate of state registration no. 2020621745 (2020). 25 Sep 2020 24. Sergeev, M.B., Nenashev, V.A., et al.: Baza dannyh harakteristik bespilotnyh letatel’nyh sistem mul’tikopternogo tipa [Database of characteristics of unmanned aerial systems of multicopter type]. Sertificate of state registration no. 2020621745, (2020)
Using Families of Extremal Quasi-Orthogonal Matrices in Communication Systems Anton Vostrikov , Alexander Sergeev , and Yury Balonin
Abstract The evolution of knowledge about quasi-orthogonal matrices as a generalization of Hadamard matrices is briefly considered in the paper. We describe the origin of the names of quasi-orthogonal matrix families and the connection of their orders with known numerical sequences. The structured matrices, searching for which is of the greatest practical interest, are distinguished: symmetric, persymmetric, cyclic, structured according to Walsh. There are also features and characteristics of such matrices highlighted in the paper. Examples of the applicability of the matrices for various tasks of information transformation and processing are given: noise immune coding, masking, compression, signal coding. Keywords Orthogonal matrices · Quasi-orthogonal matrices · Hadamard matrices · Image masking · Interference coding · Signal coding
1 Introduction Calculation using orthogonal matrices can be applied to various problems found in communication systems [1–6], such as: • filtering of signals received from channels; • image compression and contour extraction; • protecting digital images transferred over open channels from unauthorized access; • impulse noise immune image coding; • coding and marking of signals in digital communication channels, radars, etc. For all these transformations, the matrices from the representative Hadamard family are mainly used, though you can also come across Belevitch matrices, Haar matrices, Lee’s jacket matrices, discrete cosine transform (DCT) matrices, etc. The number of matrices in the Hadamard family is known to be limited: they exist only A. Vostrikov (B) · A. Sergeev · Y. Balonin Saint-Petersburg State University of Aerospace Instrumentation, 67 Bolshaya Morskaya Street, Saint-Petersburg 190000, Russian Federation © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 I. Czarnowski et al. (eds.), Intelligent Decision Technologies, Smart Innovation, Systems and Technologies 238, https://doi.org/10.1007/978-981-16-2765-1_8
in evenly-even orders. The modern enhancement of orthogonal data transformations suggests the search for new matrices in non-Hadamard orders, including odd ones, which has led to the appearance of a new matrix class: quasi-orthogonal matrices. First quasi-orthogonal (column-orthogonal) matrices were described in 2006 in [7], and dubbed M-matrices. They were square matrices M of order n with modulo-bounded elements |m_ij| ≤ 1 such that
M^T M = ω(n) I    (1)
where I is an identity matrix and ω(n) is a weight function. Clearly, the representation (1) generalizes orthogonal matrices, for which A^T A = I holds, including Hadamard matrices with H^T H = nI. Thus, Hadamard matrices can be considered quasi-orthogonal. The originally found M-matrices [7] had elements m_ij with many different values, not distinguished in any way by the authors. However, for the processing and transformation problems listed above, the most preferable matrices are those with a limited number of element values: two (such as Hadamard matrices) or three (such as Belevitch matrices). Therefore, the further development of the quasi-orthogonal matrix theory led to the identification of families of M-matrices, namely the Mersenne, Fermat, and Euler matrix families [8–10], with a limited number of element values and structural peculiarities. The purpose of this work is to specify the structural peculiarities of quasi-orthogonal matrices, describe their characteristics, and systematize the fields of their usage for data transformation in communication systems.
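As a quick numerical illustration of condition (1) (our own sketch, not taken from the cited works), the code below builds a Hadamard matrix by the Sylvester doubling rule and verifies that M^T M = ω(n) I with ω(n) = n and with all elements modulo-bounded by 1.

```python
# Numerical check of the quasi-orthogonality condition M^T M = w(n) * I
# for a Sylvester-type Hadamard matrix (elements +1/-1, so w(n) = n).
import numpy as np

def sylvester_hadamard(k):
    """Hadamard matrix of order 2**k via the Sylvester doubling construction."""
    H = np.array([[1]])
    for _ in range(k):
        H = np.block([[H, H], [H, -H]])
    return H

H = sylvester_hadamard(3)                    # order n = 8
n = H.shape[0]
gram = H.T @ H                               # should equal n * I
print("max |m_ij| =", np.abs(H).max())       # modulo-bounded elements
print("M^T M == n*I :", np.array_equal(gram, n * np.eye(n, dtype=int)))
```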
2 Why Mersenne, Fermat and Euler Matrices? Quasi-orthogonal Hadamard matrices with two values of their elements, 1 and −1, are known to exist only in orders 4t, where t is a natural number [4]. These matrices are also maximum determinant matrices for these orders. To cover the whole sequence of natural numbers by quasi-orthogonal matrix orders, we can specify three more sequences whose numbers follow each other with a gap of four, just like the sequence of Hadamard matrix orders. These are sequences to the left of n = 4t − 1 on the number axis, to the right of n = 4t + 1, and n = 4t − 2. The attempts to calculate quasi-orthogonal matrices with two-element values in the orders of these sequences assume lower requirements to negative values. In other words, instead of −1 we should use elements −b, where | b | < 1. As more and more M-matrices with two-element values 1 and −b were found, the connections were discovered between M-matrices with the same characteristics and dependencies b = f (n) [11]. One of the goals of the quasi-orthogonal matrix theory we develop is to search for these matrices in all orders corresponding to the sequence of natural numbers.
Theoretically, it is a very interesting problem. Practically, its successive solution, with higher and higher orders of the found matrices, can significantly expand the field of choosing the best options for orthogonal transformation of data and signals. The families of such matrices [8–10] were named after the sequences of natural numbers which included the orders of the originally calculated matrices. The affiliation of these orders with the well-known Mersenne and Fermat numerical sequences, as well as the number of the elements and the way of their calculation, contributed to the full-scale formation of these matrix families and their classification, first presented in [11]. The Mersenne sequence, defined as n = 2^t − 1 and starting with the numbers 1, 3, 7, 15, 31, …, belongs to the subset of numbers of the form 4t − 1. In Fig. 1 you can see portraits of a Mersenne matrix of order 15 and a similar quasi-orthogonal matrix of order 35, which does not belong to the Mersenne numbers by its order value but appears with this order in the above-mentioned subset. A black field in the image corresponds to a matrix element with the value −1, and a white one corresponds to a matrix element with the value 1. The Fermat sequence, defined by the formula n = 2^(2^k) + 1 and starting with the numbers 3, 5, 17, 257, 65,537, …, belongs to a subset of numbers of the form 4t + 1. Thus, the matrices which satisfy condition (1) in orders belonging to the above-mentioned subsets of odd numbers were naturally named Mersenne and Fermat matrices, respectively. It should be noted that the Fermat sequence, unlike the Mersenne sequence, is nested into a numerical sequence similar to it. For example, 65 is not a Fermat number, but together with the Fermat numbers it belongs to the sequence n = 2^(2t) + 1 nested into 4t + 1. This double nesting leads to an enhancement of such matrices.
Fig. 1 Portraits of quasi-orthogonal matrices of orders 15 and 35
Fig. 2 Portraits of quasi-orthogonal matrices of orders 17 and 65
However, we can consider that the second sequence of the common orders is the main one, and the Fermat sequence is a part of it. Here we have some freedom of definition: following Hadamard, who generalized Sylvester matrices by supplementing them with his own ones, we can expand the concept of both Mersenne and Fermat matrices. However, the double expansion in the second case indicates that these problems are not symmetric, being generalized in different ways. To calculate Fermat matrices, which have elements of three possible values, there is an iterative procedure for building a sequence of quasi-orthogonal matrices whose orders are equal to Fermat numbers and to the numbers producing natural expansions, of the form n = 2^k + 1, where k is even (except k = 1): 3, 5, 17, 65, 257, 1025, … Figure 2 gives you an example: portraits of Fermat matrices of orders 17 and 65, where a grey field corresponds to an element with the third value. There is an exception: Euler matrices of orders n = 4t + 2. Their name is based on the fact that they can successfully replace the often unavailable Belevitch matrices, whose constructability is determined by whether or not the number n − 1 can be resolved into a sum of squares of two numbers. The theory of this sophisticated problem traces back to Euler's works.
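The order bookkeeping above can be summarized in a few lines of code; the helper below (illustrative only, not from the cited works) assigns an order n to the 4t, 4t − 1, 4t + 1 or 4t + 2 subset and flags whether it is also a Mersenne number 2^t − 1 or of the form 2^k + 1 with k even.

```python
# Classify matrix orders into the subsets discussed above (illustrative only).
def order_family(n):
    is_mersenne = ((n + 1) & n) == 0 and n >= 1              # n = 2^t - 1
    k = (n - 1).bit_length() - 1
    is_fermat_type = n - 1 == 2 ** k and k % 2 == 0 and n > 2  # n = 2^k + 1, k even
    base = {0: "Hadamard (4t)", 3: "Mersenne-type (4t-1)",
            1: "Fermat-type (4t+1)", 2: "Euler-type (4t+2)"}[n % 4]
    return base, is_mersenne, is_fermat_type

for n in (7, 15, 31, 35, 5, 17, 65, 257, 6, 8):
    print(n, order_family(n))
```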
3 The Main Features of Quasi-Orthogonal Matrices
The number of quasi-orthogonal matrices considerably exceeds the number of Hadamard matrices, if for no other reason than that they also cover odd and extra even orders. There was a time when even the expansion by Belevitch, who offered to use zeros for the opportunity to build orthogonal arrays, was taken as a doubtful
novelty. Nowadays, it is considered a step towards the right direction: we should add elements with different values, for example, irrational ones, to expand the class of orthogonal matrices. For the first time, this statement of the problem unexpectedly came out for matrices with complex elements where the limitation was, for example, the demand to deal with powers of a complex number. But complex numbers are much better than integers at adapting the matrix to orthogonality. The irrationality of one element value is a noticeably stricter limitation: the structures produced with it have the same complexity as Hadamard matrices [11]. Quasi-orthogonal matrices are different from complex generalizations in one more way: they are locally extremal by determinant as a product of the same optimization procedure which produces Hadamard. Belevitch, Mersenne, Fermat and Euler matrices. For example, a Fermat matrix of order 257 was first found without any preliminary knowledge of its structure. This is a new feature, never noticed before. As it happens, all the matrices found by Belevitch during his combinatorial studies can be automatically found by the same simple algorithm which allows you to find Hadamard matrices. This is evidence that they are fundamentally related, and it is possible to preventively predict the absence of some other stable structures. It follows that quasi-orthogonal matrices can fully cover all possible orders. If at least one type of these matrices exists in all orders attributed to them, then in general they all exist [12]. In sum, it can be said that for all known sequences, there exist studies [11, 12] or not yet studied relations between matrices in orders that correspond to them. The number of different element values in matrices is limited, it is usually two or three [8–11]. Also, the matrices in families have limited complexity. These are all possible single-block matrices, including monocyclic ones [13–15], two-block matrices, including two-circulant, edged two-circulant, double-edged, three-block Balonin– Seberry matrices, four-block Williamson matrices, including such their properties as symmetry, persymmetry, antisymmetry, etc. It is important to note that the discussed quasi-orthogonal matrices from various families form chains connected by simple operations such as edging, etc. Therefore, going down the chain, we come to the predictor matrix [16] which has produced all this matrix variety. Hard-to-find quasi-orthogonal matrices [17, 18] are matrices of local or global maximum determinants found by an optimization procedure, not only by combinatorial analysis. Definitely, this is a display of the dual character of matrix order numbers which break up into integer and irrational ones, staying interrelated. The development of the theory and practice of search for the matrices in question is enriched by new types of blocked constructions similar to the Williamson construction [19], including single or double edges [20] We should mention the existence of separate quasi-orthogonal matrices outside the sequences we have discussed. They are matrices of orders 5, 10, 20, 40, 80, etc. whose orders are divisible by image sizes in traditional formats [21].
Quasi-orthogonal matrices, with all their advantages we have mentioned, can be effectively used in transformation and coding of digital data. Here are the features that contribute to it: • limited number of matrix elements, which makes it simple to use these matrices for computing on DSP processors; • symmetries in the matrices and typical constructions, which facilitate their simple and economical storage in memory; • full coverage of all the numerical sequence by matrix orders, which provides a wide range of choice for the developers of orthogonal transformation algorithms; • existence of specially structured types of matrices.
4 Applications of Quasi-Orthogonal Matrices 4.1 Matrices Structured According to Walsh in Image Compression Traditionally, transformations with orthogonal matrices are used in image and video compression algorithms. In JPEG or MJPEG algorithms, as well as in high-definition video formats such as DV, DVPRO, HDCAM, DVCPRO, etc., such an original matrix is a DCT matrix [3]. However, recent research showed that a DCT matrix can possibly be replaced by other structured quasi-orthogonal matrices. Fig. 3 Portrait of a Mersenne–Walsh matrix of order 31
Fig. 4 Portraits of four-value Mersenne–Walsh matrices of orders 7, 15, and 63
Figure 3 gives you an example which is a symmetric quasi-orthogonal Mersenne matrix of order 31 structured according to Walsh [22]. Similar matrices of orders 7, 15, and 63 are also known. Of course, the compression algorithm modification is not just putting such a matrix instead of a DCT one. To use quasi-orthogonal matrices, we need an algorithm adaptation that would, among other things, take into account the structural features of these matrices. Apart from the symmetric Mersenne–Walsh matrices with two possible values of their elements, Mersenne–Walsh matrices with four possible values of their elements [22] proved themselves to be better. Figure 4 shows you colored portraits of such matrices, orders 7, 15, and 63. Apparently, unlike DCT, when they are used in image transformation, the low frequencies are in the bottom right corner, and the high frequencies are in the top left one. Due to this, the compression algorithm will have a different order of scanning the matrix after it is quantized. To make sure that the image compression experiments with DCT and Mersenne– Walsh matrices are equivalent, for the JPEG algorithm we set up identical compression rates determined by the quantization threshold. In both cases, the filtering followed the rule “reset to zero everything below threshold, and leave unchanged what is above it”. In Fig. 5, you can see the test image “Racoon” restored after a compression at different rates, with DCT of order 8 (left column) and with the Mersenne–Walsh matrix of order 7 from Fig. 4 (right column). Visually, these results, like some other ones with various test images, demonstrate that even a threefold compression with the use of a Mersenne–Walsh matrix provides a restored image closer to the original one.
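Because the Mersenne–Walsh matrices themselves are not reproduced in this chapter, the sketch below uses an ordinary order-8 Walsh–Hadamard matrix as a stand-in to show the transform–threshold–restore pipeline that the experiments follow; the block size, threshold and test block are arbitrary assumptions, so it illustrates the procedure rather than the reported results.

```python
# Transform-coding sketch with an orthogonal (Walsh-Hadamard) matrix standing in
# for the DCT / Mersenne-Walsh matrices discussed in the text.
import numpy as np

def hadamard(k):
    H = np.array([[1.0]])
    for _ in range(k):
        H = np.block([[H, H], [H, -H]])
    return H

H = hadamard(3)                         # order 8, H @ H.T = 8 * I
X = np.random.default_rng(0).integers(0, 256, size=(8, 8)).astype(float)

Y = H @ X @ H.T                         # forward 2-D transform of one image block
thr = np.quantile(np.abs(Y), 0.5)       # keep only the largest half of coefficients
Y_q = np.where(np.abs(Y) >= thr, Y, 0.0)

X_rec = H.T @ Y_q @ H / 64.0            # inverse transform (normalization 8 * 8)
print("kept coefficients:", int((Y_q != 0).sum()), "of 64")
print("max reconstruction error:", np.abs(X - X_rec).max().round(2))
```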
Fig. 5 Images restored after compression with a four-value Mersenne–Walsh matrix, compression ratio 2 (a), and 3 (b)
4.2 Symmetric Quasi-Orthogonal Matrices in Noise-Immune Coding and Image Masking
In [23], the authors discuss an image masking method for transfer via telecommunication channels. It can be represented as follows:
Y = G^T X G    (2)
where Y is the transformed image, X is the original image, G is the masking matrix. The purpose of the transformation is to destroy the original image, preventing an unauthorized intruder from viewing it.
Fig. 6 Original and masked images in a channel, with the same representation format
This method utilizes the advantages of strip transformation [24] for image coding with pulse noise protection. The masking, in addition to the protection from unauthorized viewing, provides noise immunity. Kronecker product, so typical for strip transformation, is replaced by simple matrix pre- and post-multiplication, providing a plenty of masking matrices to use: extremal quasi-orthogonal ones, whose orders are equal to the image size. For noise immunity, the matrices have to be minimax [24]. This follows from the fact that the value of the maximum element of a matrix determines the noise amplitude. Thus, the problem of finding optimal matrices for masking can be reduced to the problem of choosing orthogonal matrices with the smallest maximum element. Extremal quasi-orthogonal matrices perfectly suit this scenario. Search for such matrices is a resource-consuming task even for supercomputers, giving us hope that they will be hackproof. Figure 6 presents the original image and the image obtained by masking with a symmetric blocked Hadamard matrix of order 32 whose portrait is given in Fig. 7. Figure 8 demonstrates the restoration of a masked image, six packages of which, 1024 bytes each, were lost in the channel during transfer. It is a good visualization of “smudging” the noise on the restored image, an important strip transformation feature.
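A hedged sketch of the masking rule (2), with an ordinary ±1 Hadamard matrix of order 32 standing in for the extremal matrices discussed above: it masks a random "image", restores it exactly, and then zeroes part of the masked data to show how a local loss is spread over the whole restored image. The normalization and the loss pattern are illustrative assumptions.

```python
# Masking Y = G^T X G and restoration, with a Hadamard matrix as the mask.
import numpy as np

def hadamard(k):
    G = np.array([[1.0]])
    for _ in range(k):
        G = np.block([[G, G], [G, -G]])
    return G

G = hadamard(5)                                   # order 32, G^T G = 32 * I
X = np.random.default_rng(0).integers(0, 256, size=(32, 32)).astype(float)

Y = G.T @ X @ G                                   # masked image, Eq. (2)
X_back = G @ Y @ G.T / 32.0**2                    # exact restoration
print("lossless restore error:", np.abs(X - X_back).max().round(6))

# Emulate channel loss: zero out a block of the masked image and restore;
# the damage is spread ("smudged") over the whole restored image.
Y_damaged = Y.copy()
Y_damaged[:4, :8] = 0.0
X_noisy = G @ Y_damaged @ G.T / 32.0**2
err = np.abs(X - X_noisy)
print("mean / max error after loss: %.1f / %.1f" % (err.mean(), err.max()))
```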
Fig. 7 Symmetric Hadamard matrix of order 32 obtained by edging of a Mersenne matrix of order 31
Fig. 8 Image distorted by noise in the channel and then restored
4.3 Cyclic Quasi-Orthogonal Matrices in Signal Coding
Analysis of many code sequences used for coding signals in channels has shown that they are code sequences of Mersenne type. For example, the Barker code of length 11 used in the IEEE 802.11 standard corresponds to a part of a row in the matrix of maximum determinant of order 41 calculated by Guido Barba [25]. In a number of works [13, 14], the authors discuss the studies about using rows of quasi-orthogonal matrices as codes. Just like in the compression case, the matrix rows are not used directly but rather adapted; at the same time, their autocorrelation functions (ACF) meet lower requirements: for example, their secondary peaks can have values larger than one. This is acceptable in cases when the central peak of the ACF is considerably larger than one.
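The kind of ACF evaluation referred to here takes only a few lines; the sketch below computes the aperiodic autocorrelation of the length-11 Barker code used in IEEE 802.11 and reports its central peak and largest sidelobe. The same evaluation can be applied to rows of cyclic quasi-orthogonal matrices with elements 1 and −b to compare their peak-to-sidelobe ratios.

```python
# Aperiodic autocorrelation of the length-11 Barker code (used in IEEE 802.11).
import numpy as np

barker11 = np.array([+1, +1, +1, -1, -1, -1, +1, -1, -1, +1, -1])

acf = np.correlate(barker11, barker11, mode="full")   # aperiodic ACF
peak = acf[len(barker11) - 1]                         # central peak = code length
sidelobes = np.delete(acf, len(barker11) - 1)

print("ACF:", acf)
print("central peak:", peak, " max |sidelobe|:", np.abs(sidelobes).max())
```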
Experimental results show the ways to obtain new codes based on rows of monocyclic quasi-orthogonal Mersenne and Raghavarao matrices (see Fig. 9), with better characteristics than the existing codes have today [14]. Our research is showing the direction for further work in order to obtain codes of maximum length. In particular, we have found first rows of circulants as a base for building codes of lengths 31, 35, 127, and 143, presented in Table 1. The entirety of research on the relation between cyclic matrices and codes allows us to significantly enhance the general coding theory, satisfying the demands to the
Fig. 9 Monocyclic Mersenne matrices of order 7 and Raghavarao matrices of order 13
Table 1 Rows of cyclic matrices calculated by three strategies
Strategy 1: matrix orders n = 35 and n = 127
Strategy 2: matrix orders n = 35 and n = 143
Strategy 3: matrix orders n = 31 and n = 127
First circulant row represented by signs of the elements −++−++++−+−++−−−++++−+++−+++−−+++++ −−+−+++−−+−+−+−−−−−+−−++−−+++−−−+−−−−− ++−−+−++−+−−+− +++++++−−−+−+++−−−−−−− +−++−+−−+−++−−+++++−++−−−++−− ++−+++++ −+−+−++−−−+−+++ ++−++−−+−+−++++−++−−−+−−−−−+++−−−+− +++++−++++−−+++−+−++−+−++++++−−−+−−− +−++−++−−−+− +++−+++−++−−−−−+++−−−+−−++− ++−+−−+++++−−−−−++−−−+−+− +−−++−+−++−− +++−−+−−−−−+−−+−+−+−−+−−−−+−−−− +−−−−−−++−+−+−−−+++−+++++−−+−−+ +−−−−−−+++++++−+−+−+−−++−−+++−+++−+−− +−++−−−++−++++− ++−+−++−++−−+−−+−−− +++−−−−+−+++++−−+−+−++−−++−+−−−+−− ++++−−−+−+−−−−++−−−−−
modern telecommunication and radar systems by the obtained unsymmetric codes with the representation 1 and –b.
4.4 Quasi-Orthogonal Matrices in Other Applications Recently, independent researchers have reported the usage of Mersenne–Walsh matrices of order 31 for wavefront aberration correction in controlled deformable mirrors [26]. Hadamard–Walsh matrices are used for representing the structure of chemical formulas [27], and blocked Hadamard matrices can be applied to bioinformatics entity modeling in genetics [28]. It is also interesting to use automatic generation of graphical representation (portraits) of matrices as a base for developing modern ornamental patterns in textile industry.
5 Conclusions The results discussed here are aimed at stimulating the scientific interest in families of new quasi-orthogonal matrices which generalize orthogonal matrices, providing a base for re-examining the algorithms of data transformation and processing. Our point is that the research conducted within the last 10 years will have a long after-effect, because the new original and unique matrices and methods of their usage need to be studied, modified, generalized, and enhanced. Acknowledgements The reported study was funded by the Ministry of Science and Higher Education and of the Russian Federation, grant agreement No. FSRF-2020-0004.
References 1. Ahmed, N., Rao, R.: Orthogonal Transforms for Digital Signal Processing. Springer-Verlag, Berlin Heidelberg, New York (1975) 2. Gonzalez, R.C., Woods, R.E.: Digital Image Processing. Prentice Hall Upper Saddle River, New Jersey (2001) 3. Wang, R.: Introduction to Orthogonal Transforms with Applications in Data Processing and Analysis. Cambridge University Press (2010) 4. Horadam, K.J.: Hadamard matrices and their applications: progress 2007–2010. Crypt. Commun. 2(2), 129–154 (2010) 5. Sergeev, A.M., Blaunstein, N.Sh.: Orthogonal matrices with symmetrical structures for image processing. Informatsionno-upravliaiushchie sistemy [Inf. Control Syst.] 6(91), 2–8 (2017). https://doi.org/10.15217/issn1684-8853.2017.6.2 6. Roshan, R., Leary, J.: 802.11 Wireless LAN Fundamentals. Cisco Press (2004)
7. Balonin, N.A., Sergeev, M.B.: M-matrices. Informatsionno-upravliaiushchie sistemy [Information and Control Systems] 1(50), 14–21 (2011) 8. Balonin, N.A., Sergeev, M.B., Mironovsky, L.A.: Calculation of Hadamard-Mersenne matrices. Informatsionno-upravliaiushchie sistemy [Information and Control Systems] 5(60), 92–94 (2012) 9. Balonin, N.A., Sergeev, M.B., Mironovsky, L.A.: Calculation of Hadamard-Fermat matrices. Informatsionno-upravliaiushchie sistemy [Information and Control Systems] 6(61), 90–93 (2012) 10. Balonin, N.A., Sergeev, M.B.: Two ways to construct Hadamard-Euler matrices. Informatsionno-upravliaiushchie sistemy [Information and Control Systems] 1(62), 7–10 (2013) 11. Sergeev, A.M.: Generalized Mersenne matrices and Balonin’s conjecture. Autom. Control Comput. Sci. 48(4), 214–220 (2014) 12. Balonin, Y.N., Vostrikov, A.A., Sergeev, A.M., Egorova, I.S.: On relationships among quasiorthogonal matrices constructed on the known sequences of prime numbers. SPIIRAS Proc. 1(50), 209–223 (2017). https://doi.org/10.15622/SP.50.9 13. Nenashev, V.A., Sergeev, A.M., Kapranova, E.A.: Research and analysis of autocorrelation functions of code sequences formed on the basis of monocyclic quasi-orthogonal matrices. Informatsionno-upravliaiushchie sistemy [Information and Control Systems] 4(95), 9–14 (2018). https://doi.org/10.31799/1684-8853-2018-4-9-14 14. Sergeev, M.B., Nenashev, V.A., Sergeev, A.M.: Nested code sequences of Barker-MersenneRaghavarao. Informatsionno-upravliaiushchie sistemy [Information and Control Systems] 3(100), 71–81 (2019). https://doi.org/10.31799/1684-8853-2019-3-71-81 15. Sergeev, A., Nenashev, V., Vostrikov, A., Shepeta, A., Kurtyanik, D.: Discovering and analyzing binary codes based on monocyclic quasi-orthogonal matrices. Smart Innov. Syst. Technol. 143, 113–123 (2019). https://doi.org/10.1007/978-981-13-8303-8_10 16. Balonin, N.A., Vostrikov, A.A., Sergeev, M.B.: On Two predictors of calculable chains of quasi-orthogonal matrices. Autom. Control Comput. Sci. 49(3), 153–158 (2015). https://doi. org/10.3103/S0146411615030025 17. Balonin, N., Sergeev, M.: Quasi-orthogonal local maximum determinant matrices. Appl. Math. Sci. 9(8), 285–293 (2015). https://doi.org/10.12988/ams.2015.4111000 18. Balonin, N.A., Sergeev, M.B.: About matrices of the local maximum of determinant. In: Proceedings–2014 14th International Symposium on Problems of Redundancy in Information and Control Systems, REDUNDANCY 2014, vol. 14, pp. 26–29 (2015). https://doi.org/ 10.1109/RED.2014.7016698 19. Balonin, N.A., Balonin, Y.N., Dokovic, D.Z., Karbovskiy, D.A., Sergeev, M.B.: Construction of symmetric Hadamard matrices. Informatsionno-upravliaiushchie sistemy [Information and Control Systems] 25(90), 2–11 (2017). https://doi.org/10.15217/issn1684-8853.2017.5.2 20. Balonin, N.A., Sergeev, M.B.: Ryser’s conjecture expansion for bicirculant strictures and Hadamard matrix resolvability by double-border bicycle ornament. Informatsionnoupravliaiushchie sistemy [Information and Control Systems] 1(86), 2–10 (2017). https://doi. org/10.15217/issnl684-8853.2017.1.2 21. Balonin, N.A., Vostrikov, A.A., Sergeev, M.B.: Two-circulant golden ratio matrices. Informatsionno-upravliaiushchie sistemy [Information and Control Systems] 5(72), 5–11 (2014) 22. Balonin, N., Vostrikov, A., Sergeev, M.: Mersenne-Walsh matrices for image processing. Smart Innov. Syst. Technol. 40, 141–147 (2015). https://doi.org/10.1007/978-3-319-19830-9_13 23. 
Balonin, N., Sergeev, M.: Construction of transformation basis for video and image masking procedures. Front. Artif. Intell. Appl. 262, 462–467 (2014). https://doi.org/10.3233/978-161499-405-3-462 24. Mironovsky, L.A., Slaev, V.A.: Strip-method for image and signal transformation. De Gruyter (2011) 25. Barba, G.: Intorno al Teorema di Hadamard sui Determinanti a Valore Massimo. Giorn. Mat. Battaglini 71, 70–86 (1933)
108
A. Vostrikov et al.
26. Yagnyatinskiy, D.A., Fedoseyev, V.N.: Algorithm for sequential correction of wavefront aberrations with the criterion of focal spot size minimization. J. Opt. Technol. 86(1), 25–31 (2019). https://doi.org/10.1364/JOT.86.000260 27. Stepanyan, I.V.: Biomathematical system of the nucleic acids description. Comput. Res. Model. 12(2), 417–434 (2020). https://doi.org/10.20537/2076-7633-2020-12-2-417-434 28. Balonin, N., Sergeev, M., Petoukhov, S.: Development of matrix methods for genetic analysis and noise-immune coding. Adv. Artif. Syst. Med. Educ. III, 33–42 (2020). https://doi.org/10. 1007/978-3-030-39162-1_4
Variable Selection for Correlated High-Dimensional Data with Infrequent Categorical Variables: Based on Sparse Sample Regression and Anomaly Detection Technology
Yuhei Kotsuka and Sumika Arima
Abstract We devised a new variable selection method to extract infrequent event variables (binary data) that have one or more interaction effects, particularly for industrial application. The method is a combination of sparse sample regression (SSR), a recent type of just-in-time modeling (JIT), and the T2 and Q statistics, which are typical anomaly detection methods. JIT is based on locally weighted regression to cope with changes in process characteristics as well as nonlinearity. In the proposed method, modeling is performed multiple times using variables whose effects on the objective variable are known. The samples to be used are selected by SSR automatically and weighted as well. By evaluating how far the estimated coefficients of the created model are from those of the reference model with the T2 and Q statistics, models that contain more (or fewer) samples in which events that affect the objective variable occur are detected. Variable selection is performed by ranking the ratio of events in the samples used by the models detected as abnormal. Synthetic data was used to verify the method. In the verification, we succeeded in extracting one of the correct variables from a total of 5000 variables that included six true effect variables, and a summary of the verification with actual data is also given.
Keywords Sparse sample regression · Just-in-time modeling · Anomaly detection · Hotelling-T2 statistic · Q statistic · Variable selection · Interaction effect
1 Introduction
With the progress and contribution of measurement technology and the Internet, it has become possible to collect a wide variety of large-scale data. As a result, expectations for data utilization in various industries have increased [1, 2]. However, there are differences between industries in the development of data utilization, and
it has been pointed out that the factors behind this are differences in equipment, culture, and needs. Knowledge related to data utilization can be applied regardless of the type of industry; therefore, it is said that the development of data utilization requires attention to all industries [3]. Also, as a global trend, it is said that the era of the Fourth Industrial Revolution, which fuses the real world and the digital world, has arrived, and the existing manufacturing concept is about to change drastically [4]. So, the importance of data utilization will continue to increase in the future.
As it became easier to collect data, it became more difficult to identify important elements from a large amount of information. Take the semiconductor manufacturing industry, which is known for its complicated and long-term manufacturing process, as an example. A factory can obtain about 2 billion data points a day from a total of 20,000 processes [5]. In addition to such data collected from sensors, the activities of the engineers (e.g., replacement of parts, change of settings, etc.), which are mainly collected as text information called "event variables", are accumulated daily. Because of such a large amount of information, it is impossible to identify important elements based on human experience and intuition or to conduct a comprehensive search. Therefore, not only on-site knowledge but also data-driven knowledge extraction is needed [6].
Chapter 2 introduces the possibility and issues of the conventional methods for large-scale interaction modeling. Next, in Chap. 3, we explain the concept, methods, and procedure of the proposed method. Chapter 4 shows the verification results using synthetic data, which were carried out to confirm that the proposed method works. Chapters 5 and 6 are the discussion and conclusions, respectively.
2 Conventional Method for High-Dimensional Data
There are two main solutions for high-dimensional data. The first is dimensional reduction by principal component analysis (PCA), and the second is sparse modeling with regularization terms.
2.1 Dimensional Reduction by Principal Component Analysis
The main methods using PCA are principal component regression (PCR) and partial least squares (PLS). PCR creates a regression model that predicts a single objective variable from new explanatory factors calculated by converting the explanatory variables into principal component axes. PLS builds a linear regression model by projecting the explanatory and objective variables into a new space and associating them with each other. In addition, the Hotelling-T2 statistic, which is often used for anomaly detection, is also calculated based on the distance from the origin in the principal component space calculated by PCA. The Hotelling-T2 statistic is explained in detail in Sect. 3.2.
While PCA is used in various fields, there are problems such as the difficulty of interpreting the principal components and the need to sacrifice part of the information in the explanatory variables for dimensional compression.
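As a concrete point of reference for the two PCA-based approaches just described, the following is a minimal, illustrative sketch using scikit-learn; the data, the number of components, and all other settings are assumptions of ours and are not taken from this chapter.

```python
# Illustrative sketch of PCR and PLS (assumed data and settings, not from the chapter).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.cross_decomposition import PLSRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 30))                      # explanatory variables
y = X[:, :3] @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=200)

# PCR: regress the objective variable on the leading principal components.
pcr = make_pipeline(PCA(n_components=5), LinearRegression()).fit(X, y)

# PLS: project explanatory and objective variables jointly before fitting.
pls = PLSRegression(n_components=5).fit(X, y)

print(pcr.score(X, y), pls.score(X, y))             # R^2 on the training data
```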
2.2 Sparse Modeling with Regularization
LASSO and elastic net. Sparse modeling aims to improve the interpretability and prediction accuracy of the model by adding a regularization term to the loss function [7]. The regularization term is a term introduced to prevent overfitting of the model, and the concept of a "norm", which generalizes length, is often used. The L_p (1 ≤ p ≤ ∞) norm for β = (β_1, β_2, ..., β_n) is defined by Eq. (1). By adding the regularization term of the L_p norm to the loss function, the absolute values of the parameters and the number of non-zero parameters can be suppressed. As a result, the variance of the predicted value is reduced, leading to an improvement in the prediction accuracy. It also has the advantage of improving interpretability by reducing the variables used in the model.

\|\beta\|_p = \left( |\beta_1|^p + |\beta_2|^p + \cdots + |\beta_n|^p \right)^{1/p}  (1)
The method of adding the L_1 norm regularization term to the loss function is called the least absolute shrinkage and selection operator (LASSO) [8], and the method of adding both the L_1 and L_2 norm regularization terms to the loss function is called the elastic net [9]. When the loss function is set to L(β), LASSO and the elastic net are defined as the optimization problems in Eq. (2) and Eq. (3), respectively.

\min_{\beta} \; L(\beta) + \lambda \|\beta\|_1  (2)

\min_{\beta} \; L(\beta) + \lambda \left( \alpha \|\beta\|_1 + (1-\alpha) \|\beta\|_2^2 \right)  (3)
λ (≥ 0) is a regularization parameter that adjusts the strength of regularization. α (0 ≤ α ≤ 1) is a parameter that adjusts the ratio of the L_1 norm regularization term and the L_2 norm regularization term in the elastic net. In LASSO, variable selection is likely to occur due to the geometrical properties of the L_1 norm, so a simple model with high interpretability can be obtained. However, this property also means that important variables are likely to be dropped, especially when there is a correlation between variables. On the other hand, the elastic net uses not only the L_1 norm but also the L_2 norm, so it is possible to avoid excessive variable selection. However, since variables tend to remain, the original purpose of sparse modeling may not be satisfied. This means that even if there are truly important variables, they can be underestimated and overlooked if they correlate with other variables.
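The behaviour described above can be reproduced with a small, hedged example: with two almost perfectly correlated true predictors, LASSO tends to keep only one of them, while the elastic net tends to retain both (all data and penalty values below are our own illustrative assumptions).

```python
# Illustrative sketch: LASSO vs. elastic net under correlated variables.
import numpy as np
from sklearn.linear_model import Lasso, ElasticNet

rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)        # nearly identical to x1
X = np.column_stack([x1, x2, rng.normal(size=(n, 8))])   # plus irrelevant variables
y = x1 + x2 + 0.1 * rng.normal(size=n)

lasso = Lasso(alpha=0.1).fit(X, y)
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
print("LASSO coefficients of x1, x2:      ", lasso.coef_[:2])
print("Elastic net coefficients of x1, x2:", enet.coef_[:2])
```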
In an example of an industrial application of LASSO to actual data in [10], 27 variables that affect quality were successfully extracted from 23,600 variables collected from sensors. However, due to the nature of LASSO, there may be variables that are overlooked because of correlation (there are many and diverse physical, chemical, or mechanical relationships, both natural and artificially implemented). In applications to industrial data, the LASSO solution often does not include the oracle variables expected from domain knowledge because of the highly correlated variables; as pointed out in a previous study, only one of several highly correlated variables is selected by LASSO [11]. This is the true-positive issue. In addition, the number of variables selected by a LASSO model is often too large, a characteristic caused by the optimization problem of minimizing the sum of a loss function (predictive error) and the L1 norm penalty [12]. This is the high false-positive (FP) problem. In addition, Reference [13] suggests the need to consider the effect of the event variables (categorical variables) on the objective variable, especially the interaction effect between the event and the state of the process. From this background, modeling methods that take the interaction effect into account, called interaction modeling, have been proposed.
Sprinter. Sparse reluctant interaction modeling (Sprinter) is one of the multi-step effect estimation methods in interaction modeling [14]. In the first step, only the main effects are considered to build a model that explains the objective variable. In the next step, based on the value of the correlation coefficient between each interaction term and the residual calculated from the first-order model, the variable set used for the second-order interaction model is narrowed down. Since Sprinter eliminates the need to use all explicitly created interaction terms in the model, it addresses the curse of dimensionality, a major problem in traditional interaction modeling. In the third step, a final model is estimated using all the main factors and interactions selected in the first and second steps. Usually, LASSO is used in the first and third steps; in particular, the method using LASSO for Sprinter is called Sprinter LASSO. Reference [15] is one of the few cases where Sprinter LASSO was applied to actual data. Three interactions were newly discovered from the approximately 200 million combinations of event variables (n = 8335) and variables collected from the device (n = 88). While the applicability of Sprinter was suggested, problems were also found. Sprinter still has two challenges: the number of variables selected, and the difficulty of detecting the interaction effects of event variables that occur relatively infrequently. Of course, if there are many variables to use, the candidates for interaction increase exponentially. In Sprinter, the number of interaction terms to be added to the second-stage model is given as a parameter. If this parameter is too small, important elements can be overlooked, while if it is too large, the second model will retain too many variables. This does not matter if the first-stage model can be constructed with high accuracy; however, that is often difficult with actual data. From these problems, it can be said that there is a limit to preparing a large number of variables and performing interaction modeling, and it is necessary to estimate in advance the variables that may have an interaction effect.
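To make the multi-step idea behind Sprinter more concrete, the following is a simplified sketch of a Sprinter-like screening; it is our own reading of the three steps described above, not the implementation of [14], and the penalty value and the number of retained interactions are arbitrary.

```python
# Simplified Sprinter-like two-stage interaction screening (illustrative only).
import numpy as np
from itertools import combinations
from sklearn.linear_model import Lasso

def sprinter_like(X, y, n_keep=10, alpha=0.05):
    # Step 1: main-effects model and its residual.
    main = Lasso(alpha=alpha).fit(X, y)
    resid = y - main.predict(X)

    # Step 2: score each pairwise interaction by |correlation| with the residual.
    scores = []
    for i, j in combinations(range(X.shape[1]), 2):
        z = X[:, i] * X[:, j]
        if z.std() > 0:
            scores.append((abs(np.corrcoef(z, resid)[0, 1]), i, j))
    top = sorted(scores, reverse=True)[:n_keep]

    # Step 3: refit with the main effects plus the selected interaction terms.
    Z = np.column_stack([X] + [X[:, i] * X[:, j] for _, i, j in top])
    final = Lasso(alpha=alpha).fit(Z, y)
    return final, [(i, j) for _, i, j in top]
```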
In particular, event variables are originally text data, and one-hot encoding is performed as a preprocessing step for analysis. So, it is possible that correlations are more likely to occur than with ordinary
explanatory variables. If the conventional method is applied to such binary data, it will not function properly from the viewpoint described in the previous section. Therefore, we thought that new findings could be obtained by selecting variables using a different approach than before. Based on this background, we propose a new variable selection method for binary data.
3 Proposed Method
3.1 Basic Idea
Based on the conventional methods for high-dimensional data, we considered that any company or organization could obtain some clues about important indicators. So, the following two premises are made.
(i) It is assumed that some of the variables which affect the objective variable are already known.
(ii) When multiple models are created by repeating random sampling with a fixed sample size, the coefficient distribution follows a normal distribution.
These premises are needed in order to utilize information that is already known when extracting new important variables. The basic idea of the proposed method is as follows.
(1) If an event variable affecting the objective variable occurs, the samples at that time contain more noise than usual.
(2) If a model that predicts the objective variable is constructed using many noisy samples, the estimated coefficients of the model (Model-1) will deviate from those of a model built on normal samples.
(3) Build a model using all the training data to determine the reference values of the coefficients (Model-2).
(4) The difference between the estimated coefficients of Model-1 and Model-2 should be large enough to be detected by anomaly detection. Of course, it is impossible to arbitrarily select only noisy samples, but by weighting samples it is possible to get closer to the situation assumed in (2) and (3).
(5) Since the event information is stored as text data, it is converted into binary data when performing the analysis. Therefore, in the models detected in (4), the proportion of events should change significantly before and after weighting.
In other words, we thought that it would be possible to identify the variables that affect the objective variable by performing anomaly detection focusing on the coefficient values of the models and by analyzing the samples used by the models judged to be abnormal. So, the components of the proposed method are the following three:
(a) how to select samples for model creation,
(b) how to weight the samples used in the model,
(c) how to detect anomalous models.
In the proposed method, sparse sample regression was adopted to perform (a) and (b), and the Hotelling-T2 statistic and Q statistic were adopted to perform (c). An overview of each method is given in the next section.
3.2 Existing Methods that Compose Proposed Method
Sparse Sample Regression. Sparse sample regression (SSR) is a recent type of just-in-time (JIT) modeling which is based on locally weighted regression (LWR) to cope with changes in process characteristics as well as nonlinearity [16]. SSR applies an elastic net (EN) to select a sparse set of samples that improves the LWR model for a target sample/query (as the objective variable) and to calculate the similarity over all explanatory variables as the coefficients of the EN. Data-driven SSR improves on previous LWR modeling, in which the similarity is simply defined by the Euclidean distance and a given threshold value is used to select samples for LWR. Although the original SSR-JIT modeling uses PLS regression as the LWR, in this study we apply SSR only as a means of sample selection and weighting. Therefore, this section only describes how SSR selects and weights samples. SSR calculates the similarity between one sample (the query) and the other samples by Eqs. (4) and (5), and the similarity obtained by SSR is transformed by Eq. (6) and used as the sample weights of any model.

\beta_q = \arg\min_{\beta} \left\| x_q - X^{T} \beta \right\|^2  (4)

\text{s.t.} \; (1-\alpha)\|\beta\|_1 + \alpha\|\beta\|_2 \le \gamma  (5)

w = \beta_q \Big/ \sum_{i=1}^{N} \beta_{q,i}  (6)

where X ∈ R^{N×M} is the explanatory variable matrix, β_q = (β_{q,1}, β_{q,2}, ..., β_{q,N})^T ∈ R^N is the regression coefficient vector used to predict the query x_q, and w ∈ R^N is the sample weight vector. As can be seen from Eqs. (4) and (5), SSR can be regarded as an elastic net that uses the transposed sample matrix as the explanatory variables. Since the solution obtained by the elastic net is sparse, the coefficients are large for samples similar to the query and zero for samples that are not similar. In other words, SSR makes it possible to perform sample selection and weighting at the same time. In addition, the elastic net uses the L2 norm as a regularization term, so it works well even when there is a correlation between the samples. Because of these advantages, SSR was adopted as part of the proposed method.
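A minimal sketch of SSR-style sample selection and weighting is given below, assuming that scikit-learn's penalized ElasticNet is an acceptable stand-in for the constrained problem in Eqs. (4)-(5); the mapping between γ and the penalty strength is not reproduced, and forcing non-negative coefficients is our own simplification.

```python
# Hedged sketch of SSR-based sample selection and weighting (Eqs. 4-6).
import numpy as np
from sklearn.linear_model import ElasticNet

def ssr_select_and_weight(X_database, x_query, alpha=0.05, l1_ratio=0.5):
    # Eq. (4): treat the transposed sample matrix as the design matrix, so each
    # training sample gets one coefficient measuring its similarity to the query.
    en = ElasticNet(alpha=alpha, l1_ratio=l1_ratio,
                    fit_intercept=False, positive=True)   # positivity is an assumption
    en.fit(X_database.T, x_query)                          # (p variables) x (n samples)

    beta = en.coef_                                        # similarity per sample
    selected = np.flatnonzero(beta)                        # sparse sample selection
    w = beta / beta.sum() if beta.sum() > 0 else beta      # Eq. (6): normalized weights
    return selected, w
```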
Hotelling-T2 statistic, Q statistic. The Hotelling-T2 statistic [17] is defined by the following Eq. (7). It is a type of multivariate statistical process control based on principal component analysis (PCA-MSPC) and is used in many applications.

T^2 = \sum_{r=1}^{R} \frac{t_r^2}{\sigma_{t_r}^2}  (7)
Here, σ_{t_r} is the standard deviation of the r-th principal component score t_r. Therefore, the T2 statistic can be thought of as a distance from the origin in the subspace introduced by the PCA. By calculating the T2 statistic for each sample and setting a control limit, it is possible to judge abnormal samples. The main theme of [17] was the use of T2 statistics. However, twenty years later, a method was proposed for independently managing the subspace spanned by the principal components and its orthogonal complement [18, 19]. For the subspace spanned by the principal components, the above management by the T2 statistic is effective, and in many cases the principal components up to the point where the cumulative contribution rate of the eigenvalues first exceeds 0.95 are used. For the orthogonal complement, management by the Q statistic is effective. The Q statistic, also known as the squared prediction error (SPE), represents the amount of information that cannot be explained by the T2 statistic. The Q statistic is defined by Eq. (8).

Q = \sum_{p=1}^{P} \left( x_p - \hat{x}_p \right)^2  (8)

Here, x_p is the value of the p-th variable, and \hat{x}_p is the estimated value of x_p using up to the R-th principal component calculated by PCA. As with the T2 statistic, a control limit based on the 3-σ method is set, and when either the T2 statistic or the Q statistic exceeds the control limit, the sample is judged to be abnormal.
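A hedged sketch of computing the T2 and Q statistics of Eqs. (7)-(8) with PCA, together with 3-σ control limits, could look as follows; the 0.95 cumulative contribution rate follows the text, while everything else is our choice.

```python
# Sketch of T^2 and Q (SPE) statistics with PCA and 3-sigma control limits.
import numpy as np
from sklearn.decomposition import PCA

def t2_q_statistics(F_train, F_new, var_ratio=0.95):
    # Keep the principal components whose cumulative contribution first exceeds 95%.
    pca = PCA(n_components=var_ratio).fit(F_train)

    scores = pca.transform(F_new)
    t2 = np.sum(scores**2 / pca.explained_variance_, axis=1)   # Eq. (7)

    recon = pca.inverse_transform(scores)                      # estimate from retained PCs
    q = np.sum((F_new - recon)**2, axis=1)                     # Eq. (8), the SPE

    return t2, q

def three_sigma_limit(stat):
    # Control limit based on the 3-sigma rule mentioned in the text.
    return stat.mean() + 3 * stat.std()
```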
3.3 Procedure of Proposed Method
See Table 1 for the notation. The flow of the proposed method is as follows.

Table 1 Notation and description of variables used in Sect. 3.3
y ∈ R^n | objective variable of the samples used for model creation
X_database ∈ R^{n×p} | explanatory variables of the samples used for model creation
X_threshold ∈ R^{m×p} | explanatory variables of the samples used to calculate the thresholds of T2 and Q
X_query ∈ R^{l×p} | explanatory variables of the samples used for anomaly detection
E ∈ R^{n×H} | event variables of the samples used for model creation
n_SSR | sample size selected by SSR
w ∈ R^{n_SSR} | weight of each sample selected by SSR

(1) Apply the least squares method to y and X_database. Treat this model as a global model (for reference). Set k = 1.
(2) Randomly select k·p samples from y and X_database, and apply the least squares method. Calculate the difference between the estimated coefficients and the global model.
(3) Repeat (2) 10,000 times, then add 1 to k. If k < 50, return to (2). Set i = 1.
(4) Calculate the similarity β_q ∈ R^n between the i-th (1 ≤ i ≤ m) sample X_threshold,i ∈ R^p and X_database by SSR (Eqs. 4 and 5). Then, create two local models: (4-1) a model that applies the least squares method to the samples selected by SSR from X_database; (4-2) a model that applies the weighted least squares method using β_q as the sample weights (Eq. 6).
(5) Calculate the difference between the estimated values of the two models created in (4) and the global model. In addition, standardize according to the number of samples used in the model; at this time, the values calculated in (2)–(3) are used. Then, add 1 to i. If i < m, return to (4).
(6) Set the thresholds for T2 (Eq. 7) and Q (Eq. 8) using the features created in (4) and (5). Set j = 1.
(7) Apply the same procedure as in (4) and (5) to the j-th (1 ≤ j ≤ l) sample X_query,j ∈ R^p, and calculate T2 (Eq. 7) and Q (Eq. 8). Then, add 1 to j. If j > l, go to the next step.
(8) The following steps are performed only for each sample that exceeds the thresholds set in (6). Calculate the percentage of the h-th (1 ≤ h ≤ H) event variable E_h included in model (4-1):

E_{unweighted,h} = \frac{\sum_{q=1}^{n_{SSR}} E_{h,q}}{n_{SSR}}  (9)

(9) Calculate the percentage of E_h included in model (4-2):

E_{weighted,h} = \frac{\sum_{q=1}^{n_{SSR}} w_q E_{h,q}}{n_{SSR}}  (10)

(10) Create a ranking in descending order of Eq. (11):

E_{weighted} - E_{unweighted}  (11)

(11) In this study, the top 100 places in the ranking of Eq. (11) are defined as the top group. Aggregate the number of times each event variable entered the top group. If there is a variable that frequently enters the top group, it is extracted as an E that may affect y.
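Steps (8)-(11) reduce to a few array operations; a minimal sketch (the variable names are ours) is shown below.

```python
# Sketch of Eqs. (9)-(11): event ratios before/after weighting and their ranking.
import numpy as np

def event_ratio_ranking(E_selected, w):
    """E_selected: (n_SSR, H) binary event matrix of the samples chosen by SSR.
    w: (n_SSR,) SSR weights normalized as in Eq. (6)."""
    n_ssr = E_selected.shape[0]
    e_unweighted = E_selected.sum(axis=0) / n_ssr                  # Eq. (9)
    e_weighted = (w[:, None] * E_selected).sum(axis=0) / n_ssr     # Eq. (10)
    diff = e_weighted - e_unweighted                               # Eq. (11)
    return np.argsort(diff)[::-1]                                  # event indices, descending
```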
4 Numerical Experiments
4.1 Data Description
We applied the proposed method to synthetic data to confirm whether the variables can actually be extracted. The scale of the data and its generation method are shown in Table 2. Note that the event variables are divided into three types based on the frequency of occurrence. In Eqs. (12) and (13), E_*, which has an interaction effect with X_i on y, is written as E_{*,X_i}. The variable names in the synthetic data are written as "E_[frequency of occurrence (Mid or Low)]_[occurrence pattern (Random or Block)]_[impact on y (Change or Not Change)]_[serial number]". So, in the case of an E that occurs at medium frequency and at random and affects y, it is written as "E_M_R_Ch_1". As the problem setting for the analysis, we assumed that X_i (i = 1 ... 15), which affect y by themselves, were already known, and verified whether the E_{mid,X_i} and E_{low,X_i} that have interaction effects could be detected through the proposed method. Here, the SSR parameters (α = γ = 0.05) were set so that the number of samples selected from X_database was at least about 10 times the number of explanatory variables used for the model.

Table 2 Scale and generation method of synthetic data
X_database ∈ R^{20000×21}, X_threshold ∈ R^{5000×21}, X_query ∈ R^{1000×21} | independently follow the standard normal distribution
E_high ∈ R^{20000×500} | binomial distribution (p = 0.7)
E_mid ∈ R^{20000×1500} | binomial distribution (p = 0.3)
E_low ∈ R^{20000×3500} | Poisson distribution (λ = 4)
y ∈ R^{20000} | y = 5\sum_{i=1}^{5} X_i + 3\sum_{i=6}^{10} X_i + 5\sum_{i=11}^{15} X_i + 7\sum_{i=16}^{17} E_{mid,X_i} X_i + 7\sum_{i=18}^{21} E_{low,X_i} X_i + ε  (12)
ε ∈ R^{20000} | ε ~ N(0, V(y_int)), where y_int = 7\sum_{i=16}^{17} E_{mid,X_i} X_i + 7\sum_{i=18}^{21} E_{low,X_i} X_i  (13)

4.2 Results of Numerical Experiments
Fig. 1 T2 and Q statistic of each X_query
Table 3 Difference from the global model standardized according to the number of samples
Fig. 2 Change in percentage of E before and after weighting (a)
Figure 1 is a scatter plot of the T2 and Q statistics of X_query. The dashed line represents the threshold calculated from X_threshold, and 14 of the X_query samples were detected as outliers. Table 3 shows the difference from the global model; it is standardized according to the number of samples used in each model, and absolute values that exceed 2 are shown in red. For evaluation, the sample with the largest Q statistic ((a) in Table 3) is taken as an example. In Fig. 2, a bar graph shows how much the proportion of E, which is the correct answer, changed before and after weighting. The left (blue) shows the percentage of E in X_database, the middle (orange) shows the percentage in the samples selected by SSR (Eq. 9), and the right (red) shows the percentage after weighting (Eq. 10). The other detected X_query samples are shown in Table 4. Then, Eq. (11) was calculated for these 14 samples ((a)–(n)) to create a ranking. Table 5 shows the ranking of the correct E sorted by the Q statistic in descending order. It can be
seen that E_M_R_Ch2 was ranked in the top group six times when the top 100 is used as the criterion. On the other hand, the low-frequency events (E_L_*) were rarely selected by SSR, and even when they were selected, they accounted for less than 1% of the total, so they did not reach the top of the ranking. Therefore, we limited the tabulation to E_high and E_mid and counted how many times they entered the top group. The results are shown in Fig. 3. From Table 5 and Fig. 3, it can be seen that the only E that entered the top group six times is E_M_R_Ch2, which has an interaction effect. From the above results, it can be said that the verification on the synthetic data succeeded in narrowing down at least one variable.
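For concreteness, data with the structure of Table 2 and Eq. (12) could be generated roughly as follows; the grouping of the coefficients in Eq. (12) and the way the Poisson parameter is used for the low-frequency events are our reading of the setup, not a verified reproduction.

```python
# Hedged sketch of synthetic data in the spirit of Table 2 / Eqs. (12)-(13).
import numpy as np

rng = np.random.default_rng(0)
n, p = 20000, 21
X = rng.standard_normal((n, p))
E_high = rng.binomial(1, 0.7, size=(n, 500)).astype(np.int8)
E_mid = rng.binomial(1, 0.3, size=(n, 1500)).astype(np.int8)
E_low = np.zeros((n, 3500), dtype=np.int8)
for h in range(E_low.shape[1]):
    k = min(rng.poisson(4), n)                      # a handful of occurrences (assumption)
    E_low[rng.choice(n, size=k, replace=False), h] = 1

# Interaction part (Eq. 13) and the objective variable (Eq. 12); the columns of
# E interacting with X16-X21 are chosen arbitrarily here.
y_int = 7 * (E_mid[:, :2] * X[:, 15:17]).sum(axis=1) \
      + 7 * (E_low[:, :4] * X[:, 17:21]).sum(axis=1)
eps = rng.normal(0.0, y_int.std(), size=n)
y = 5 * X[:, :5].sum(axis=1) + 3 * X[:, 5:10].sum(axis=1) \
  + 5 * X[:, 10:15].sum(axis=1) + y_int + eps
```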
4.3 How to Judge the Application Result with Real Data
Since real data has no label showing the correct answer, unlike the synthetic data, a rationale is needed to determine whether an extracted variable affects the objective variable. Specifically, it is recommended to classify the samples by the extracted event variable (E = 0, E = 1), perform the following two operations, and then make a final decision by comparing the results with domain knowledge.
(1) Test whether there is a significant difference in the mean or variance of the objective variable.
(2) Repeat random sampling and model creation, then test whether there is a significant difference in the mean or variance of the coefficient estimates.
As an example, some of the application results for real data collected in a manufacturing industry are shown below. However, from the perspective of confidentiality, specific names and numerical values are not disclosed (Figs. 4 and 5).
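Check (1) above can be carried out with standard two-sample tests; a hedged sketch using SciPy follows, where the specific choice of Welch's t-test for the mean and Levene's test for the variance is ours.

```python
# Sketch of check (1): compare y between samples with and without the extracted event.
import numpy as np
from scipy import stats

def judge_event_effect(y, e):
    """y: objective variable, e: extracted binary event variable (0/1)."""
    y0, y1 = y[e == 0], y[e == 1]
    p_mean = stats.ttest_ind(y0, y1, equal_var=False).pvalue   # difference in means
    p_var = stats.levene(y0, y1).pvalue                        # difference in variances
    return p_mean, p_var
```

Check (2) would repeat the same comparison on coefficient estimates obtained from repeated random sampling and model fitting.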
(b)
(c)
(d)
(e)
(f)
(g)
NA
(l)
NA
(k)
E_L_B_Ch2
0.330
0.308
(m)
NA
NA
NA
NA
0.392
0.280 0.365
0.284
(n)
NA
NA
NA
NA
0.355
0.347 0.297
0.320
(h)
NA
0.004
NA
NA
0.265
0.290
0.005
0.321
0.302
(i)
NA
NA
NA
NA
0.395
0.328
0.332
0.337
(j)
NA
NA
NA
NA
0.239
0.347
0.300
0.317
NA
NA
NA
NA
E_L_R_Ch2
E_L_B_Ch1
E_L_B_Ch2
0.291
E_M_R_Ch2 0.352
E_L_R_Ch1
0.271
E_M_R_Ch1 0.247
NA
NA
NA
NA
0.316
0.335
0.312
0.306
NA
NA
NA
NA
0.323
0.307 0.314
0.298
NA
NA
NA
NA
0.346
0.364 0.337
0.332
NA
NA
NA
NA
0.347
0.289
0.276
0.336
NA
NA
NA
NA
0.327
0.302
0.302
0.324
NA
NA
NA
0.002
0.339
0.377
0.006
0.313
0.331
Weighted Unweighted Weighted Unweighted Weighted Unweighted Weighted Unweighted Weighted Unweighted Weighted Unweighted Weighted Unweighted
NA
NA
0.005
E_L_B_Ch1
0.004
NA
E_L_R_Ch2
NA
0.399
0.340
NA
0.281
E_M_R_Ch2 0.247
E_L_R_Ch1
0.303
E_M_R_Ch1 0.365
Weighted Unweighted Weighted Unweighted Weighted Unweighted Weighted Unweighted Weighted Unweighted Weighted Unweighted Weighted Unweighted
(a)
Table 4 Percentage of E before and after weighting
Variable Selection for Correlated High-Dimensional Data … 121
Table 5 Ranking of the correct E for each of the 14 X_query samples detected as outliers
Fig. 3 Number of times each E high , E mid was in the top 100 of the ranking
Fig. 4 Application result with real data: change in the mean of y
5 Discussion
From Table 3, it can be seen that the estimated values of the local models detected through the proposed method deviate significantly from the global model. In particular, (a) and (f), whose T2 statistics exceeded the threshold, included features that were 4 or more in absolute value away from the global model. So, it can be said that
Fig. 5 Application result with real data: change in the mean of coefficient of X
the T2 statistic worked properly. On the other hand, the X_query samples close to the threshold of the Q statistic ((n) in Table 3) showed a slightly smaller deviation from the global model than the others. In addition, 12 of the 14 detected cases exceeded only the threshold of the Q statistic, accounting for the majority. From the above results, it is considered that the threshold of the Q statistic was set smaller than the appropriate value. One factor is a condition that is a prerequisite for anomaly detection: when setting the T2 and Q statistic thresholds for anomaly detection in factory process control, it is assumed that all samples used contain average noise under normal circumstances. However, in this verification, the threshold is set using arbitrary samples regardless of the total amount of noise (interaction effects and white noise) that cannot be explained by the main effects (X_1 ... X_15). If many of the X_threshold samples contain less noise than X_query, the threshold may be underestimated, and more samples than necessary may be judged as outliers. It is considered that an appropriate threshold can be calculated by using more samples for setting the threshold or by applying cross-validation in the same manner as in the verification with actual data. As shown in Table 5 (arranged in descending order of the Q statistic) and Fig. 3, E_M_R_Ch2 ranked high in the ranking six times. As mentioned above, since the threshold of the Q statistic may be calculated too low, it is considered that Fig. 3 would show the correct E even more prominently if an appropriate threshold were calculated. Regarding (c) in Table 5, neither of the event variables with interaction effects made it to the top of the ranking. From (c) of Table 4, it can be seen that E_M_R_Ch2 occurs in samples that occupy about 39.2% of the local model; compared to the others, this is a large value. Moreover, in (c) of Table 3, it can be read that the local model without weighting deviates from the global model. In summary, it can be said that, while many samples containing the interaction effect were selected in (c), the weighting did not greatly increase or decrease the proportion of the corresponding event. This effect is reflected in the amount of change in the rate of events between the local models and is considered to be the reason it did not reach the top of the ranking. In order to properly evaluate
these cases, it is necessary to consider the proportion of events themselves, and there is room for improvement in the ranking calculation method.
6 Conclusion
We devised a new variable selection method for the purpose of extracting event variables (binary data) that have an interaction effect. The method is a combination of sparse sample regression, a type of just-in-time modeling, and the T2 and Q statistics, which are typical anomaly detection methods. In the proposed method, modeling is performed multiple times using variables whose effects on the objective variable are known. The samples to be used are selected by SSR and weighted as well. By evaluating how far the estimated coefficients of the created model are from those of the reference model with the T2 and Q statistics, models that contain more (or fewer) samples in which events that affect the objective variable occur are detected. Variable selection is performed by ranking the ratio of events in the samples used by the models detected as abnormal. Synthetic data was used to verify the method. In the verification, we succeeded in extracting one of the correct variables from a total of 5000 variables that included six true effect variables, and the verification with actual data was also summarized.
References 1. Nagata, K., Okada, M: Data-driven science by sparse modeling ( Data-centric science). Artif. intell. 30(2), 209–216 (2015, Japanese) 2. Higuchi, T.: Point of view. Fostering human resources who will be responsible for data-driven science and technology: stochastic thinking and reverse reasoning. Inf. Manag. 59(1), 53–56 (2016, Japanese) 3. Moyne, J., Iskandar, J.: Big data analytics for smart manufacturing: case studies in semiconductor manufacturing. Processes 5(39) (2017) 4. Okazaki, T.: Virtual Metrology technology in semiconductor manufacturing plants. Appl. Math. 29, 31–34 (2019, Japanese) 5. TOSHIBA HP: https://www.global.toshiba/jp/company/digitalsolution/articles/sat/1711_2. html, 2021/1/27 6. Kang, S., Kim, D., Cho, S.: Efficient feature selection-based on random forward search for virtual metrology modeling. IEEE Trans. Semicond. Manuf. 29(4), 391–398 (2016) 7. Choi, N.H., Li, W., Zhu, J.: Variable selection with the strong heredity constraint and its oracle property. J. Am. Stat. Assoc. 105, 354–364 (2010) 8. Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B Methodol. 58(1), 267–288 (1996) 9. Zou, H., Hastie, T.: Regulation and variable selection via the elastic net. J. R. Stat. Soc.: Ser. B. Methodol. 67(768), 301–320 (2005) 10. Takada, Saiki, Sueyoshi, Eguchi, Nishikawa: Intelligent causal analysis system for wafer quality control using sparse modeling. In: AEC/APC Symposium Asia 2017 (2017)
11. Zou, H., Hashite, T.: Regularization and variable selection via the elastic net. J. R. Statist. Soc. B 64(2), 301–320 (2005) 12. Meinshausen, N., Bühlmann, P.: High-dimensional graphs and variable selection with the lasso. Ann. Stat. 34(3), 1436–1462 (2006) 13. Arima, S., Sumita, U., & Yoshii, J.: Development of sequential association rules for preventing minor-stoppages in semi-conductor manufacturing. In: ICORES, pp. 349–354 (2012) 14. Yu, G., Bien, J., Tibshirani, R.: Reluctant interaction modeling. arXiv preprint arXiv:1907. 08414. (2019) 15. Nagata, T.: Interaction modeling of high-dimensional data for the purpose of identifying improvement events that affect product quality. Graduate school of Tsukuba in System and Information Engineering (2020, Japanese, Unpublished) 16. Uchimura, T., Kano, M.: Sparse sample regression based just-in-time modeling (SSR-JIT): beyond locally weighted approach. Int. Fed. Autom. Control 49(7), 502–507 (2016) 17. Jackson, J.E.: Quality control methods for several related variables. Technometrics (1), 359–377 (1959) 18. Jackson, J.E, Mudholkar, G.S.: Control procedures for residuals associated with principal component analysis. Technometrics (21), 341–349 (1979) 19. Jackson, J.E.: Principal components and factor analysis: Part 1 Principal components. J. of Technometrics 12, 201–213 (1980)
Verification of the Compromise Effect’s Suitability Based on Product Features of Automobiles Takumi Kato
Abstract The compromise effect is a typical marketing measure that induces consumer cognitive bias: when choices are presented in three grades, consumers tend to choose the middle grade. However, in existing research, there are few examples that focus on the product features of high value-added and low-priced products within the same manufacturer's brand as conditions for the appearance of the compromise effect. This study examined web quotations in the Japanese automobile industry for differences in the compromise effect due to product features. As a result, it became clear that the compromise effect appears in low-priced products such as microcars and compact cars, whereas for high value-added products such as SUVs and minivans, the highest grade is likely to be selected. Compromise choices were also likely to occur in answers given using smartphones, but no compromises were seen when using personal computers. During the survey, respondents' concentration was likely lower on smartphones, probably due to the small screen and because respondents tended to operate them while on the move. Therefore, for low-priced products, it is effective to set the upper and lower grades as decoys. Conversely, high value-added products should be designed with the highest grade as the main product.
Keywords Context effect · Product grade · Online survey · Response device · Japan
1 Introduction
The context effect explains that consumers' purchasing choices change due to psychological influences from situational factors. This effect refers to a phenomenon in which the preference order between options changes, depending on the other options. A famous example is the compromise effect. If there are three options, one of which has the characteristics of the other two options, the consumer compromises and chooses the one in the middle [1]. The reason for choosing the middle option is
that it is easy to justify, less likely to be criticized, and avoids losses. It is used in various industries, such as restaurant seat tickets, travel package options, and car grades. It is noteworthy that although consumers were not attracted to the middle choice, the higher and lower grades became decoys, encouraging them to choose the middle. Therefore, consumers who choose B from A/B/C may also choose C from B/C. The compromise effect has been proven to exist in many product categories, and the conditions under which it appears are being actively discussed. However, in existing research, there are few examples that focus on the product features of high value-added and low-priced products within the same manufacturer's brand as a condition for the appearance of the compromise effect. This study verified the suitability of the compromise effect by focusing on the aforementioned product features in the Japanese automobile industry. Marketing can be regarded as a context-setting task, because it is possible to secure the superiority of a company's products at a low cost by guiding consumers' perception through context. This study provides suggestions for setting grades in the proper context.
2 Previous Research on Context Effect and Hypothesis Development The context effect refers to a phenomenon in which the preference order between options changes depending on the other options. Three typical examples of the context effect are as follows: (i) the attraction effect, a phenomenon in which the target product’s selection probability increases due to the existence of its inferior version (decoy) [2], (ii) the phantom effect, a phenomenon in which the probability of selecting a target product increases when it is difficult to obtain its superior version [3], and (iii) the compromise effect [1], a phenomenon in which the intermediate option is selected most often, when a product is located between the two products introduced. This study focuses on the compromise effect, which has been reported in various experiments such as restaurant menus [4], electric toothbrushes [5], and MP3 players [6]. A meta-analysis of 142 studies also confirmed this effect [7]. In addition, the conditions under which this effect appears have been discussed from various perspectives. The compromise effect is stronger when the product information is written graphically rather than numerically [8]. In brand-related information, the compromise effect is greater when it is displayed in its own color compared to a black font on a white background [9]. Consumers prefer extreme brands when compromise brands are relatively less familiar and when compromise brands are relatively more familiar [10]. The compromise effect is greater when choosing souvenirs for relatives or purchasing for others [11]. For verification in an environment, where actual payment is required, the compromise effect can be confirmed, but its degree is weakened [12]. The larger the consideration set size, the weaker the compromise effect [13].
However, in existing research, there are few examples that focus on the product features of high value-added and low-priced products within the same brand as conditions for the compromise effect to appear. In the lineup of a durable consumer goods brand, covering from the high-price to the low-price range is common. Especially in the automobile industry, consumer demands tend to differ depending on the car body type. In general, sport utility vehicles (SUVs) and minivans are oriented toward high added value such as design, whereas microcars and compact cars are oriented toward low price and attributes such as fuel efficiency. Cars are expensive, long-term products. In Japan, consumers have a mean replacement period of 7.5 years, those with an ownership period of 10 years or more account for 25% [14], and it is not easy to repurchase soon after a purchase. Therefore, it was inferred that consumers seeking low-priced cars make a compromise choice, while those seeking high value-added cars choose the highest grade. In this study, the following two hypotheses were developed for the automobile industry.
H1: A compromise effect appears when selecting a low-priced car, making it easier to select a middle grade.
H2: No compromise effect appears when selecting a high value-added car, making it easier to select the highest grade.
Moreover, as mentioned above, it is known that the degree of the compromise effect changes depending on the consumer environment. This study also focused on response devices in online surveys. Compared to personal computers (PCs), smartphones (SPs) have a lower response completion rate [15], take longer to respond [16], and have larger measurement errors [17]. In other words, compared to PCs, which have a large screen and are used calmly at home or at work, SPs, which have smaller screens and may be used in moving trains or cars, are considered to promote less concentration. Hence, the following hypotheses were developed for the answering device.
H3-1: When answering using PCs, the compromise effect appears when selecting a low-priced car, making it easier to select a middle grade.
H3-2: When answering using PCs, no compromise effect appears when selecting a high value-added car, making it easier to select the highest grade.
H4: When answering using SPs, the compromise effect appears regardless of product features.
3 Research Methodology This study’s purpose is to verify the four hypotheses about the compromise effect shown in the previous section for the Japanese automobile industry. An experiment was conducted to observe consumer behavior in a survey environment using the estimation function of the manufacturer’s Web site. Since cars are expensive products, it is common for consumers to check the quotation on the web before visiting a dealer. For excluding the influence of the brand image and Web site specifications,
130
T. Kato
the number of manufacturer brands was narrowed down to one. All the target cars— NBOX (micro), FIT (compact), VEZEL (SUV), and STEPWGN (minivan)—were products of Honda Motor Co., Ltd. The online survey conducted in Japan during May 2019 covered 400 respondents in their 20s to 60s, who met the following conditions: (a) owned a car purchased as a new product, (b) drove at least once a month, (c) were involved in making decisions when buying a car, and (d) were intending to purchase the target cars within a year. The online surveys consisted of a screening survey and a main survey, and the flow involved transiting to the main survey immediately after the screening survey. The screening survey’s purpose was to extract respondents, who met the aforementioned (a)–(d) conditions, and the following items were considered: gender, age, area of residence, annual household income, car ownership, drive frequency, involvement in purchase decisions, manufacturer brand, intent to purchase a car within a year, and emphasis points for the next purchase. These distributions are shown in Table 1. Looking at the value obtained by subtracting the total mean from the mean of each car, NBOX and FIT emphasize low price and fuel efficiency, whereas VEZEL emphasizes design and driving performance, and STEPWGN design and usability. Therefore, microcars and compact cars are low-priced, and SUVs and minivans are high value-added oriented. In the main survey, a screen replicating the estimation page on the Web site of the automobile brand was presented, and the behavior was observed. The target of the estimate was the car intended for purchase that was asked in the screening survey. As shown in Fig. 1, there were four estimation items: grade, engine, exterior color, and interior color. Consideration was given to individually selecting not only the grade but also the engine and color that were likely to show consumer taste. There were three options for each item. As shown in Table 2, the price depended only on the grade and was set according to the price range of each car. As shown in Table 3, the difference between the grades represented the number of functions installed in each of the six function categories (only fuel economy had different performance values). After defining 6 functions in each category, the order was decided by randomization, and B was mechanically determined as the top 2, A as the top 4, and S as all. Actually, there are cases where the exterior design differs depending on the grade. However, it is difficult to objectively verify emotional elements such as design because different people interpret the same information differently. Therefore, priority was given to clarifying the order of options, and the differences between the grades were limited to the number of functions. The design and usability were also expressed by the number of corresponding functions. On the survey screen, as shown in the upper left of Fig. 1, a button called “Detail” is placed, and the grade cannot be selected until the details shown in Table 3 were confirmed by each respondent. The devices used by the respondents were controlled based on a 1:1 ratio between PCs and SPs. Figure 2 shows the difference in screens between PCs and SPs. In the verification, the chi-square test was applied in a car x selection grade matrix. The null hypothesis is that there is no difference in the distribution of grade selections. The significance level was set to 5%. 
When a significant difference was confirmed, Steel’s multiple comparison test was applied with the cheapest NBOX as
Table 1 Deviation from the overall respondent attributes of each car Variable
Description
Total Mean
Difference from Total (Total - each car) NBOX
Gender
Female
0.308
0.172
Age
20s
0.068
30s
0.215
40s
Areas
Income
Emphasis point
FIT
VEZEL
STEPWGN
0.022
-0.148
-0.048
0.032
0.002
-0.018
-0.018
0.005
-0.065
0.025
0.035
0.258
-0.088
-0.028
0.032
0.082
50s
0.315
-0.035
0.095
-0.045
-0.015
60s
0.145
0.085
-0.005
0.005
-0.085
Hokkaido
0.050
-0.03
-0.02
0.04
0.01
Tohoku
0.048
0.002
-0.018
0.012
0.002
Hokuriku
0.060
0.01
0.01
-0.01
-0.01
Kanto
0.375
-0.155
0.055
0.075
0.025
Chubu
0.108
0.002
-0.008
-0.038
0.042
Kansai
0.178
0.062
-0.038
-0.048
0.022
Chugoku
0.058
0.032
0.012
-0.018
-0.028
Shikoku
0.032
-0.002
0.018
0.018
-0.032
Kyusyu
0.092
0.078
-0.012
-0.032
-0.032
Two to four million yen
0.128
0.102
-0.038
-0.018
-0.048
Four to six million yen
0.288
0.072
-0.028
-0.038
-0.008
Six to eight million yen
0.272
-0.092
-0.002
0.068
0.028
Eight to 10 million yen
0.178
-0.048
0.012
0.002
0.032
10 million to 15 million yen
0.135
-0.035
0.055
-0.015
-0.005
Brand
0.065
-0.035
-0.005
0.025
0.015
Design
0.130
-0.02
-0.07
0.05
0.04
Usability
0.232
-0.012
-0.052
-0.042
0.108
Driving
0.092
-0.032
-0.042
0.088
-0.012
Fuel economy
0.145
0.025
0.095
-0.005
-0.115
Safety
0.102
-0.012
0.008
-0.002
0.008
Price
0.212
0.098
0.078
-0.102
-0.072
Wom
0.020
-0.01
-0.01
-0.01
0.03
Fig. 1 Estimating steps
Table 2 Price by grade of each car
Grade | NBOX | FIT | VEZEL | STEPWGN
S     | 225  | 250 | 275   | 300
A     | 175  | 200 | 225   | 250
B     | 100  | 125 | 150   | 175
the control group and the other 3 cars as the treatment group, to identify the location of the difference. All data was used for H1 and H2, and the response data for the corresponding device was used for H3-1, H3-2, and H4.
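As an illustration of the chi-square test described above, a minimal SciPy sketch using the overall ("All") selection counts later reported in Table 4 is shown below.

```python
# Chi-square test on the car-by-grade selection matrix (counts from Table 4, "All").
from scipy.stats import chi2_contingency

counts = [
    [26, 55, 19],   # NBOX:    Grade S, A, B
    [31, 48, 21],   # FIT
    [48, 43, 9],    # STEPWGN
    [54, 41, 5],    # VEZEL
]
chi2, p, dof, expected = chi2_contingency(counts)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p:.4g}")
```

Steel's multiple comparison test itself is not available in SciPy; pairwise nonparametric tests (e.g., Mann-Whitney U) with a multiplicity correction would be a rough stand-in.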
4 Verification Results and Discussion
As shown in Table 4, when looking at each group's grade selection distribution, Grade A was selected most often for NBOX and FIT, whereas Grade S was selected most often for STEPWGN and VEZEL. As a result of the chi-square test (p-value = 0.000), the null hypothesis was rejected, and it was confirmed that there was a significant difference in the distribution of grade selections. As shown in Table 5, as a result of Steel's test, there was no difference between the similarly low-priced FIT and NBOX,
Table 3 Definition of installed functions by grade Category
Description
Fuel economy (km/L) Safety
Audio/navigation
Usability
Advanced technology
Decoration
Grade S
A
B 25
38
30
Side airbag
◯
◯
◯
Blind spot danger detection
◯
◯
◯
Road sign recognition
◯
◯
Parking assist
◯
◯
Collision prevention brake
◯
Emergency call center
◯
Rear camera
◯
◯
◯
Restaurant recommendations
◯
◯
◯
Rear seat screen
◯
◯
Large screen navigation
◯
◯
Remote update
◯
High-quality audio
◯
UV-cut glass
◯
◯
◯
Seat heater
◯
◯
◯
Automatic setting of seat position
◯
◯
Dual air conditioner
◯
◯
Air conditioner with air purifier
◯
Hands-free door
◯
Providing car information to smartphones
◯
◯
◯
Cordless smartphone charging
◯
◯
◯
Voice recognition
◯
◯
Apple CarPlay/Android auto
◯
◯
In-car Wi-Fi
◯
Streaming (drama/movie)
◯
Aluminum wheel
◯
◯
◯
Front bumper garnish
◯
◯
◯
LED headlights
◯
◯
Door handle garnish
◯
◯
Exterior body coating
◯
High definition LCD meter
◯
Fig. 2 PCs and SPs estimate screen
Table 4 Results of chi-square test
Device | Car     | Grade S | Grade A | Grade B | Total | p-value
All    | NBOX    | 26      | 55      | 19      | 100   | 0.000***
All    | FIT     | 31      | 48      | 21      | 100   |
All    | STEPWGN | 48      | 43      | 9       | 100   |
All    | VEZEL   | 54      | 41      | 5       | 100   |
PC     | NBOX    | 14      | 28      | 8       | 50    | 0.000***
PC     | FIT     | 16      | 22      | 12      | 50    |
PC     | STEPWGN | 27      | 20      | 3       | 50    |
PC     | VEZEL   | 31      | 17      | 2       | 50    |
SP     | NBOX    | 12      | 27      | 11      | 50    | 0.129
SP     | FIT     | 15      | 26      | 9       | 50    |
SP     | STEPWGN | 21      | 23      | 6       | 50    |
SP     | VEZEL   | 23      | 24      | 3       | 50    |
Note ***p < 0.001; **p < 0.05; *p < 0.05
Treatment
p-value All
PC
SP
NBOX
FIT
0.973
0.992
0.805
NBOX
STEPWGN
0.002*
0.016*
0.105*
NBOX
VEZEL
0.000***
0.001**
0.015*
Note ***p < 0.001; **p < 0.05; *p < 0.05
Verification of the Compromise Effect’s Suitability Based …
135
but there was a significant difference between the high value-oriented VEZEL and STEPWGN. From the above, H1 and H2 were supported. Next, the effect of the device was confirmed. Looking at the grade selection results shown in Table 4, the same tendency can be seen on PCs, as a total, but using SPs, Grade A was the most selected for all cars. As a result of the chi-square test, there was a significant difference in PCs, but no difference was confirmed in SPs. As shown in Table 5, the Steel’s test results were the same on PCs, as a total. On SPs, although there was a significant difference between NBOX and VEZEL, Grade A was the most selected in VEZEL, as well. From the above, H3-1, H3-2, and H4 were supported. As already mentioned [16], the response time was longer for SPs (93.745 s) than for PCs (mean 90.670 s). Therefore, for low-priced cars, it is effective to set the upper and lower grades as a decoy, whereas for high value-added products, the highest grade should be the main option. If the context setting is incorrect, there is a risk that companies will encourage consumers to make compromised choices in high-value products. This study has the following limitations. First, it did not take into account past purchasing experience, although when purchasing multiple products in a series of steps, past choices affect the next choice [18]. However, since cars have a long replacement period, it is highly possible that the memory of past purchase selections is ambiguous. Second, the influence of the word of mouth (WoM) was not taken into consideration. It has been pointed out that the abundance of WoM these days makes it easier for consumers to make rational choices and reduces the compromise effect [19]. There are many WoM car Web sites, and it has already been confirmed that WoM has a greater influence than web advertising [20]. However, since the grade of the car was the subject of this study, it was difficult to quickly compare the superiority of each car by grade. Third, the observed consumer behavior was limited to the environment of online survey. Bias of respondents who participated in the survey for the purpose of incentives has been pointed out [21–23]. These are future research topics.
5 Conclusion Marketing can be considered context setting work because the superiority of products can be secured at a low cost by using context to guide consumer perception. An example of corporate marketing measure that induces consumer cognitive bias is the compromise effect, which refers to selecting the middle option by making a compromise among the options provided in three grades. This characteristic is used in various industries such as food service and automobiles. However, in existing research, there are few examples that focus on the product features of high valueadded and low-priced products within the same manufacturer’s brand, as a condition for the appearance of the compromise effect. This study verified the suitability of the compromise effect by focusing on the product features in the Japanese automobile industry. As a result of verifying the web
estimation of cars, it was clarified that the compromise effect appears for low-priced products, whereas no compromise effect appears for high value-added products, as well as confirmed the tendency of choosing the highest grade. For high-priced and long-term cars, low-priced consumers may have made a safe compromise, but high value-oriented consumers may have chosen the higher grade so that they would not regret it later. In the online survey, the influence of the responding terminal was a matter of concern. As compared to PCs, it is more difficult to focus on answering questions using SPs, which have a small screen, especially while traveling by trains, etc., and compromise choices are more likely to occur. It is necessary to pay attention to this point when looking at the survey’s results. As shown in this study, understanding the product features and setting the appropriate grade is an important context.
Advances in Intelligent Data Processing and Its Applications
Application of Granular Computing-Based Pre-processing in the Labelling of Phonemes Negin Ashrafi and Sheela Ramanna
Abstract Machine learning algorithms are increasingly effective in algorithmic viseme recognition, which is a main component of audio-visual speech recognition (AVSR). A viseme is the smallest recognizable unit correlated with a particular realization of a given phoneme. Labelling phonemes and assigning them to viseme classes is a challenging problem in AVSR. In this paper, we present preliminary results of applying rough sets in pre-processing video frames (with lip markers) of a spoken corpus in an effort to label the phonemes spoken by the speakers. The problem addressed here is to detect and remove frames in which the shape of the lips does not represent a phoneme completely. Our results demonstrate that the silhouette score improves with rough set-based pre-processing using the unsupervised K-means clustering method. In addition, an unsupervised CNN model for feature extraction was used as input to the K-means clustering method. The results show promise in the application of a granular computing method for pre-processing large audio-video datasets. Keywords Viseme · Audio-visual speech recognition · Granular computing · Rough set theory · Unsupervised learning
1 Introduction Motion capture systems are used in areas such as medicine for non-invasive diagnosis, and animation of characters in films, to name a few. A survey of vision-based human motion capture and analysis can be found in [1]. Combining sound analysis and video analysis in a multimodal approach can result in increased accuracy of sentiment classification that is personalized for a specific speaker. The analysis
Table 1 Viseme classes

Class | Group of viseme | Phoneme
1 | LABIAL | p, b, m
2 | ALVEOLAR | t, d, n, s, z, l, r
3 | VELAR | k, g, η, ì
4 | LABIODENTAL | f, v
5 | PALATO-ALVEOLAR | S, Z, tS, dZ
6 | DENTAL | θ, D
7 | SPREAD | i:, I, e, i, eI, j
8 | OPEN-SPREAD | æ
9 | NEUTRAL | A:, a, 2, @, h
10 | ROUNDED | u:, O:, 6, U, OI, w
11 | PROTRUDING-ROUNDED | 3:
12 | SILENCE | Closed
of visual signals focuses on units such as phonemes and visemes, isolated words, sequences of words and continuous/spontaneous speech [2]. A viseme is the smallest recognizable unit correlated with a particular realization of a given phoneme. A phoneme is the smallest unit of sound that distinguishes one word from another word in a language. Machine learning algorithms are increasingly effective in algorithmic viseme recognition which is a main component of audio-visual speech recognition (AVSR). In [2], a repository of speech and facial expressions was created by face motion capture technology with reflective markers. In [3], a connectionist hidden Markov model (HMM) system for noise-robust audio-visual speech recognition is presented. In [4], new multimodal English speech corpus was created to aid in audiovisual speech recognition. In [5], a large visual speech recognition dataset, consisting of pairs of text and video clips of faces and a scalable model for producing phoneme and word sequences from processed video clips is presented. Emotion recognition from speech and audio signals were discussed in [6–8]. One of the challenging issues is that certain phonemes may have the same visual representations. The phoneme to viseme mapping and viseme classes are given in Table 1 [9]. The shape of the lip of a speaker is defined by a set of markers (points) which are located on characteristic spots of the lip, placed at its edges or inside it, as shown in Fig. 1 by applying the active appearance model [10]. These markers make it possible to extract crucial information about lip movements in audio-visual data. In Fig. 2, the detection of the shape (object) shows the separation of the object from its boundary. Usually, there is uncertainty and insufficient contrast in the boundaries of the object to classify the pixels as either belonging to the object or to the background class. Rough set theory is one of the foundational theories that constitutes granular computing [11]. A granule is a clump of objects (points), in the universe of discourse, drawn together by indistinguishability, similarity, proximity, or functionality [12].
Fig. 1 Example: silence frame
Fig. 2 Example: object detection
In [13], a pseudo-probabilistic approach based on maximum likelihood to image granulation based on geometric granules is proposed. Object extraction with granulation concepts and rough set theory was introduced in [14]. Several papers related to granular techniques using rough, fuzzy and near sets in image analysis can be found in Pal and Peters [15]. In [16], spatio-temporal segmentation using rough entropy was used for object detection and tracking in videos. In [17], an unsupervised rough clustering method was used to detect changes in multi-temporal images of different types of images such as satellite and medical images. In [18], granulated deep learning is used to classify static and moving objects by using a representative pixel of each granule and applying Z-numbers to analyze scenes. In this paper, we present results from Alofon corpus that include: (i) preprocessing of videos by using a specific granular computing method (rough sets), (ii) mechanisms for capturing the movement of face muscles and lips to understand the phonemes spoken by the speakers, and (iii) applying K-means for classifying phonemes. Figure 3 illustrates the process flow of the proposed granular pre-processing technique for labelling phonemes used in this work. Broadly, the process includes change detection in consecutive frames using rough set approximation operators to remove images that do not contribute to the detection of the shape of the lip. The final output is the labelling of the phonemes using an unsupervised machine learning algorithm. In Sect. 2, we give formal definitions for the rough set model used in this work. In Sect. 3, we give details of the pre-processing experiments with rough sets on the video frames. In Sect. 4, we discuss results of using unsupervised k-means and CNN methods on the extracted frames.
Fig. 3 Main steps of the proposed methodology
2 Preliminaries In this paper, rough set theory is used as a pre-processing technique to clean and remove outliers from audio-visual and face motion capture data. This technique is based on an approach discussed in [14]. Formation and extraction of granules in the image can be based on the pixel values, colour similarity, spatio similarity, or the proximity and neighbourhood. The size and shape of the windows can vary depending on the gray scale image and can be categorized as follows [18]: equal-sized and shaped granules, unequal-sized and regular-shaped granules or arbitrary-shaped granules. An approach has been proposed in [14] to obtain the optimum granule size by getting minimum base width in the histogram of the image and dividing the base by two. Let (I, A) be the approximate space of our discourse where I is an image of size P ∗ Q. In other words, our universe is a collection of pixels. Rough set theory facilitates the partitioning of the image into non-overlapping windows with equal or unequal sizes of pi ∗ qi where i represents the ith granule. In addition, we are able to approximate the objects in I using the two well-known operators in rough set theory: upper and lower approximations. Briefly, these two operators make it possible to reason about constrained distinguishability of objects in an image. In this work, we are interested in separating objects from the background by applying the definitions from [18] using a threshold T . The definitions for lower and upper approximation operators for both object and background are as follows:
• Lower approximation of the object, O_{T∗}: the set of granules in which all of the pixel values in G_i are > T
• Upper approximation of the object, O_T^∗: the set of granules in which at least one pixel value in G_i is > T
• Lower approximation of the background, B_{T∗}: the set of granules in which all of the pixel values in G_i are < T
• Upper approximation of the background, B_T^∗: the set of granules in which at least one pixel value in G_i is < T

Based on the classical rough set theory definitions of accuracy and roughness, the object and background roughness and accuracy measures are calculated as follows:

Definition 1 (Accuracy of Object O_T): |O_{T∗}| / |O_T^∗|
Definition 2 (Accuracy of Background B_T): |B_{T∗}| / |B_T^∗|
Definition 3 (Roughness of Object O_T): 1 − |O_{T∗}| / |O_T^∗|
Definition 4 (Roughness of Background B_T): 1 − |B_{T∗}| / |B_T^∗|

In this work, we have implemented equal size and shape granulation in order to find the upper and lower approximations of the lips (object). Considering various values of the threshold T for the purpose of separation and calculating accuracy and roughness for the object and background, we select the threshold T which results in minimum roughness or maximum accuracy for the granulated image [18].
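The sketch below illustrates, under stated assumptions, how the granule-based roughness measure above can be computed for one grayscale frame and how a threshold T could be selected; the granule size, the candidate threshold range and the function names are illustrative and not taken from the paper's implementation.

```python
import numpy as np

def granulate(img, gsize):
    """Split a grayscale image into equal-sized, non-overlapping windows (granules)."""
    h, w = img.shape
    return [img[r:r + gsize, c:c + gsize]
            for r in range(0, h - gsize + 1, gsize)
            for c in range(0, w - gsize + 1, gsize)]

def object_roughness(granules, T):
    """Roughness of the object O_T: 1 - |lower approximation| / |upper approximation|."""
    lower = sum(1 for g in granules if np.all(g > T))   # every pixel above T
    upper = sum(1 for g in granules if np.any(g > T))   # at least one pixel above T
    return 1.0 if upper == 0 else 1.0 - lower / upper

def best_threshold(img, gsize=8, candidates=range(40, 220, 10)):
    """Pick the threshold T that minimizes object roughness (maximizes accuracy)."""
    granules = granulate(img.astype(np.float32), gsize)
    return min(candidates, key=lambda T: object_roughness(granules, T))

# Example usage: T = best_threshold(mouth_frame)
```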
3 Experiments The flowchart for our methodology is given in Fig. 3. The following section gives experimental details.
3.1 Audio-Visual Dataset The Alofon corpus (http://www.modality-corpus.org/) consists of seven speakers. Each speaker has 11 audio files, 11 face motion capture files, and 22 video files (11 captured with a Canon camera and 11 with a Vicon camera, with different qualities). Each video file is approximately one to two minutes long with a resolution of 1280 × 720 pixels (width × height). The videos have been recorded at 50 frames per second (FPS). For this work, a subset of the corpus consisting of more than 20,000 images belonging to 1 speaker and 5 videos was used.
In the first phase of the experiment, videos were converted to frames at the default frame rate of the videos using skvideo, which is a useful pre-processing tool in Python. In the second phase of the experiment, the face and mouth area of each speaker were detected using the landmarks around the speaker's lips. Each viseme can represent a few phonemes. Table 1 lists the classes, visemes, and their corresponding phonemes. The conversion of videos to frames leaves us with a large number of images in which there is either no change or only a very slight change between successive frames, due to the high frame rate. One major problem is the presence of frames in which the shape of the lips does not represent a phoneme completely and which also happen to be part of the frame sets for pronouncing two consecutive visemes. The objective of this research is to use rough sets to remove such frames, which do not contribute to the detection of phonemes, from the dataset of images. A sketch of the first two phases is given below.
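As a rough illustration of the first two phases (frame extraction and mouth localization), the sketch below reads frames with skvideo, as mentioned above, and uses the common dlib 68-point facial landmark model (points 48–67 outline the mouth) to crop the lip region. The choice of dlib, the model file name and the margin value are assumptions, not details from the paper.

```python
import skvideo.io
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
# Assumed: the standard 68-landmark model file shipped with dlib examples.
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def mouth_crops(video_path, margin=10):
    """Yield a grayscale mouth crop for every frame of the video."""
    for frame in skvideo.io.vread(video_path):
        gray = np.dot(frame[..., :3], [0.299, 0.587, 0.114]).astype(np.uint8)
        faces = detector(gray)
        if not faces:
            continue
        shape = predictor(gray, faces[0])
        pts = np.array([(shape.part(i).x, shape.part(i).y) for i in range(48, 68)])
        x0, y0 = np.maximum(pts.min(axis=0) - margin, 0)
        x1, y1 = pts.max(axis=0) + margin
        yield gray[y0:y1, x0:x1]
```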
3.2 Pre-processing with Rough Sets The separation of object and background was made possible by maximizing the accuracy values and minimizing the roughness values of Definitions 1–4, using the approximation operators and experimenting with different values of the threshold T. As can be seen in Fig. 1, a thresholding method is required to handle the bright markers around the mouth. In the third phase of the experiment, we have used equal-sized and regular-shaped granules. The windows are neighbouring blocks that cover all pixels in the image, and there is no overlap amongst the windows. The number of pixels in the lower and upper approximations of the object is calculated for each frame. The assumption is that, as various phonemes are uttered, the associated viseme will change; with each alteration, as the lips move, the number of pixels in the lower approximation of the object that contains the lips will also change. The difference between these values in two consecutive frames quantifies the amount of change. The thresholding technique aids us in deleting images where the change in lip movement is not sufficient to be considered a new viseme. The frames that do not satisfy the threshold (ε) condition are removed, as sketched below. Other pre-processing steps have been applied to the images, such as rotation, conversion of RGB to greyscale, resizing of the pixel vector, and normalization, as input to the neural network for subsequent processing.
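A minimal sketch of the frame-filtering step just described: consecutive mouth frames are granulated, the number of pixels in the lower approximation of the object is counted, and a frame is kept only if this count changes by more than ε with respect to the previously kept frame. The granule size and the threshold values are placeholders, not the paper's settings.

```python
import numpy as np

def lower_approx_pixels(img, T, gsize=8):
    """Count pixels in granules that lie entirely in the object (all values > T)."""
    h, w = img.shape
    count = 0
    for r in range(0, h - gsize + 1, gsize):
        for c in range(0, w - gsize + 1, gsize):
            g = img[r:r + gsize, c:c + gsize]
            if np.all(g > T):
                count += g.size
    return count

def filter_frames(frames, T=120, eps=100, gsize=8):
    """Keep a frame only if its lower-approximation pixel count differs from the
    previously kept frame by more than eps (i.e., the lips moved enough)."""
    kept, prev = [], None
    for f in frames:
        cur = lower_approx_pixels(f.astype(np.float32), T, gsize)
        if prev is None or abs(cur - prev) > eps:
            kept.append(f)
            prev = cur
    return kept
```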
4 Results and Analysis The change detection algorithm was implemented in Python programming language. Figure 4a shows 100 frames that refer to the first word ‘speed’ of the first speaker. Figure 4b shows the reduction in number of frames after applying rough sets with ε = 50. Figure 4c, d show the reduction in number of frames after applying rough
Fig. 4 a 100 frames for the word 'speed', b rough set pre-processing with ε = 50, c rough set pre-processing with ε = 100, d rough set pre-processing with ε = 150

Table 2 VGG architecture layers used in this work

Layer | Kernel size/stride | Output size
Input | – | (224 × 224 × 3)
conv-block1 ×2 | (3 × 3), stride = 1 | (224 × 224 × 64)
maxpool1 | (2 × 2), stride = 2 | (112 × 112 × 64)
conv-block2 ×2 | (3 × 3), stride = 1 | (112 × 112 × 128)
maxpool2 | (2 × 2), stride = 2 | (56 × 56 × 128)
conv-block3 ×3 | (3 × 3), stride = 1 | (56 × 56 × 256)
maxpool3 | (2 × 2), stride = 2 | (28 × 28 × 256)
conv-block4 ×3 | (3 × 3), stride = 1 | (28 × 28 × 512)
maxpool4 | (2 × 2), stride = 2 | (14 × 14 × 512)
conv-block5 ×3 | (3 × 3), stride = 1 | (14 × 14 × 512)
maxpool5 | (2 × 2), stride = 2 | (7 × 7 × 512)
flatten | – | (25088)
fc-relu ×2 | – | (4096)
sets with ε = 100 and ε = 150. It is noteworthy that the numbers of frames decreased to 43, 17, and 8, respectively, depending on the value of ε. For the first video, out of 4900 frames, with ε = 100, more than 4000 frames were removed without loss of any data, thereby reducing computational time and storage. Next, a convolutional neural network model (shown in Table 2) was built to extract the features from raw mouth frames and to convert them to low-dimensional features. In [19], it has been shown that replacing raw image data with features extracted by a pre-trained convolutional neural network (CNN) leads to better clustering performance. Since we have adopted an unsupervised learning approach, we use the CNN to first extract features as suggested in [20] and then apply the K-means learning algorithm. Figure 5 gives accuracy values for the CNN model using an unsupervised approach. Table 3 presents the results of extracting features using the CNN. These features were then used as input to the K-means algorithm with the aim of clustering visemes and further labelling them. The visemes are clustered into 12 common categories based on [9], as shown in Table 1. The unsupervised clustering is evaluated using the silhouette coefficient, as sketched below.
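A compact sketch of this clustering and evaluation step, assuming (in the spirit of [19, 20]) that features from a pretrained VGG16 are used in place of the custom-trained network of Table 2; the use of ImageNet weights, average pooling and the fixed input size are assumptions made for illustration.

```python
import numpy as np
from tensorflow.keras.applications import VGG16
from tensorflow.keras.applications.vgg16 import preprocess_input
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def cluster_visemes(mouth_images, n_clusters=12):
    """mouth_images: array of shape (N, 224, 224, 3) with RGB mouth crops."""
    extractor = VGG16(weights="imagenet", include_top=False, pooling="avg")
    feats = extractor.predict(preprocess_input(mouth_images.astype(np.float32)))
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(feats)
    return labels, silhouette_score(feats, labels)
```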
Fig. 5 Feature extraction accuracy: a CNN accuracy for ε = 50, b CNN accuracy for ε = 100, c CNN accuracy for ε = 150, d CNN accuracy without rough set-based pre-processing

Table 3 Results of pre-processing effects on CNN for feature extraction

ε | Accuracy | Accuracy (validation) | Loss | Loss (validation)
ε = 50 | 0.9403 | 0.9401 | 0.0015 | 0.0015
ε = 100 | 0.9173 | 0.9138 | 0.0024 | 0.0027
ε = 150 | 0.8803 | 0.8796 | 0.0045 | 0.0044
Without pre-processing | 0.9572 | 0.9540 | 0.0008 | 0.0011
Table 4 Results of pre-processing effects on K-means for labelling

ε | Silhouette score
ε = 50 | 0.1367294
ε = 100 | 0.23231289
ε = 150 | 0.46101844
Without pre-processing | 0.11830797
s(i) = (b(i) − a(i)) / max(a(i), b(i))   (1)
where s(i) is the silhouette coefficient of the ith point, a(i) is the average distance of point i from all other points in its home cluster, and b(i) is the average distance of point i from all other points in the nearest cluster to its home cluster. Silhouette coefficient values lie in the range [−1, 1]. A value closer to one implies that the data points are clustered in their proper class. One can observe for this dataset that the silhouette score improves by a considerable extent with pre-processing (Table 4). However, it is interesting to note that the accuracy of the learned features decreases slightly when using pre-processed images. The reason is mainly the reduction in the number of training samples. As the threshold ε value increases, there is a commensurate decrease in the size of the training set, and consequently CNN accuracy decreases. Even though the CNN learns features slightly less well than without pre-processing, the K-means precision improves, since the K-means algorithm is sensitive to outliers and the rough set pre-processing is removing
the outliers; therefore, the result of the clustering has improved overall. Figure 6 shows a sample result of labelling visemes for ε = 100, with an average precision of 80% over the 12 classes.

Fig. 6 a Class 1, b Class 2, c Class 3, d Class 4, e Class 5, f Class 6, g Class 7, h Class 8, i Class 9, j Class 10, k Class 11, l Class 12
5 Conclusion In this paper, we present preliminary results of applying rough sets in pre-processing video frames (with lip markers) of a spoken corpus in an effort to label the phonemes spoken by the speakers. The problem addressed here is to detect and remove frames in which the shape of the lips does not represent a phoneme completely. In other words, these frames would be considered outliers in a K-means clustering method. We also demonstrate experimentally for this sample dataset that the silhouette score improves with pre-processing when using the unsupervised K-means clustering method. Additionally, we have also used an unsupervised CNN model for feature extraction, whose output is used as input to the K-means clustering method. The results show promise for the application of a granular computing method in pre-processing large audio-video datasets. Future work includes experimenting with the complete corpus to label the visemes, improving the K-means clustering, and performing comparative work with other clustering and pre-processing methods. Acknowledgements Negin Ashrafi's research was supported by MITACS RTA grant# IT20946 and Sheela Ramanna's research was supported by NSERC Discovery grant# 194376.
References 1. Moeslund, T.B., Hilton, A., Krüger, V.: A survey of advances in vision-based human motion capture and analysis. Comput. Vis. Image Understand. 104(2–3), 90–126 (2006) 2. Kawaler, M., Czyżewski, A.: Database of speech and facial expressions recorded with optimized face motion capture settings. J. Intell. Inf. Syst. 53(2), 381–404 (2019)
3. Noda, K., Yamaguchi, Y., Nakadai, K., Okuno, H.G., Ogata, T.: Audio-visual speech recognition using deep learning. Appl. Intell. 42(4), 722–737 (2015) 4. Czyzewski, A., Kostek, B., Bratoszewski, P., Kotus, J., Szykulski, M.: An audio-visual corpus for multimodal automatic speech recognition. J. Intell. Inf. Syst. 49(2), 167–192 (2017) 5. Shillingford, B., Assael, Y., Hoffman, M.W., Paine, T., Hughes, C., Prabhu, U., Liao, H., Sak, H., Rao, K., Bennett, L., et al.: Large-scale visual speech recognition. arXiv preprint arXiv:1807.05162 (2018) 6. Kahou, S., Bouthillier, X., Lamblin, P., Gulcehre, C., Michalski, V., Konda, V.: Emonets: multimodal deep learning approaches for emotion recognition in video. J. Multimodal User Interfaces 10(2), 99–111 (2016) 7. Vryzas, N., Liatsou, A., Kotsakis, R., Dimoulas, C., Kalliris, G.: Speech emotion recognition for performance interaction. J. Audio Eng. Soc. 66(6), 457–467 (2018) 8. Zhang, S., Zhang, S., Huang, T., Gao, W.: Speech emotion recognition using deep convolutional neural network and discriminant temporal pyramid matching. IEEE Trans. Multimedia 20(6), 1576–1590 (2018) 9. Jachimski, D., Czyzewski, A., Ciszewski, T.: A comparative study of english viseme recognition methods and algorithms. Multimedia Tools Appl. 77(13), 16495–16532 (2018) 10. Cootes, T.F., Edwards, G.J., Taylor, C.J.: Active appearance models. IEEE Trans. Pattern Anal. Mach. Intell. 23(6), 681–685 (2001) 11. Pawlak, Z.: Rough sets. Int. J. Comput. Inf. Sci. 11(5), 341–356 (1982) 12. Zadeh, L.A.: Toward a theory of fuzzy information granulation and its centrality in human reasoning and fuzzy logic. Fuzzy Sets Syst. 90(2), 111–127 (1997) 13. Butenkov, S.: Granular computing in image processing and understanding. In: Proceedings of International Conference on Artificial Intelligence AIA-2004, Innsbruk (pp. 811–816). IASTED (2004) 14. Pal, S.K., Shankar, B.U., Mitra, P.: Granular computing, rough entropy and object extraction. Pattern Recognit. Lett. 26(16), 2509–2517 (2005) 15. Pal, S.K., Peters, J.F.: Rough Fuzzy Image Analysis: Foundations and Methodologies. CRC Press (2010). ISBN 9781138116238 16. Chakraborty, D., Shankar, B.U., Pal, S.K.: Granulation, rough entropy and spatiotemporal moving object detection. Appl. Soft Comput. 13(9), 4001–4009 (2013) 17. Adak, C.: Rough clustering based unsupervised image change detection. arXiv preprint arXiv:1404.6071 (2014) 18. Pal, S.K., Bhoumik, D., Chakraborty, D.: Granulated deep learning and Z-numbers in motion detection and object recognition. Neural Comput, Appl (2019) 19. Guérin, J., Boots, B.: Improving image clustering with multiple pretrained cnn feature extractors. arXiv preprint arXiv:1807.07760 (2018) 20. Guérin, J., Gibaru, O., Thiery, S., Nyiri, E.: Cnn features are also great at unsupervised classification. arXiv preprint arXiv:1707.01700 (2017)
Application of Implicit Grid-Characteristic Methods for Modeling Wave Processes in Linear Elastic Media Evgeniy Pesnya, Anton A. Kozhemyachenko, and Alena V. Favorskaya Abstract The aim of the study is the development of the implicit grid-characteristic method on structured grids for problems related to a linearly elastic medium. We consider how the grid-characteristic method builds boundary conditions and adapts them to the implicit method. The obtained method is tested on one- and two-dimensional model problems. We investigate the grid convergence and stability of the derived algorithm for Courant numbers greater than 1 under the conditions of a one-dimensional problem. As a result of numerical simulation in two dimensions, patterns of the distribution of the vertical velocity component are obtained for different types of calculation methods: implicit and explicit–implicit. The obtained patterns are compared with the explicit grid-characteristic calculation method from the RECT package. According to the obtained results, the implicit grid-characteristic method is applicable to long-term dynamic and static problems, but first-order implicit methods exhibit solution blur. Keywords Numerical methods · Linearly elastic medium · Implicit scheme · Elastic waves · Computational experiments · Wave phenomena investigation · Grid-characteristic method · Structured meshes
1 Introduction The grid-characteristic method is used in seismic problems [1, 2] and in the destruction of objects under dynamic loads [3]. In recent years, attempts have been made to use it to solve railway safety problems [4, 5]. Other scientists solve the class of problems of numerical simulation related to the calculation of wave processes in railway rails by different calculation methods: the finite element method [6], the discontinuous
Galerkin method [7], finite-difference methods and their modifications [8], the semi-analytic finite element method [9], and the commercial closed-source software ANSYS [10]. In [5], the railway track is presented as a model of a layered linear elastic medium. The characteristic longitudinal dimension of the model elements is much larger than the transverse dimension. This fact significantly affects the selection of calculation parameters, since when using explicit grid-characteristic methods it is necessary to take into account the Courant stability condition, and the problem itself becomes long term. Thus, it becomes necessary to develop a new algorithm for calculating wave phenomena in railway safety problems that can take into account the above features of the original setting. In this paper, we consider the derivation and application of an implicit grid-characteristic method of the first order of accuracy. Using the example of the Lamb problem [11], a new method for calculating wave phenomena in a linear elastic medium using methods of the first order of accuracy is presented: an explicit grid-characteristic method in the longitudinal direction and the proposed implicit method. The rest of the chapter is organized as follows. Section 2 discusses the main features of the applied implicit grid-characteristic method and the boundary conditions in the integration area of the problem. Section 3 contains the results of numerical simulation for the one- and two-dimensional problems. Section 4 presents the conclusions on the numerical simulation using the implicit grid-characteristic method.
2 Methods Hereinafter, Sect. 2.1 explains the features of applying the grid-characteristic method to a system describing a linear elastic medium, using implicit scheme of the first order of accuracy. Section 2.2 formulates the initial and boundary conditions for the specified integration area.
2.1 Grid-Characteristic Method The components of the propagation velocity of the disturbance V and symmetric Cauchy tensor σ in the linearly elastic medium are described by the following system of equations: ρVt = (∇ · σ)T ,
(1)
σt = λ(∇ · V)I + μ(∇ ⊗ V + (∇ ⊗ V)T ),
(2)
where λ, μ are the Lame parameters that determine the properties of an elastic material, ρ is the density of the material, and I is the unit tensor of the second rank. Equation 1 is the local equation of motion. We derived Eq. 2 from Hooke's law. We applied the grid-characteristic method to the numerical solution of the system of Eqs. 1–2. The method therefore allows the construction of correct numerical algorithms for calculating the boundary points and the points of contacting media with different Lame and/or density parameters. The vector of unknowns in the case of an isotropic linearly elastic medium in the two-dimensional case is provided by Eq. 3. q = (V, σ)T = (V1, V2, σ11, σ22, σ12)T
(3)
The system of Eqs. 1–2 in the two-dimensional case is represented in the form of Eqs. 4–6:

qt + A1 qx + A2 qy = 0, (4)

A1 =
| 0         0    −1/ρ  0     0    |
| 0         0     0    0     −1/ρ |
| −λ − 2μ   0     0    0     0    |
| −λ        0     0    0     0    |
| 0         −μ    0    0     0    |    (5)

A2 =
| 0    0          0    0     −1/ρ |
| 0    0          0    −1/ρ  0    |
| 0    −λ         0    0     0    |
| 0    −λ − 2μ    0    0     0    |
| −μ   0          0    0     0    |    (6)
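As a quick numerical check of the hyperbolicity discussed next, the sketch below assembles A1 and A2 for sample material parameters and verifies that their eigenvalues are 0, ±cs and ±cp; the parameter values are arbitrary illustrations, not data from the paper.

```python
import numpy as np

rho, lam, mu = 2712.0, 5.0e10, 2.6e10        # illustrative material parameters
cp = np.sqrt((lam + 2 * mu) / rho)           # longitudinal wave velocity
cs = np.sqrt(mu / rho)                       # shear wave velocity

A1 = np.array([[0, 0, -1 / rho, 0, 0],
               [0, 0, 0, 0, -1 / rho],
               [-(lam + 2 * mu), 0, 0, 0, 0],
               [-lam, 0, 0, 0, 0],
               [0, -mu, 0, 0, 0]])

A2 = np.array([[0, 0, 0, 0, -1 / rho],
               [0, 0, 0, -1 / rho, 0],
               [0, -lam, 0, 0, 0],
               [0, -(lam + 2 * mu), 0, 0, 0],
               [-mu, 0, 0, 0, 0]])

for A in (A1, A2):
    print(np.sort(np.linalg.eigvals(A).real))   # expect [-cp, -cs, 0, cs, cp]
print(cp, cs)
```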
Splitting is performed in the two directions (x and y), and we obtain the following expressions: q1t + A1 q1x = 0,
q2t + A2 q2y = 0.
(7)
Matrices A1 and A2 are hyperbolic and admit the eigendecompositions

A1 = Ω1 Λ1 Ω1⁻¹, (8)

A2 = Ω2 Λ2 Ω2⁻¹, (9)

where Ω1 and Ω2 are composed of the eigenvectors of A1 and A2, and the eigenvalues of A1 and A2 are the elements of the diagonal matrix

Λ = Λ1 = Λ2 = diag(0, −cs, cs, −cp, cp). (10)

The eigenvector matrices Ω1, Ω2 and their inverses Ω1⁻¹, Ω2⁻¹ (Eqs. 11 and 12) collect the corresponding eigenvectors, with entries expressed through ρ, λ, cp and cs.
Thus, Eq. 7 is rewritten in the form of Eq. 13:

qt + Ω1 Λ1 Ω1⁻¹ qx = 0,   qt + Ω2 Λ2 Ω2⁻¹ qy = 0. (13)
Further calculations are divided into three stages. At the first stage, the multiplication by the matrix Ω1⁻¹ and the transition to new variables are performed (Eq. 14). At the second stage, one-dimensional transfer equations are solved using the method of characteristics or conventional finite-difference schemes. At the third stage, the inverse change is made (Eq. 15). The obtained solution is used as the initial distribution for the step in the second spatial direction (Eq. 16), after which the three stages are repeated and the final solution for the next time step is obtained (Eq. 17):

ω1(x, y, t) = Ω1⁻¹ q(x, y, t), (14)

q(x, y, t + τ/2) = Ω1 ω1(x, y, t + τ), (15)

ω2(x, y, t) = Ω2⁻¹ q(x, y, t + τ/2), (16)

q(x, y, t + τ) = Ω2 ω2(x, y, t + τ). (17)
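A schematic sketch of one splitting stage (Eqs. 14–15): the state is transformed to the characteristic invariants, each invariant is advected independently (an explicit first-order upwind update is used here for brevity; the implicit variant of Eq. 18 is discussed next), and the result is transformed back. The eigendecomposition is obtained numerically, and the boundary nodes are left untouched here purely for simplicity.

```python
import numpy as np

def sweep_x(q, A1, tau, h):
    """One x-direction stage of the splitting scheme; q has shape (5, M), columns are nodes."""
    lam, R = np.linalg.eig(A1)          # A1 = R diag(lam) R^{-1}
    Rinv = np.linalg.inv(R)
    w = Rinv @ q                        # invariants, Eq. (14)
    for k, c in enumerate(lam.real):
        C = c * tau / h
        if c > 0:                       # upwind towards the left neighbour
            w[k, 1:] -= C * (w[k, 1:] - w[k, :-1])
        elif c < 0:                     # upwind towards the right neighbour
            w[k, :-1] -= C * (w[k, 1:] - w[k, :-1])
    return (R @ w).real                 # back to physical variables, Eq. (15)
```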
We calculated wavefront propagation with the implicit and explicit schemes of the first order of accuracy in the multilayer medium. The implicit scheme is described by Eq. 18:

(ω_m^{n+1} − ω_m^n)/τ + c (ω_m^{n+1} − ω_{m−1}^{n+1})/h = 0,  n = 1, …, N,  m = 2, …, M − 1,  c > 0,
(ω_m^{n+1} − ω_m^n)/τ + c (ω_{m+1}^{n+1} − ω_m^{n+1})/h = 0,  n = 1, …, N,  m = M − 1, …, 2,  c < 0. (18)

The explicit scheme is described by Eq. 19:

(ω_m^{n+1} − ω_m^n)/τ + c (ω_m^n − ω_{m−1}^n)/h = 0,  n = 1, …, N,  m = 2, …, M − 1,  c > 0,
(ω_m^{n+1} − ω_m^n)/τ + c (ω_{m+1}^n − ω_m^n)/h = 0,  n = 1, …, N,  m = M − 1, …, 2,  c < 0. (19)
Here, N is the number of time steps and M is the number of nodes in the corresponding spatial direction. The explicit scheme is stable for Courant numbers (Eq. 20) not greater than one:

C = c_{p,s} τ / h, (20)
where τ is the time step, h is the space step, c is the eigenvalue of the matrix Λ1 corresponding to the transfer equation, cp is the longitudinal wave velocity, and cs is the shear wave velocity.
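The sketch below contrasts the explicit and implicit first-order upwind updates (Eqs. 18–19) for a single transfer equation with c > 0 and a fixed inflow value at the left boundary; the implicit sweep reduces to forward substitution along a bidiagonal system and remains bounded even when the Courant number exceeds 1. Grid sizes, the pulse shape and the inflow value are illustrative assumptions.

```python
import numpy as np

def step_explicit(w, C, inflow=0.0):
    new = w.copy()
    new[0] = inflow
    new[1:] = w[1:] - C * (w[1:] - w[:-1])           # Eq. (19), c > 0; stable only for C <= 1
    return new

def step_implicit(w, C, inflow=0.0):
    new = np.empty_like(w)
    new[0] = inflow
    for m in range(1, len(w)):                        # Eq. (18), c > 0: bidiagonal system,
        new[m] = (w[m] + C * new[m - 1]) / (1 + C)    # solved by forward substitution
    return new

M, C = 200, 2.5                                       # Courant number deliberately > 1
w0 = np.exp(-0.05 * (np.arange(M) - 40.0) ** 2)       # smooth initial pulse
we, wi = w0.copy(), w0.copy()
for _ in range(100):
    we = step_explicit(we, C)
    wi = step_implicit(wi, C)
print(np.abs(we).max(), np.abs(wi).max())             # explicit blows up, implicit stays bounded
```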
2.2 Boundary Conditions The free boundary condition is characterized by the equation σ · n = 0, where n is the external normal to the surface of the layer under consideration. We considered this condition at the boundary perpendicular to the x direction:
σ · n = ±(σ11, σ12)^T = 0. (21)
Then, the corresponding components of the variables, expressed in terms of invariants, are equal to zero:

ω = (ω1, ω2, ω3, ω4, ω5)^T, (22)

q(x, y, t + τ/2) = ( (ω4 − ω5) cp/λ,  (ω2 − ω3)/(cs ρ),  (ω4 + ω5) cp²ρ/λ,  ω1 + ω4 + ω5,  ω2 + ω3 )^T, (23)

(ω4 + ω5) cp²ρ/λ = 0,   ω2 + ω3 = 0. (24)
Further, we expressed the invariants corresponding to the characteristics that went beyond the integration region in terms of the invariants corresponding to the characteristics inside the region. We obtained the boundary relations for the right (Eq. 25) and left (Eq. 26) boundaries:
ω5 = −ω4 , ω3 = −ω2
(25)
ω4 = −ω5 , ω2 = −ω3
(26)
Then, the vector of variables q(x, y, t + τ/2) on the right and left boundaries takes the form:

right: q = (−2ω5 cp/λ,  −2ω3/(cs ρ),  0,  ω1,  0)^T,   left: q = (2ω4 cp/λ,  2ω2/(cs ρ),  0,  ω1,  0)^T. (27)

Similar equations for the invariants are obtained when the boundary perpendicular to the y direction is considered. The vector of variables for the right and left boundaries is equal to:

right: q = (2ω2/(cs ρ),  2ω4/(cp ρ),  ω1,  0,  0)^T,   left: q = (−2ω3/(cs ρ),  −2ω5/(cp ρ),  ω1,  0,  0)^T. (28)
The invariant ω1 remained constant. We considered the boundary conditions for the implicit method (Eq. 18) for the invariants ω2 , ω3 , ω4 , and ω5 . It is necessary to solve the system (Eq. 29); to simplify the notation, the indices of the invariants are omitted:
ω_M^{n+1} = α_M ω_1^{n+1} + β_M,   ω_1^{n+1} = α_1 ω_M^{n+1} + β_1,   α_1, β_1, α_M, β_M ∈ R, (29)

where ω_1^{n+1} and ω_M^{n+1} are the values of the invariant at the next time step at the first and last nodes, respectively. The solution of this system is as follows:

ω_1^{n+1} = (α_1 β_M + β_1)/(1 − α_1 α_M),   ω_M^{n+1} = (α_M β_1 + β_M)/(1 − α_1 α_M). (30)
For the implicit scheme of the first order of accuracy, we obtained formulas for the invariants corresponding to positive (Eq. 31) and negative (Eq. 32) characteristics.
Unrolling the implicit recursion (Eq. 18) node by node and substituting the boundary relations (Eq. 29) yields closed-form expressions for ω_M^{n+1} and ω_1^{n+1}: each boundary invariant is a weighted sum of the previous-layer values ω_i^n, i = 1, …, M, with coefficients formed by powers of C/(1 + C) and normalizing factors of the form 1 − C^{2(M−1)}/(1 + C)^{2(M−1)} that arise from the coupling of the two boundary nodes (Eqs. 31 and 32).
3 Results Hereinafter, Sects. 3.1 and 3.2 contain the results of numerical simulation for the one-dimensional problem and for the two-dimensional problem in the case of the Lamb problem, respectively.
3.1 One-Dimensional Problem We consider the area L = 10 m to study the grid convergence, with material parameters ρ = 2712 kg/m³, E = 70 GPa (Young's modulus), and μ = 0.34. The initial condition is set by the function

u(x) = x − 2 for 2 ≤ x ≤ 3,   u(x) = 4 − x for 3 < x ≤ 4,   u(x) = 0 for x ∉ [2, 4]. (33)
Non-reflecting boundary conditions are specified at the boundaries. Figure 1 shows the graph of the grid convergence for the one-dimensional problem. The solution W obtained by the explicit and implicit methods is compared with the analytical solution W_a of the problem. Figure 1b shows a similar comparison for the case of the implicit scheme, but for Courant numbers greater than 1. These results allow us to conclude that the compiled algorithm for the grid-characteristic method using the implicit scheme at the stage of numerical computation is stable and has grid convergence.
Fig. 1 Graph of the grid convergence for the one-dimensional problem: a grid convergence for C < 1 and b grid convergence for C > 1
3.2 Two-Dimensional Problem Figure 2 shows the result of numerical simulation of wavefront propagation for the Lamb problem. On the upper boundary, an initial perturbation takes the following form:

σ22 = Aδ(x0 − x)δ(t0 − t). (34)

Fig. 2 Distribution of the vertical component of the disturbance velocity at the time 6 × 10⁻⁴ s: a implicit grid-characteristic method, b explicit–implicit grid-characteristic method, and c explicit grid-characteristic method
The time step τ is equal to 4 × 10⁻⁶ s, the space steps hx and hz are equal to 0.05 (square structured grid), and the material parameters are ρ = 2712 kg/m³, cp = 6303 m/s, and cs = 3103 m/s. The figure shows the distribution of the vertical component of the disturbance velocity when using the implicit (Fig. 2a) and explicit–implicit (Fig. 2b) calculation schemes. The obtained results are compared with the distribution obtained using the explicit scheme (Fig. 2c) of the first order of accuracy (CIR scheme) from the RECT package, developed at the Laboratory of Applied and Computational Geophysics of Moscow Institute of Physics and Technology, which is based on explicit implementations of the grid-characteristic method. In the case of the explicit–implicit implementation, calculations are performed for the x direction according to the explicit scheme and for the y direction according to the implicit scheme. A similar calculation method can be useful in long-term problems with different characteristic sizes of the integration regions. Implicit first-order methods have the property of blurring the solution, and this phenomenon can be observed in the corresponding figures (Fig. 2a, b).
4 Conclusions We have developed an implicit calculation algorithm using the grid-characteristic method for linear elastic media and have adapted it for the case of specifying non-reflecting boundary conditions and free boundary conditions. The obtained method is stable, has grid convergence, and allows calculations to be conducted for Courant numbers greater than 1. Comparison of our implicit and explicit–implicit implementations with the explicit scheme from the RECT package shows the presence of solution blur, which is associated with specific properties of implicit first-order schemes. However, we assume that it can be avoided by using higher-order implicit schemes. To increase the order of accuracy of the considered implicit corner (upwind) scheme, it is enough to add only one node on the explicit time layer to obtain a scheme of the second order of accuracy. Continuing research into the behavior of implicit schemes will make it possible to apply the grid-characteristic method more efficiently for solving various long-term dynamic and static problems. Acknowledgements This work has been performed with the financial support of the Russian Science Foundation (project No. 20-71-10028).
References 1. Favorskaya, A., Petrov, I., Khokhlov, N.: Numerical modeling of wave processes during shelf seismic exploration. Procedia Comput. Sci. 96, 920–929 (2016) 2. Favorskaya, A., Petrov, I.: The use of full-wave numerical simulation for the investigation of fractured zones. Math. Models Comput. Simul. 11, 518–530 (2019)
3. Breus, A., Favorskaya, A., Golubev, V., Kozhemyachenko, A., Petrov, I.: Investigation of seismic stability of high-rising buildings using grid-characteristic method. Procedia Comput. Sci. 154, 305–310 (2019) 4. Favorskaya, A., Khokhlov, N.: Modeling the impact of wheelsets with flat spots on a railway track. Procedia Comput. Sci. 126, 1100–1109 (2018) 5. Kozhemyachenko, A.A., Petrov, I.B., Favorskaya, A.V., Khokhlov, N.I.: Boundary conditions for modeling the impact of wheels on railway track. J. Comput. Math. Math. Phys. 60, 1539– 1554 (2020) 6. Nejad, R.: Using three-dimensional finite element analysis for simulation of residual stresses in railway wheels. Eng. Fail. Anal. 45, 449–455 (2014) 7. Petrov, I., Favorskaya, A., Khokhlov, N., Miryakha, V., Sannikov, A., Golubev, V.: Monitoring the state of the moving train by use of high performance systems and modern computation methods. Math. Models Comput. Simul. 7, 51–61 (2015) 8. Feng, D., Feng, M.: Model updating of railway bridge using in situ dynamic displacement measurement under trainloads. J. Bridge Eng. 20(12), 04015019 (2015) 9. Bartoli, I., Marzani, A., di Scalea, F., Viola, E.: Modeling wave propagation in damped waveguides of arbitrary cross-section. J. Sound Vib. 295(3–5), 685–707 (2006) 10. Zumpano, G., Meo, M.: A new damage detection technique based on wave propagation for rails. Int. J. Solids Struct. 43(5), 1023–1046 (2006) 11. Aleksandrova, N.I.: The discrete Lamb problem: elastic lattice waves in a block medium. Wave Motion 51(5), 818–832 (2014)
Combined Approach to Modeling of Acceleration Response Spectra in Areas of Active Shallow Seismicity Vasiliy Mironov , Konstantin Simonov , Aleksandr Zotin , and Mikhail Kurako
Abstract The paper is a continuation of the research of the effect of combining seismological and engineering-seismological methods based on real geological and geophysical data. A probabilistic seismic hazard analysis and numerical modeling of acceleration response spectra for soil were performed for the site. Using the Monte Carlo method, 100 realizations of each of the 106 soil profile models of the site were prepared to take into account the uncertainty and scatter in shear wave velocities of the geological layers. Comparing and analyzing the results obtained, it was found that the generalized acceleration response spectrum relative to the surface, taking into account the parameters of a 30 m thickness, provides for most of the considered spectral periods higher estimates of spectral accelerations than numerical modeling and can be used to preliminary estimate the acceleration amplitudes of the response spectrum for soil profile before performing seismic microzoning. Keywords PSHA · Earthquake · Numerical simulation · Response spectrum · Accelerogram
1 Introduction Seismic hazard assessment of the construction site is an integral part of the complex of geotechnical surveys in the design of critical objects. In most countries of the world, for example, in Europe, the USA, as well as in the Russian Federation, standard assessments of seismic hazard are probabilistic in nature. Based on the earthquake catalogs and active faults databases, earthquake source zone models (ESZ models) and attenuation models (also referred to as the ground motion prediction equations— GMPEs) are compiled. These models are input data for the probabilistic seismic hazard analysis (PSHA), based on the results of which a set of seismic zoning maps is compiled. In the design regulations of the Russian Federation seismic hazard, as a rule, consists of an assessment of the initial seismicity using general seismic zoning maps or using detailed seismic zoning and consists of corrections for real (local) soil conditions of the site (seismic microzoning—SMZ) [1]. The normative set of seismic zoning maps consists of three maps A, B and C, reflecting the intensities of maximum calculated seismic impacts, which will not be exceeded within 50 years with a probability of 90, 95 and 99%. This corresponds to the recurrence periods of seismic events 500, 1000 and 5000 years [1]. Numerical simulation of acceleration response spectrum for soil is one of the computational methods for seismic microzoning. Often, during the time allotted for the SMZ, it is not possible to record the maximum possible seismic events (previously observed by regional networks of seismic stations). Therefore, the methods for modeling of acceleration response spectra for soil to seismic effects are aimed at calculating spectral characteristics and accelerograms on a ground surface or at internal points of a layered soil profile from given input ground motions (earthquakes). Knowledge of the spectral composition of ground vibrations of sites is necessary in the design of buildings and structures in order to avoid future casualties and destruction in the event of strong earthquakes [2] (i.e., to ensure seismostability of objects). The work is devoted to the study of the effect of combining seismological and engineering-seismological methods. The main task is to determine and analyze the behavior (relative to each other) of the acceleration response spectra to the ground surface obtained by different methods. Numerical simulation and the probabilistic seismic hazard analysis were carried out for real geological and geophysical data. The study site is the territory for the construction of a critical object, located within the Novosibirsk agglomeration of the Russian Federation. The OpenQuake Engine software package is used to perform PSHA [3]. Modeling of acceleration response spectra for soil to earthquakes was carried out in the DEEPSOIL program [4]. The paper is organized as follows. Section 2 presents description of source data and processing methods. Section 3 describes the calculation of acceleration response spectra. Section 4 provides the results of experimental studies. Concluding remarks are outlined in Sect. 5.
2 Source Data and Processing Methods The analysis of the seismic effects of the investigated site and the execution of PSHA is based on the following data. The first type of data is a specialized catalog of earthquakes in Northern Eurasia without aftershocks, while the second type of data is studies of the laboratory of neotectonics and modern geodynamics of the Geological Institute of the Russian Academy of Sciences. To describe the soil conditions of the site and conduct numerical modeling, the materials of engineering-geological and geophysical surveys were used. In the geological profiles of the site, 12 soil types were identified according to the composition, condition of the soils, taking into account their origin, textural and structural features, as well as types varieties. Based on the results of laboratory and field work, the physical and mechanical properties of each soil type were determined. To determine the shear wave velocities (Vs) of the site soils, a set of field seismic surveys was carried out, using the seismic refraction method, the multichannel analysis of surface waves method (MASW) and spatial autocorrelation method (SPAC). In order to obtain reliable velocity characteristics of soil types, the results of seismological surveys were compared with the geological profiles of the site (Fig. 1). The hazard assessment by the PSHA method consists in determining the level of seismic effects at the site, which will not be exceeded with a fixed probability for a given period of time [5]. According to classic approach in order to calculate the probability of exceeding the specified ground motion amplitude at the investigated site, the contributions of hazard are integrated over all magnitudes and distances for all ESZs according to the total probability theorem [6]. In general, the classical approach can be divided into the following steps: identification and parameterization of seismic sources, setting the temporal and magnitude distribution of earthquakes within the ESZs, preparation and selection of GMPEs, implementation and accounting of aleatoric and epistemic uncertainties [7]. The purpose of the ESZ models is to adequately describe the distribution of earthquakes in space and time. ESZs are represented as domains, faults and point
Fig. 1 Comparison of seismological survey results and geological profiles
sources. Domains are model areal sources of earthquakes, within which seismicity is uniformly distributed to specified depths. Faults are model line sources dipping at certain angles, where the ruptures are distributed in the plane of the fault. The attenuation model (or GMPE) describes the dependence of the level of ground vibrations at the investigated site on the magnitude of the earthquake, the distance to the source, the local conditions of the site, the focal mechanism, etc. [7]. Hazard curves are set in the coordinates of the ground motion parameter and the probability of its exceedance in a given time interval, both for the peak ground acceleration (PGA) and for the given spectral periods. To account for epistemic and aleatoric uncertainties in the computation process, a logic tree structure was developed. The first branch set of the logic tree consisted of several alternative ESZ models with different parameters of the temporal and magnitude distribution of earthquakes. The second branch set of the logic tree consisted of the following alternative GMPEs: Abrahamson et al. [8, 9], Campbell and Bozorgnia [10], Chiou and Youngs [11]. The calculation of PSHA was performed in the OpenQuake Engine software package, which was developed during the implementation of the scientific program Seismic Hazard Harmonization in Europe (SHARE) [12]. The probabilistic seismic hazard analysis was carried out on the basis of the classic approach. For the numerical modeling of acceleration response spectra for soil, an equivalent linear analysis was used, which makes it possible to take into account the nonlinear properties of soils when the time series of accelerations (accelerograms) passes through the soil profile model from the elastic half-space. During equivalent linear modeling, the soil is considered as a linear viscoelastic medium lying on a viscoelastic half-space. It is assumed that the boundaries of the soil layers are located horizontally, and seismic body waves (longitudinal and shear) propagate exactly vertically. Nonlinear properties are taken into account by relationships describing the variation of the material shear modulus (G) and hysteretic damping ratio (γ) depending on the shear strain [13]. These relationships are called modulus reduction and damping curves. Such relationships for each soil type are determined, as a rule, under laboratory conditions, taking into account the lithological composition and depth of occurrence. If such surveys are not available, relationships for soil analogs taken from literature sources are used. In addition to the modulus reduction and damping curves, each layer of the soil profile is characterized by known values of thickness, unit weight, shear wave velocity and material damping ratio [4]. The listed data are parameters for constructing the soil profile models. The iterative equivalent linear analysis algorithm consists of the following steps (a schematic sketch of the loop is given after the list):

1. Initialization of the shear modulus and damping ratio at small values of shear strain in each layer.
2. Calculation of the maximum amplitude of shear strain in each layer.
3. Calculation of the effective strain by multiplying the maximum value by a factor depending on the magnitude of the earthquake.
4. Calculation of new equivalent values of the shear modulus and damping ratio corresponding to the effective strain.
5. Repetition of steps 2–4 until, between two successive iterations, the changes in the shear modulus and damping ratio values are less than a given value in all layers.
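A schematic sketch of this iteration for a single layer, under stated assumptions: the modulus-reduction and damping curves are placeholder functions, the strain calculation is reduced to a stub supplied by the caller, and the effective-strain factor values are assumptions, since the actual response calculation (as in DEEPSOIL) involves a full frequency-domain site response analysis.

```python
def g_over_gmax(strain_pct):          # placeholder modulus reduction curve
    return 1.0 / (1.0 + strain_pct / 0.05)

def damping(strain_pct):              # placeholder damping curve (fraction of critical)
    return 0.01 + 0.20 * strain_pct / (strain_pct + 0.1)

def equivalent_linear(gmax, compute_max_strain, magnitude, tol=1e-3, max_iter=20):
    """Iterate shear modulus G and damping D until they change less than tol."""
    ratio = 0.65 if magnitude < 7.5 else 0.75       # effective-strain factor (assumed values)
    G, D = gmax, 0.01
    for _ in range(max_iter):
        eff = ratio * compute_max_strain(G, D)      # steps 2-3: max strain -> effective strain
        G_new = gmax * g_over_gmax(eff)             # step 4: new equivalent properties
        D_new = damping(eff)
        if abs(G_new - G) / G < tol and abs(D_new - D) < tol:
            break
        G, D = G_new, D_new                         # step 5: repeat until convergence
    return G, D

# Toy usage with a dummy strain model that grows as the soil softens:
G, D = equivalent_linear(6.0e7, lambda G, D: 0.02 * (6.0e7 / G), magnitude=6.0)
```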
The result of numerical modeling by the method of equivalent linear analysis, in this work, is acceleration response spectra to the ground surface from the given input motions. Due to the fact that obtaining the modulus reduction and damping curves was not foreseen in the process of geotechnical surveys, the relationships for similar soils published in the scientific literature were used for numerical modeling. For sand, curves Seed & Idriss [14] were used, for clay were used Sun et al. [15] (depending on the plasticity index), for gravel were used Seed et al. [16], for bedrock were used EPRI 1993 [17]. Table 1 gives a description of soil types and their characteristics for Table 1 Description of soil types and their characteristics used for numerical modeling Soil type
Unit weight, kN/m3
Plasticity index (PI)
1
17.9
6.4
2
18.9
3
Mean Vs m/s
Source of curves
Description
186
Sun et al. [15]
Silty clay sand, stiff
9.3
190
Sun et al. [15]
Silty clay loam, stiff
19.1
10.7
249
Sun et al. [15]
Silty clay loam, plastic
4
20.1
6.1
328
Sun et al. [15]
Soft sandy loam
5
19.9
463
Seed & Idriss [14]
Silty sand, hard
6
20.3
610
Seed & Idriss [14]
Fine sand, hard
7
21.1
9.6
1185
Sun et al. [15]
Sedentary clay loam
8
22.5
6.1
1381
Seed et al. [16] Sedentary gravel
9
25.4
1976
EPRI 1993 [17]
Medium weathered shale
10
26.4
2200
EPRI 1993 [17]
Lightly weathered shale
11
24.7
2430
EPRI 1993 [17]
Medium weathered sandstone
12
25.9
2546
EPRI 1993 [17]
Lightly weathered sandstone
166
V. Mironov et al.
constructing soil profile models and carrying out numerical modeling of acceleration response spectra for soil. Numerical modeling was carried out in the DEEPSOIL program. It is a onedimensional site response analysis program that can perform 1-D nonlinear time domain analysis with and without pore water pressure generation, 1-D equivalent linear frequency domain analysis, and 1-D linear time and frequency domain analysis [4].
3 Acceleration Response Spectra Calculation for Site For calculating PSHA, an important parameter that significantly affects the result obtained is the average shear wave velocity in a thirty-meter soil layer (Vs30). This parameter is taken into account at the step of calculating the GMPE and is responsible for the local soil conditions of the site. Two calculations were carried out, in the first the value of Vs30 = 1976 m/s. This means that the resulting PSHA data will be applicable to the bedrock level of the site soil. The selected value of Vs30 corresponds to the minimum average shear wave velocity for the rock soils of the site; in addition, according to the used GMPE, higher values will not affect the final result. For the second calculation, the Vs30 value is defined as 310 m/s (average of all soil profile models of the site). Thus, the data obtained during the second calculation will be applicable to the ground surface of the study site. As a result of performing PSHA, generalized acceleration response spectra were obtained for the level of bedrock and for the level of the ground surface for the recurrence periods of seismic effects of 500, 1000 and 5000 years (Fig. 2). Based on the obtained generalized acceleration response spectra for the bedrock, six three-component accelerograms were synthesized for each considered recurrence period. For synthesizing, we used the method of generating a calculated accelerogram recommended by Appendix 3 of the Russian normative document RB-06-98. The horizontal components of the obtained time series of accelerations are further used in the numerical modeling of soil response as input motions from the elastic half-space. As a basis for numerical modeling, 106 soil profile models were used, compiled from data on geological profiles and materials of seismological surveys. The soil
Fig. 2 Generalized acceleration response spectra for the level of bedrock and for the level of the ground surface for the recurrence periods of 500, 1000 and 5000 years
Combined Approach to Modeling of Acceleration Response …
167
profile model is described as a set of layers listed and numbered from top to bottom including a half-space, each of which has its own physical and mechanical parameters. In order to take into account the uncertainties in the soil profile models compilation and the scatter in the shear wave velocities of geological layers, 100 realizations were prepared for each model. With the use of Monte Carlo method, for each soil type was assigned a random value of Vs relative to the mean (Table 1) with a coefficient of variation of 0.2. Additionally, each layer was divided in thickness into sublayers so that the maximum layer frequency (f max ) was at least 30 Hz (as indicated in the user manual) [4]. Soil type 9 is taken as an elastic half-space. To take into account the uncertainties of the input motions, six synthesized accelerograms generated from the PSHA data were used in the calculations for each soil profile models for each the recurrence period. Figure 3 shows the acceleration response spectra obtained for each of the 106 soil profile models averaged over all realizations and over all input motions (thin lines), the average acceleration response spectrum over the entire site (thick black line) and the generalized acceleration response spectrum on the ground surface based on PSHA results (thick blue line) for a recurrence period of 500 years. Figure 4 shows the percent deviation of the values of the spectral acceleration (SA) amplitudes of the average response spectrum relative to the values of the generalized spectrum at the points corresponding to the studied spectral periods. Blue lines represent deviations
Fig. 3 Acceleration response spectra plots for T = 500 years
Fig. 4 Acceleration response spectra deviation plots for T = 500 years
In Fig. 4, blue lines represent deviations to the smaller side, red lines to the larger side, and the black line represents the envelope of deviations. In the graphs, the leftmost value corresponds to the PGA level. Figures 3 and 4 show that the spectral acceleration values of the generalized response spectrum relative to the ground surface, obtained from the PSHA results, are for most spectral periods higher than the corresponding values of the averaged response spectrum obtained by numerical modeling. Similar behavior is observed for the PGA values. For the recurrence periods of 1000 and 5000 years, similar graphs were obtained, with a small shift of the positive and negative deviation intervals along the spectral-period axis and slightly larger values of spectral accelerations.
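As an illustration of the randomization step described above, the following sketch (a hypothetical helper, not code from the original study) draws Vs realizations with a coefficient of variation of 0.2 and splits each layer into sublayers thin enough that the layer frequency reaches at least 30 Hz; it assumes a lognormal Vs distribution and the common quarter-wavelength estimate f = Vs / (4h), neither of which is stated explicitly in the text.

```python
import numpy as np

def randomize_profile(layers, cov=0.2, f_max=30.0, rng=None):
    """One Monte Carlo realization of a soil profile.

    layers: list of (thickness_m, mean_vs_m_s) tuples, top to bottom.
    Assumptions (not from the paper): lognormal Vs scatter and the
    quarter-wavelength rule f = Vs / (4 * h) for the maximum layer frequency.
    """
    rng = rng or np.random.default_rng()
    realization = []
    for thickness, mean_vs in layers:
        # Lognormal sample with the requested coefficient of variation and unit mean.
        sigma = np.sqrt(np.log(1.0 + cov ** 2))
        vs = mean_vs * rng.lognormal(-0.5 * sigma ** 2, sigma)
        # Split the layer so that each sublayer satisfies f >= f_max.
        h_max = vs / (4.0 * f_max)
        n_sub = max(1, int(np.ceil(thickness / h_max)))
        realization.extend([(thickness / n_sub, vs)] * n_sub)
    return realization

# Example: 100 realizations of a three-layer profile over a half-space.
profile = [(5.0, 250.0), (10.0, 400.0), (15.0, 700.0)]
realizations = [randomize_profile(profile) for _ in range(100)]
```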
4 Results of Experimental Studies

Comparing and analyzing the data obtained from the numerical modeling of soil acceleration response spectra and from the probabilistic seismic hazard analysis, it can be concluded that the generalized response spectrum relative to the surface, which takes into account the averaged parameters of the 30 m soil thickness, gives predominantly higher estimates of spectral accelerations than numerical modeling for most of the considered spectral periods. This behavior is observed at spectral periods of 0.01–0.03 s, where the maximum deviations from the generalized response spectrum are 14, 11 and 12% for recurrence periods of 500, 1000 and 5000 years, respectively. In the intervals of 0.10–0.12 s (500 years), 0.11–0.12 s (1000 years) and 0.11–0.13 s (5000 years), the maximum deviations are 10, 6 and 8%. In the range of spectral periods of 0.16/0.17–0.37 s, for all recurrence periods the deviation of the acceleration amplitudes first increases, reaching 45, 45 and 47%, and then decreases to 4, 4 and 10%. In the range of 0.38/0.39–5.0 s, a similar increase in the deviation occurs, followed by a gradual decrease, but with larger deviation values. The opposite behavior of the deviation is observed in the spectral-period intervals of 0.04–0.09 s and 0.13–0.15 s, where the maximum deviations from the generalized response spectrum are 25% and 8%, respectively, for a recurrence period of 500 years. For a recurrence period of 1000 years, in the intervals of 0.04–0.10 s and 0.13–0.15 s the maximum deviations are 29 and 10%. For a recurrence period of 5000 years, in the intervals of 0.04–0.10 s and 0.14–0.16 s the maximum deviations are 30 and 7%. Figure 5 shows the combined plot of the envelopes of deviations for spectral periods of 0.01–0.5 s. Using the data of Fig. 5 and the Building Code of Russia (BC 14.13330.2018), Table 2 was compiled; it presents the maximum and minimum deviation values for the spectral intervals for all considered recurrence periods. For PGA, the generalized response spectrum relative to the ground surface gives predominantly higher estimates than numerical modeling. This is typical for all recurrence periods, both for the averaged response spectrum over the entire site and for all 106 averaged spectra. For a recurrence period of 500 years, the average PGA value is 0.083 g (g is the gravitational acceleration) with an average standard deviation (σ) over all soil profile models of 0.011 g.
Fig. 5 Combined plot of envelopes of deviations for different recurrence periods
Table 2 Relative deviation values of the acceleration amplitudes of the average response spectra over the entire site to the generalized response spectra for different recurrence periods

T = 500 years                                    | T = 1000 years                                   | T = 5000 years
Interval of spectral periods (s) | Deviations (%) | Interval of spectral periods (s) | Deviations (%) | Interval of spectral periods (s) | Deviations (%)
0.01–0.03                        | −12 to −14     | 0.01–0.03                        | −11 to −10     | 0.01–0.03                        | −12
0.04–0.09                        | +5 to +25      | 0.04–0.10                        | +5 to +29      | 0.04–0.10                        | +3 to +30
0.10–0.12                        | −2 to −10      | 0.11–0.12                        | −4 to −6       | 0.11–0.13                        | −3 to −8
0.13–0.15                        | +3 to +8       | 0.13–0.15                        | +4 to +10      | 0.14–0.16                        | +1 to +7
0.16–0.35                        | −6 to −45      | 0.16–0.35                        | −4 to −45      | 0.17–0.36                        | −9 to −47
0.36–0.37                        | −4             | 0.36–0.37                        | −4             | 0.37–0.38                        | −9 to −10
0.38–5.0                         | −7 to −74      | 0.38–5.0                         | −6 to −74      | 0.39–5.0                         | −9 to −75
According to the PSHA results for the ground surface, PGA_PSHA = 0.095 g, and the deviation from the generalized response spectrum is 13%. For a recurrence period of 1000 years, the average PGA value is 0.116 g, the average σ = 0.016 g, PGA_PSHA = 0.130 g, and the deviation is 11%. For a recurrence period of 5000 years, the average PGA value is 0.208 g, the average σ = 0.028 g, PGA_PSHA = 0.236 g, and the deviation is 12%. Figure 6 shows the hazard curves for the PGA level obtained using PSHA and numerical modeling (taking into account one standard deviation).
Fig. 6 Hazard curves for the PGA level
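The reported deviations can be reproduced directly from the listed PGA values; the short check below (an illustrative calculation, not part of the original study) assumes the deviation is taken relative to PGA_PSHA, which matches the 13, 11 and 12% figures.

```python
# Average PGA from numerical modeling and PGA_PSHA for the ground surface,
# for recurrence periods of 500, 1000 and 5000 years (values from the text).
cases = {500: (0.083, 0.095), 1000: (0.116, 0.130), 5000: (0.208, 0.236)}

for period, (pga_avg, pga_psha) in cases.items():
    deviation = 100.0 * (pga_psha - pga_avg) / pga_psha
    print(f"T = {period} years: deviation = {deviation:.0f}%")
# Prints 13%, 11% and 12%, in agreement with the reported values.
```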
Based on the results of numerical modeling, a possible excess over the PGA_PSHA value, taking into account one standard deviation, is observed only for a recurrence period of 1000 years, and even then it is insignificant. This indicates a high convergence of the results for the PGA values and, due to the small σ at all recurrence periods, a relatively homogeneous geology at the site. It should be noted that, taking into account the entire set of response-spectrum realizations over the shear wave velocities and the input motions, the proportion of realizations exceeding the PGA_PSHA value is 14, 19 and 16% for recurrence periods of 500, 1000 and 5000 years, respectively.
5 Conclusions

For the study site, a probabilistic seismic hazard analysis was carried out using the OpenQuake Engine software. Numerical modeling of soil acceleration response spectra by equivalent linear analysis was carried out using the DEEPSOIL program. Generalized response spectra were obtained with respect to the bedrock level and the ground surface level for seismic-effect recurrence periods of 500, 1000 and 5000 years. Based on the numerical modeling, the response spectra averaged over all realizations and over all input motions, and the resulting average response spectra over the entire site, were calculated for the same recurrence periods. After analyzing and comparing the obtained data, we can conclude that the generalized response spectra relative to the surface, which take into account the averaged parameters of the 30 m soil thickness, provide predominantly higher estimates of spectral accelerations than numerical modeling for most of the considered spectral periods. The visible difference between the compared acceleration response spectra is associated, first of all, with the fact that in numerical modeling the amplitudes increase predominantly at the periods of natural oscillations of the geological profile. In general, it can be concluded that for a preliminary assessment of the amplitudes of the acceleration response spectrum before the SMZ stage, it is possible to use the PSHA results together with the averaged characteristics of a 30-m soil profile for the study site and adjacent territories.
References 1. Zavyalov, A.D., Peretokin, S.A., Danilova, T.I., Medvedeva, N.S., Akatova, K.N.: General seismic zoning: from maps GSZ-97 to GSZ-2016 and new-generation maps in the parameters of physical characteristics. Seismic Instrum. 55(4), 445–463 (2019) 2. USGS spectral response maps and their relationship with seismic design forces in building codes. http://pubs.er.usgs.gov/publication/ofr95596. Last accessed 18 Jan 2021 3. Pagani, M., Monelli, D., Weatherill, G., Danciu, L., Crowley, H., Silva, V., Henshaw, P., Butler, L., Nastasi, M., Panzeri, L., Simionato, M., Vigano, D.: OpenQuake engine: an open hazard (and risk) software for the global earthquake model. Seismol. Res. Lett. 85(3), 692–702 (2014)
4. Hashash, Y.M.A., Musgrove, M.I., Harmon, J.A., Ilhan, O., Xing, G., Groholski, D.R., Phillips, C.A., Park, D.: «DEEPSOIL 7.0, User Manual». Board of Trustees of University of Illinois at Urbana-Champaign, Urbana, IL (2020) 5. Gupta, I.D.: Probabilistic seismic hazard analyses method for mapping of spectral amplitudes and other design-specific quantities to estimate the earthquake effects on manmade structures. ISET J. Earthq. Technol. 44(1), 127–167 (2007) 6. Atkinson, G.M., Bommer, J.J., Abrahamson, N.A.: Alternative approaches to modeling epistemic uncertainty in ground motions in probabilistic seismic-hazard analysis. Seismol. Res. Lett. 85(6), 1141–1144 (2014) 7. Mironov, V.A., Peretokin, S.A., Simonov, K.V.: Probabilistic seismic hazard analysis of the sites of critical objects. CEUR Workshop Proc. 2534, 413–417 (2020) 8. Douglas, J.: Ground Motion Prediction Equations 1964–2020. Department of Civil and Environmental Engineering University of Strathclyde, Glasgow, United Kingdom (2021) 9. Abrahamson, N.A., Silva, W.J., Kamai, R.: Summary of the ASK14 ground motion relation for active crustal regions. Earthq. Spectra 30(3), 1025–1055 (2014) 10. Campbell, K.W., Bozorgnia, Y.: NGA-West2 ground motion model for the average horizontal components of PGA, PGV, and 5%-damped linear acceleration response spectra. Earthq. Spectra 30(3), 1087–1115 (2014) 11. Chiou, B.S.-J., Youngs, R.R.: Update of the Chiou and Youngs NGA model for the average horizontal component of peak ground motion and response spectra. Earthq. Spectra 30(3), 1117–1153 (2014) 12. OpenQuake – Engine, platform and tools for computing seismic hazard and risk. https://www. globalquakemodel.org/openquake. Last accessed 18 Jan 2021 13. Hosseini, S.M.M.M., Pajouh, M.A., Hosseini, F.M.M.: The limitations of equivalent linear site response analysis considering soil nonlinearity properties. In: International Conferences on Recent Advances in Geotechnical Earthquake Engineering and Soil Dynamics 14, pp. 1–11. Missouri University of Science and Technology (2010) 14. Seed H.B. Idriss I.M.: Soil moduli and damping factors for dynamic response analyses. Report No. EERC 70–10. Earthquake Engineering Research Center, University of California Berkeley, California (1970) 15. Sun, J.I., Golesorkhi, R., Seed, H.B.: Dynamic Moduli and Damping Ratios for Cohesive Soils, UCB/EERC-88/15. Earthquake Engineering Research Center, University of California, Berkeley (1988) 16. Seed, H.B., Wong, R.T., Idriss, I.M., Tokimatsu, K.: Moduli and damping factors for dynamic analysis of cohesionless soils. J. Geotech. Eng. 112(11), 1016–1032 (1986) 17. Toro, G.R., Abrahamson, N.A., Schneider, J.F.: Engineering model of strong ground motions from earthquakes in the central and eastern United States. In: Schneider, J.F. (ed.) Guidelines for Determining Design Basis Ground Motions, EPRI TR-102293. Electric Power Research Institute, Palo Alto, California (1993)
Methods of Interpretation of CT Images with COVID-19 for the Formation of Feature Atlas and Assessment of Pathological Changes in the Lungs

Aleksandr Zotin, Anzhelika Kents, Konstantin Simonov, and Yousif Hamad

Abstract The paper is devoted to methods of interpretation and analysis of dynamic changes associated with COVID-19 in CT images of the lungs. An attempt was made to identify possible regularities of the CT pattern and to diagnose the possible development of fibrosis at early stages. To improve the accuracy of diagnosis and the prognosis of the formation and development of fibrosis, we propose the creation of a Feature Atlas of CT images with a specific X-ray state. An experimental study was carried out within the framework of the formed dataset, divided into 4 groups according to the severity of changes, and preliminary results of processing and texture (geometric) analysis of CT images were obtained. The analysis of a series of CT images includes key steps such as preprocessing, segmentation of lung regions and color coding, as well as calculation of a cumulative assessment of features to highlight areas with probable pathology, a combined assessment of features and the formation of the Feature Atlas. We generated a preliminary Feature Atlas for automated and more accurate analysis of the CT image set. As part of the study on the selected groups of patients, the areas with probable pathologies associated with COVID-19 development were identified. The study shows the dynamics of residual reticular changes.
A. Zotin (B) Reshetnev Siberian State University of Science and Technology, 31 Krasnoyarsky Rabochy Pr, Krasnoyarsk 660037, Russian Federation e-mail: [email protected] A. Kents Federal Siberian Research Clinical Centre, FMBA of Russia, 26 Kolomenskaya st, Krasnoyarsk 660037, Russian Federation e-mail: [email protected] K. Simonov Institute of Computational Modeling of the SB RAS, 50/44 Akademgorodok, Krasnoyarsk 660036, Russian Federation e-mail: [email protected] Y. Hamad Siberian Federal University, 79 Svobodny Pr, Krasnoyarsk 660041, Russian Federation e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 I. Czarnowski et al. (eds.), Intelligent Decision Technologies, Smart Innovation, Systems and Technologies 238, https://doi.org/10.1007/978-981-16-2765-1_14
Keywords CT image · Lung pathologies · COVID-19 · Fibrosis · CT image features · Image analysis · Color-coding
1 Introduction

Currently, in connection with the pandemic caused by the coronavirus infection, one of the key elements of early diagnosis is computed tomography (CT) of the lungs. CT images of lung damage caused by COVID-19 show a characteristic picture of ground-glass opacity, which is not visualized on a radiograph [1, 2]. In this regard, various screening studies are widely discussed and carried out with the aim of improving the accuracy of diagnosis of viral pneumonia caused by COVID-19. The high contagiousness of the disease and its severe clinical course, as well as the increased risk of complications leading to death, make the study of COVID-19 a most urgent problem [3]. Taking into account the specific CT picture with various manifestations of COVID-19, an overall picture of the disease and its treatment is formed [2, 4]. Delayed changes are also possible, the most significant of which is the formation of fibrosis. Our study is devoted to the development of interpretation methods and a scheme for the formation of a Feature Atlas of CT images with pathology, on the basis of which areas are highlighted for specialists' attention. Here, we use a computational technique [5–7] to build models of texture analysis and visualization of lung CT images. As part of the proposed solution, the study is aimed at using a complex set of textural features to assess the appearance of ground-glass opacity, including in low-dose CT images. To evaluate the data, we used a dataset of the X-ray diagnostic department of the Federal Siberian Research Clinical Centre; note that its formation continues. In the studies aimed at finding a specific CT pattern for prediction of fibrosis occurrence at early stages, images of patients with confirmed COVID-19 were used. The paper is organized as follows. Section 2 presents the description of CT pattern features for COVID-19 diagnosis and for evaluation of the probability of fibrosis occurrence. Section 3 provides information about the materials used and the processing technique. Section 4 describes the results of the experimental study. Conclusions are given in Sect. 5.
2 Features of the CT Patterns for Diagnosing COVID-19 and Assessing the Development of Fibrosis

According to international recommendations, there is a set of typical CT patterns in the lungs that indicate a high probability of COVID-19 pneumonia. The most typical sign is a compaction of the lung tissue of the ground-glass-opacity type [2, 4].
There are also possible variants of the disease course with the formation of areas of consolidation and reticular changes. At this stage of the pandemic, ideas about the formation of fibrotic changes are only being developed. Leading radiologists believe that these changes should regress in the majority of the recovered population; however, how long this takes and which factors can contribute to the preservation of reticular changes is the subject of many studies. It is important to observe the formation of fibrosis at early stages. The formation of the Feature Atlas will assist in the selection of parameters for image processing, taking into account possible CT patterns for different categories of patients. It will also help to perform image analysis with increased accuracy and improve diagnostic solutions. It is important to note that pulmonary fibrosis is a recognized complication of acute respiratory distress syndrome (ARDS). A number of studies have shown that the lungs of patients who survived ARDS have persistent fibrotic changes, which in many cases correlate with a decrease in the quality of life and limited lung physiology [8, 9]. Thille et al. described a cohort of 159 autopsies of patients with ARDS, stating that these pathologic findings can either resolve to normal lung parenchyma or progress to fibrosis [10]. In the study [11], 4% of patients with a disease duration of less than 1 week, 24% of patients with a disease duration from 1 to 3 weeks, and 61% of patients with a disease duration of more than 3 weeks developed fibrosis. This description, along with the other data, confirms that pulmonary fibrosis begins at an early stage of the development of ARDS [12]. Ground-glass opacity is an interstitial type of infiltration of the lung tissue. In CT images, it looks like a thickening of lung tissue with preserved visualization of the bronchial and vascular components (Fig. 1). In particular, it is caused by partial filling of air spaces and interstitial thickening (due to fluid, cells and/or fibrosis). In the textural assessment of these changes, the emphasis is put mainly on the analysis of the homogeneity of features, the prevalence of certain changes, and the assessment of the degree of variability in the border areas. During homogeneity evaluation, the correlation of intensity in the analyzed area and in regions with denser changes is taken into account. For example, the presence or absence of the "air bronchography" sign is interpreted as an additional factor of heterogeneity.
Fig. 1 Manifestation of ground-glass opacity: a CT image, b structure of cells in a normal lung, c structure of cells in a lung with ground-glass opacity
In this case, it is necessary to assess the correlation of features in the potentially affected area and to give a comparative evaluation of the relationship of the changed zone to other structures. One should also consider the regularity of such changes and their roughness. The indicator of the severity of reticular changes is most applicable to fibrotic changes, since multidirectional coarse thickening zones (interstitium) are determined in the lung tissue. In our research, we decided to use color-coding methods with the selection of the most suitable color scheme. The search for the clearest border between the unchanged and the infected parenchyma comes to the fore, as well as the emphasis on highlighting the density border of the studied tissues (Fig. 2). In this work, we focused first of all on the change in the pulmonary parenchyma for subsequent dynamic observation and rehabilitation, because in some cases the patient was discharged with rough residual reticular changes (Fig. 3).
Fig. 2 Examples of color coding: a original image, b maps of parenchyma differences to infected, c maps of density of different tissues
Fig. 3 CT image with signs of rough reticular changes: a original, b enlarged fragment
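As a hedged illustration of the kind of textural homogeneity assessment described above, the snippet below computes gray-level co-occurrence matrix (GLCM) homogeneity and correlation for a region of interest with scikit-image; the ROI handling, quantization and feature choice are illustrative and are not taken from the authors' implementation.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def roi_texture_features(ct_slice, roi_mask, levels=64):
    """GLCM homogeneity and correlation inside the bounding box of a ROI.

    ct_slice: 2-D array of intensities (assumed rescaled, e.g., HU mapped to 0..255).
    roi_mask: boolean mask of the analyzed area.
    """
    ys, xs = np.nonzero(roi_mask)
    patch = ct_slice[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    # Quantize to a small number of gray levels to keep the GLCM compact.
    peak = float(patch.max()) or 1.0
    patch = (patch / peak * (levels - 1)).astype(np.uint8)
    glcm = graycomatrix(patch, distances=[1], angles=[0, np.pi / 2],
                        levels=levels, symmetric=True, normed=True)
    return {
        "homogeneity": graycoprops(glcm, "homogeneity").mean(),
        "correlation": graycoprops(glcm, "correlation").mean(),
    }
```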
As part of the primary study, the degree of severity and the percentage of lung parenchyma involvement in the process are determined. For the formation of the Atlas of the probabilistic development of pathologies (formation of fibrosis), it is necessary to study the indicators in dynamics. Evaluation of the dynamics of the identified COVID-19 pneumonia is carried out according to clinical indications using various imaging methods. In describing the dynamics, we relied on the Handbook of COVID-19 Prevention and Treatment: "Patients with massive lesions of the lung tissue should be monitored by a pulmonologist because of the high risk of interstitial pulmonary fibrosis" [2]. In this regard, the key stage in the analysis of CT images is to highlight features in the image with the most accurate assessment of the corresponding geometric indicators (markers).
3 Materials and Processing Methods

In order to create the Feature Atlas with a description of the probabilistic development of pathologies, we perform a number of stages. At the first (preparatory) stage, we formed a primary dataset of CT images to assess lung pathologies associated with COVID-19. During the dataset formation, we selected images of patients with a confirmed diagnosis, dividing them into groups according to the severity of the disease (CT1–CT4). For each patient, sets of CT images were formed that capture the dynamics of the development of pathologies, in order to analyze changes in the affected area [13]. Within the framework of dynamic observation, a set of images is identified using certain CT patterns that most clearly characterize these changes. At the same time, a number of model images were formed. During analysis and evaluation of the pathology degree in the images, we use the set of features described in Sect. 2, and we identified the patterns that were used to form reference settings models. As part of the second stage, we interpret images in different formats in order to determine the zone of pathological changes and calculate the volume of the lesion. This, in turn, contributes to the quality of the Atlas. We use a modified technique [14] as a basis for CT image processing, which includes the following stages:

• Preprocessing of images.
• Segmentation of lung areas and color-coding.
• Calculation and aggregate assessment of features to identify areas with suspected pathology.
• Comprehensive analysis of a series of images to form the Feature Atlas.
Fig. 4 Example of CT image processing: a original, b BCET enhancement, c enhancement by Retinex in log form, d Retinex enhancement with non-logarithmic form
During the preprocessing, we perform noise reduction and improve the brightness characteristics of the CT images. Noise reduction is conducted using a combination of spatial and 3D filtering based on a weighted median filter. To improve the brightness characteristics, we use local variations of the method based on Retinex technology [15], as well as the contrast-limited adaptive histogram equalization (CLAHE) and balance contrast enhancement technique (BCET) methods. Examples of results obtained during preliminary image processing are shown in Fig. 4. During the segmentation stage, we segment the lung regions using the PCA algorithm, wavelet transform and Otsu's thresholding method. For the left and right lungs, features are calculated independently. Texture features and features obtained as a result of the Shearlet transform with color-coding are used as the main ones [7, 16]. In order to obtain additional indicators for CT images, we utilize various combinations of processing methods. To do this, within the framework of the interpretation of the primary results, we correct and formulate a more detailed and accurate task taking into account the peculiarities of the images. For example, during the initial processing with subsequent segmentation, the zone of pathological changes is distinguished as denser areas against the tissue of the unaffected area of the lung. At the same time, with a more detailed assessment, it is possible to visualize areas of unchanged parenchyma against the background of pathology, and so to distinguish thinner partitions, forming the notion of a certain reticularity of the changes. An example of the pathology zone segmentation used to assess the area of the affected parenchyma is shown in Fig. 5. The initial calculation in this example showed that the lesion was 28% (18% in the left lung and 10% in the right lung). Taking these calculations into account, further improvement of diagnostics is possible based on the refinement of a number of indicators (markers).
Fig. 5 Example of the pathology zone segmentation: a original CT image, b segmented lung, c marked area of pathology, d visualization of fields used for lesion assessment
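The lesion percentage quoted above can be obtained directly from the segmentation masks; the following minimal sketch (illustrative, assuming boolean masks for the segmented lungs and the marked pathology rather than the authors' actual data structures) reports the share of affected parenchyma for each lung relative to the total lung area, so the two values add up to the overall percentage.

```python
import numpy as np

def lesion_percentages(left_lung, right_lung, pathology):
    """Percentage of affected parenchyma per lung from boolean 2-D/3-D masks.

    Percentages are taken relative to the total lung area, so that
    left + right equals the overall lesion percentage (e.g., 18% + 10% = 28%).
    """
    total_lung = left_lung.sum() + right_lung.sum()
    left = 100.0 * np.logical_and(left_lung, pathology).sum() / total_lung
    right = 100.0 * np.logical_and(right_lung, pathology).sum() / total_lung
    return left, right, left + right
```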
The formation of the Feature Atlas with fibrosis localization is carried out by computing features only in the affected areas associated with COVID-19. The Feature Atlas is a description of possible zones of pathological changes with increased tissue density. These data allow forming presets of the algorithm parameters used for the segmentation and visualization of the specified zones, which in turn makes it possible to determine the differentiation of features during clustering more accurately. As part of the analysis of the image set, the calculation and assessment of the affected volume as a percentage continues. With layer-by-layer processing of the images, the lung is divided into zones without and with pathological changes. Note that this pathology has a bilateral central and peripheral location of a polysegmental nature. Thus, within the framework of constructing the Feature Atlas based on the interpretation of the obtained calculated images, we address diagnostic medical problems. We tried to distinguish a number of indicators for assessing the severity of the patient's disease. The interpretation of CT images with COVID-19 in dynamics was carried out for 12 patients with the most significant areas of pathology in the lungs.
4 Results of Experimental Studies

In this section, we describe the experiment and demonstrate the calculation results. We form a general characteristic scheme of the development of the disease, based on the isolation and description of its stages. In the experimental study, we use a dataset formed for 4 groups of patients (CT1–CT4) with studies in dynamics, subsequently connected with clinical and laboratory data for analysis and diagnosis. Currently, the Feature Atlas is formed on the basis of data from 12 patients, with residual changes in dynamics and with an assessment of the area of lung tissue lesions. During the dynamic assessment, at the first stage of the interpretation of the result, the zones are compared with the previously identified changes. At the same time, we form ideas about their change over time in correlation with the stage of the disease. The stages are divided as follows: early stage (0–4 days), stage of progression (5–8 days), peak stage (9–13 days), stage of resolution (>14 days) with regression of detected changes. As a result of further assessment, we reveal the appearance of "fresh" zones or the disappearance of previously identified zones of pathological changes. A detailed experimental study was carried out on two patients; the data of one of them turned out to be more interesting for the problem of identifying and assessing the zone of fibrosis. The first patient (CT1 at admission) is young, with a rapid regression of changes and discharge with no signs of long-term consequences in the lungs. The second patient (CT3 at admission) is elderly (73 years old), with residual reticular changes. This choice of patients was made specifically because of the pronounced difference in age and the radically different picture of the medical state at discharge. Figure 6 shows an example of an image sequence for these patients.
Fig. 6 Example of segmentation of zones of bilateral polysegmental changes in the lungs with formation of density color map that characterizes the degree of lesion
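For reference, the staging by days since disease onset used above can be expressed as a trivial lookup; the helper below is only an illustration of that rule, not code from the study.

```python
def disease_stage(days_since_onset: int) -> str:
    """Map days since disease onset to the stage names used in the dynamic assessment."""
    if days_since_onset <= 4:
        return "early stage"
    if days_since_onset <= 8:
        return "stage of progression"
    if days_since_onset <= 13:
        return "peak stage"
    # The text lists the resolution stage as ">14 days", with regression of detected changes.
    return "stage of resolution"
```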
When analyzing the more homogeneous changes in the first patient's data, the emphasis was placed on the development of the proposed method for localization and interpretation of pathology [17]. The fact that the pathological changes progressed was taken into account; subsequently, these changes regressed in accordance with the stage of the disease and did not raise suspicion of delayed residual changes. In the images of the second patient, denser areas, in which delayed changes can form, are highlighted in red. The main method for studying the pattern is dynamic observation of the identified pathological changes. In the dynamic study, the previously identified changes markedly regressed. Based on the Feature Atlas, a retrospective analysis of the density characteristics was performed with subsequent comparison with other patients; the revealed features and possible connections with the preservation of residual phenomena were taken into account. Having studied the dynamics of the second patient, we discovered pronounced positive dynamics in the form of regression of the previously identified changes. This case is an illustrative example of the ambiguity of the picture of changes in dynamics and of the variants of the disease outcome. During the analysis of the dynamics in the region of interest (ROI), it was found that the density indicators of one zone in the same sections can have different values (Table 1). With the norm from −700 to −900 Hounsfield units (HU) [18], we observe changes of the ground-glass-opacity type from −250 to −750 HU.
Table 1 Density indicators for dynamic research (HU)

Index   | Series of CT images 1 | Series of CT images 2 | Series of CT images 3 | Series of CT images 4 | Series of CT images 5
Density | −543                  | −294                  | 20                    | 18                    | −776
Table 2 Statistical data on the quantitative assessment of the ROI analysis (Avg ± SD)

Type of processing                 | Highlighting of pathology boundaries | Determination of the pathology volume | Separation of pathology by density component
Manual by specialist               | 0.965 ± 0.019                        | 0.971 ± 0.012                         | 0.987 ± 0.014
Semi-automatic                     | 0.977 ± 0.013                        | 0.981 ± 0.014                         | 0.990 ± 0.015
Automatic, using the Feature Atlas | 0.976 ± 0.015                        | 0.980 ± 0.011                         | 0.991 ± 0.014
It can be hypothesized that fibrosis does not form at this density. With dynamic observation of the density characteristics of the pathology zones, a regression of rather dense zones with the restoration of their "airiness" is noted (Table 1). When such parameters as the volume and area of the lesion were evaluated by specialists under various conditions (the classical approach, the interpretation method with manual adjustment only, and automation based on the Feature Atlas), the discrepancies were small (Table 2). The preliminary version of the formed atlas showed 75–90% coincidence in the parameters of the algorithms required for calculating the features by which the pathology and the area of fibrosis are assessed. The conducted experiment shows that the results of the "Automatic" mode using the Feature Atlas are comparable to the data obtained in the "Semi-automatic" mode. According to the results of the experimental study, it can be concluded that the use of the Feature Atlas can significantly reduce the calculation time.
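A rough, illustrative encoding of the density ranges mentioned above might look as follows; the thresholds simply restate the text (normal parenchyma around −900 to −700 HU, ground-glass opacity roughly −750 to −250 HU, denser values as in Table 1) and are not a validated classifier.

```python
def density_category(mean_hu: float) -> str:
    """Coarse labeling of a ROI by its mean density in Hounsfield units (sketch)."""
    if mean_hu <= -700:
        return "normal parenchyma"          # norm is about -900..-700 HU [18]
    if mean_hu <= -250:
        return "ground-glass opacity"       # observed range about -750..-250 HU
    return "denser change (consolidation)"  # e.g., the 20 and 18 HU values in Table 1
```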
5 Conclusions

The performed calculations made it possible to form a preliminary version of the Feature Atlas, which simplifies the selection of parameters when assessing features that affect the final diagnosis. The experimental study demonstrated that on average in 75–90% of cases the parameters between series of images are correlated and can be reused in data processing. The Feature Atlas parameters and the use of complex feature sets make it possible to automatically obtain indicator values that give an idea of possible zones of fibrosis and assist in its detection at early stages. Based on these results, variants of contrasting the indicated zones with color-coding are proposed depending on the task, and the most suitable color gamut was sought for a more indicative display of changes. The results of the experimental study on patients' data demonstrate detection of pathological changes with increased accuracy in the interpretation of the studied CT images.
References 1. Tyurin, I.E.: Radiology in the Russian Federation. J. Oncol. Diagn. Radiol. Radiother. 1(4), 43–51 (2018). (in Russian) 2. Handbook of COVID-19 prevention and treatment. The First Affiliated Hospital, Zhejiang University School of Medicine (2020) 3. Rodriguez-Morales, A.J., Cardona-Ospina, J.A., Gutiérrez-Ocampo, E., Villamizar-Peña, R., Holguin-Rivera, Y., Escalera-Antezana, J.P., Alvarado-Arnez, L.E., Bonilla-Aldana, D.K., Franco-Paredes, C., Henao-Martinez, A.F., Paniz-Mondolfi, A., Lagos-Grisales, G.J., RamírezVallejo, E., Suárez, J.A., Zambrano, L.I., Villamil-Gómez, W.E., Balbin-Ramon, G.J., Rabaan, A.A., Harapan, H., Dhama, K., Nishiura, H., Kataoka, H., Ahmad, T., Sah, R.: Clinical, laboratory and imaging features of COVID-19: a systematic review and meta-analysis. Travel Med. Infect. Dis. 34(101623), 1–13 (2020). https://doi.org/10.1016/j.tmaid.2020.101623 4. Sinitsyn, V.E., Tyurin, I.E., Mitkov, V.V.: Consensus guidelines of Russian society of radiology (RSR) and Russian association of specialists in ultrasound diagnostics in medicine (RASUDM) «Role of imaging (X-ray, CT and US) in diagnosis of COVID-19 Pneumonia» (version 2). J. Radiol. Nucl. Med. 101(2), 72–89 (2020). https://doi.org/10.20862/0042-4676-2020-101-272-89. (in Russian) 5. Zotin, A., Hamad, Y., Simonov, K., Kurako, M.: Lung boundary detection for chest X-ray images classification based on GLCM and probabilistic neural networks. Procedia Comput. Sci. 159, 1439–1448 (2019) 6. Zotin, A., Simonov, K., Kapsargin, F., Cherepanova, T., Kruglyakov, A., Cadena, L.: Techniques for medical images processing using shearlet transform and color coding. In: Favorskaya, M., Jain, L. (eds.) Computer Vision in Control Systems-4. ISRL, vol. 136, pp. 223–259. Springer, Cham (2018) 7. Zotin, A., Simonov, K., Kapsargin, F., Cherepanova, T., Kruglyakov, A.: Tissue germination evaluation on implants based on shearlet transform and color coding. In: Favorskaya, M., Jain, L. (eds.) Computer Vision in Advanced Control Systems-5. ISRL, vol. 175, pp. 265–294. Springer, Cham (2020) 8. Burnham, E.L., Janssen, W.J., Riches, D.W.H., Moss, M., Downey, G.P.: The fibroproliferative response in acute respiratory distress syndrome: mechanisms and clinical significance. Eur. Respir. J. 43, 276–285 (2014) 9. Cardinal-Fernández, P., Lorente, J.A., Ballén-Barragán, A., Matute-Bello, G.: Acute respiratory distress syndrome and diffuse alveolar damage. New insights on a complex relationship. Ann. Am. Thorac. Soc. 14, 844–850 (2017) 10. Thille, A.W., Esteban, A., Fernández-Segoviano, P., Rodriguez, J.M., Aramburu, J.A., VargasErrázuriz, P., Martín-Pellicer, A., Lorente, J.A., Frutos-Vivar, F.: Chronology of histological lesions in acute respiratory distress syndrome with diffuse alveolar damage: a prospective cohort study of clinical autopsies. Lancet Respir. Med. 1(5), 395–401 (2013) 11. Wu, C., Chen, X., Cai, Y., Xia, J., Zhou, X., Xu, S., Huang, H., Zhang, L., Zhou, X., Du, C., et al.: Risk factors associated with acute respiratory distress syndrome and death in patients with coronavirus disease 2019 pneumonia in Wuhan, China. JAMA Intern Med. 180(7), 934–943 (2019). https://doi.org/10.1001/jamainternmed.2020.0994 12. Mo, X., Jian, W., Su, Z., Chen, M., Peng, H., Peng, P., Lei, C., Li, S., Chen, R. and Zhong, N.: Abnormal pulmonary function in COVID-19 patients at time of hospital discharge. European Respiratory Journal 55(6), 2001217.1–11 (2020) 13. 
Li, Y., Xia, L.: Coronavirus disease 2019 (COVID-19): role of chest CT in diagnosis and management. Am. J. Roentgenol. 214, 1280–1286 (2020) 14. Zotin, A., Hamad, Y., Smonov, K., Kurako, M., Kents, A.: Processing of CT lung images as a part of radiomic. In: Czarnowski, I., Howlett, R., Jain L. (eds.) Intelligent Decision Technologies, IDT 2020. Smart Innovation, Systems and Technologies, vol. 193, pp. 243–252 (2020) 15. Zotin, A.G.: Fast algorithm of image enhancement based on multi-scale retinex. Int. J. Reasoning-based Intell. Syst. 12(2), 106–116 (2020)
16. Yue, Y., Shi, Z., Zhang, Z.: Image edge detection algorithm based on shearlet transform. J. Comput. Appl. Softw. 31(4), 227–230 (2014) 17. Kents, A., Hamad, Y., Simonov, K., Zotin, A., Kurako, M.: Geometric analysis of pathological changes in the lungs using CT images for COVID-19 diagnosis. CEUR Workshop Proc. 2727, 43–50 (2020) 18. Funama, Y., Awai, K., Liu, D., Oda, S., Yanaga, Y., Nakaura, T., Kawanaka, K., Shimamura, M., Yamashita, Y.: Detection of nodules showing ground-glass opacity in the lungs at low-dose multidetector computed tomography: phantom and clinical study. J. Comput. Assist. Tomogr. 33(1), 49–53 (2009)
Multilevel Watermarking Scheme Based on Pseudo-barcoding for Handheld Mobile Devices Margarita N. Favorskaya
and Alexandr V. Proskurin
Abstract Multimedia content provided by smartphones, laptops and tablets equipped by digital cameras presents a huge amount of images, audio and video clips in our daily life. Implementation of watermarking schemes for copyright protection on mobile devices remains an open issue due to specific features of mobile hardware. In this paper, we present a robust watermarking scheme for embedding textual information such as GPS coordinates in images shared via social media either in default mode or as the owner intends. The proposed algorithm is an extension of barcoding technique for binary notation called pseudo-barcoding. In addition, multilevel watermarking scheme realizes embedding of a fragile logo with customizable visibility and robust textual watermark embedding in a still image or frames of a video sequence. The implementation of the proposed algorithm is optimized for handheld mobile devices. Keywords Watermarking scheme · Barcoding · Mobile device · Multimedia content · Authentication · Copyright protection
M. N. Favorskaya (B) · A. V. Proskurin
Reshetnev Siberian State University of Science and Technology, 31 Krasnoyarsky Rabochy ave., Krasnoyarsk 660037, Russian Federation
e-mail: [email protected]
A. V. Proskurin
e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 I. Czarnowski et al. (eds.), Intelligent Decision Technologies, Smart Innovation, Systems and Technologies 238, https://doi.org/10.1007/978-981-16-2765-1_15

1 Introduction

At present, multimedia content sharing using mobile devices is a conventional procedure for the vast majority of people, which provokes an increased number of unauthorized attacks with various purposes during transmission over the Internet. One reasonable way to achieve authentication, copyright protection and integrity of multimedia data is to use digital watermarking technologies, which have been intensively developed in recent decades [1]. Mobile watermarking schemes can be implemented on an HTTP server, and in this case we can apply the conventional watermarking
procedures without constraints. A more acceptable case for end users is to embed and detect watermarks using only the smartphone's resources. However, mobile resources are usually not sufficient to implement the conventional watermarking schemes; this means compromising on battery life, computing power and power consumption. Mobile computing limitations are among the main disadvantages of smartphones, laptops and tablets. Despite these limitations, parameters such as imperceptibility, robustness, capacity and security must be supported in mobile watermarking schemes. An example of a successful commercial application was provided by Digimarc Corporation [2], which has developed multi-layered brand protection strategies for imperceptible digital identities of products and packages, e-book content protection, among others. It is assumed that an honest user contacts the copyright owner to request permission to use the multimedia content. Watermarking schemes on mobile devices can execute some of the common applications, but at the same time they enable new applications such as invisible GPS labeling of multimedia content in combination with other metadata (date/time specification or user logo). In this paper, we propose a multilevel protection of images and video sequences received from a smartphone's camera, relatively simple and at the same time robust to some intentional and accidental Internet attacks, with the possibility to reconstruct a distorted textual watermark. In our previous research, we developed a robust watermarking scheme for embedding textual watermarks using Code-128 and the Discrete Wavelet Transform (DWT) [3]. Reliable reconstruction of a watermark represented by barcodes was achieved by restoration of vertical strips, white and black, as the simplest geometric primitives. However, we found that the character set encoded by the Code-128 standard is limited. The proposed algorithm doubles the number of available characters using an additional simple transform: character → binary code → pseudo-barcode. This approach allows the use of national character encodings. We also adapt this approach to embed service information such as GPS coordinates, date, time, frame number and so on. We expand the functionality of the algorithm by adding an ability to use a fragile watermark as a logo, which also indicates the presence of Internet attacks, with a user-adjustable degree of visibility. The remainder of the paper is structured as follows. A review of mobile watermarking schemes is provided in Sect. 2. The proposed multilevel protection of images and videos based on the discrete Hadamard transform (DHT), pseudo-barcoding and DWT is described in Sect. 3. The experimental results are given in Sect. 4. Finally, conclusions are drawn in Sect. 5.
2 Related Work

In mobile devices, the security problems have two main aspects: watermarking techniques are applied to authenticate cell phones or to send multimedia content, while secure delivery of text, graphics, voice and video content, called mobile e-commerce, is supported by cryptography and watermarking techniques.
Mobile communications using 5G wireless technology provide the ability to address complex technical issues of both aspects of security. Watermarking techniques serve to authenticate customers for service delivery and customer support. However, handheld devices have limited battery life and limited computational resources that are directly affected by the computational load generated by an application. A more complex watermarking algorithm means higher computational costs. The conflict is that a simple watermarking algorithm, for example one based on the least significant bit (LSB) method, will not protect mobile multimedia content from typical Internet attacks, or even from JPEG/MPEG quantization. Due to the computationally expensive watermarking process and the rapid depletion of available power on the handheld devices of that time, the proposal to migrate some tasks to a proxy server appeared in the 2000s. In [4], the partitioning scheme for wavelet-based watermarking algorithms was performed on a proxy server that acted as an agent between the content server and the handheld device. This architecture included content servers storing multimedia and database content, and all communication between the mobile devices and the content servers was relayed through proxy servers. Note that modern mobile devices can create and store multimedia data by themselves and then send it to other mobile devices through proxy servers. An algorithm for watermarking of color images enhanced by a security aspect was presented in [5]. First, the phone number digits were converted using a binary-coded decimal (BCD) encoder, and the generated binary vector was appended with the phone number checksum. Second, this BCH-coded binary vector was inserted into image blocks using the discrete cosine transform (DCT). Quick response (QR) codes were used as the watermarks in [6]. The security was improved by scrambling the QR codes before embedding them in the DWT domain. The authors claimed that their algorithm was robust to JPEG compression (compression ratio 60–70%), noise attacks (with a noisy image quality of 15–35 dB), image cropping, object replacement, etc. Video watermarking techniques for mobile devices are a poorly studied area. One possible approach, in which a watermark is embedded into the bit-stream domain, was proposed in [7]. This method was performed without signal re-encoding, which enabled efficient storage and delivery of audio-visual content over the Internet. There are not many commercial software tools; we can mention Stirmark Benchmark 4.0 [8]. However, there are some experimental software tools such as "HymnMark" [9] and "Android Digital Visible Watermarking" [10]. As seen from this brief overview, this branch of watermarking techniques has been actively developing since the 2010s. However, the hardware limitations remain, and this calls for sustained attention.
3 The Proposed Multilevel Protection

According to the concept of multilevel protection [11, 12], each level is characterized by an encryption or watermarking approach with a special type of watermark, an embedding scheme in the spatial/transform domain and coding properties.
Table 1 Short description of protection levels

Level No. | Encryption/watermarking | Recommended transformation     | Transform
Level 4   | +/−                     | Encryption coding              | Watermarked image 2 → Encoded stream
Level 3   | −/+                     | DHT, LSB                       | Watermarked image 1 + Fragile watermark → Watermarked image 2
Level 2   | −/+                     | DCT, DFT, DWT/Koch–Zhao scheme | Image + Robust watermark → Watermarked image 1
Level 1   | +/−                     | Recoding and/or scrambling     | Watermark → Robust watermark
A practical scenario of multimedia protection in handheld mobile devices is chosen under the criteria of robustness, capacity/payload, security and computational cost; thus, we have a typical constrained optimization problem. Table 1 provides a brief description of the protection levels; here, DFT is the discrete Fourier transform. Hereinafter, we consider in detail Level 1 (Sect. 3.1) and Level 3 (Sect. 3.2) as the main contribution of this research. Other issues at Levels 1 and 2 regarding the watermark as a short textual message are described in [13]. Note that Level 1 and Level 4 are optional procedures. Section 3.3 contains a description of the multilevel embedding and extraction schemes.
3.1 Pseudo-barcoding

As is well known, a textual watermark can be represented in the form of binary codes or as an image. Binary codes are very suitable for embedding in the spatial domain using the LSB method, and this approach provides the best payload values. However, any type of attack or JPEG compression easily distorts the binary code sequence, partially or fully, and it is impossible to restore the extracted watermark using blind watermarking techniques. On the contrary, the image representation has a relatively lower payload but is more robust to attacks and JPEG compression due to being embedded in the transform domain. At the same time, its robustness to attacks remains unacceptable for legal users. Suppose that the short textual message includes the following attributes: Latitude (LT), Longitude (LN), Date (DT), Time (TM), Account (AC) and Number of frame (for the video watermarking scheme) (NF). Figures 1 and 2 depict examples of extracted textual watermarks after typical Internet attacks.
Fig. 1 Example of textual watermark in the form of binary codes: a original watermark, b a part of binary representation (each bit is duplicated in a block 8 × 8 pixels), c extracted watermark after JPEG compression with quality level 70%, d extracted watermark after GridCropping 20%, e extracted watermark after Brightness 15% (symbol means the unrecognized symbol)
Fig. 2 Example of textual watermark in the form of image: a original watermark, b duplicated image representation, c extracted watermark after JPEG compression with quality level 70%, d extracted watermark after GridCropping 20%, e extracted watermark after Brightness 15%
We decided to combine both approaches in order to find a good trade-off between payload and robustness. We introduce a textual watermark as an image containing only binary codes, representing the binary codes as spaces and printed parallel strips, i.e., pseudo-barcodes. We do not need to use three bars and three spaces for each symbol (as in Code-128 A, B and C) because we only encode two digits, 0 and 1, and can shrink the length of the barcode template. Experiments show that for robust reconstruction the strip width must be at least 2 pixels. Each symbol represented in Windows-1251 encoding is a sequence of 8 bits. The set of symbols forms a pseudo-barcode, which contains 6 sections:
Null symbol (all bits are 0). 1 if a length of the textual string is in the range [0…255]. Number of symbols in a textual string, bytes. Symbols in a textual string, bytes. Check symbol. Null symbol (all bits are 0).
The check symbol is the remainder of the integer division of a sum by 256. The sum is calculated as follows: each byte value, except the null symbols, is multiplied by its position number in the string, and the obtained products are summed. After the formation of all sections, each bit is duplicated, and the resulting sequence is repeated as many times as possible, but at least 8 times, depending on the resolution of the host image. Thus, each strip and each symbol with a height of 8 pixels have the sizes 2 × 8 pixels and 16 × 8 pixels, respectively.
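A minimal sketch of this encoding is given below; it assumes 1-based positions in the weighted sum and a simple byte layout of the six sections, both of which are our reading of the description rather than the authors' exact implementation.

```python
def check_symbol(payload: bytes) -> int:
    """Weighted sum of byte values (1-based positions) modulo 256."""
    return sum(i * b for i, b in enumerate(payload, start=1)) % 256

def pseudo_barcode_bits(text: str) -> list[int]:
    """Build the six sections and expand them to a doubled bit sequence."""
    data = text.encode("cp1251")                      # Windows-1251 symbols
    payload = bytes([1, len(data)]) + data            # flag + length + symbols
    frame = bytes([0]) + payload + bytes([check_symbol(payload), 0])
    bits = [(byte >> k) & 1 for byte in frame for k in range(7, -1, -1)]
    return [b for b in bits for _ in range(2)]        # duplicate each bit (2-px strips)
```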
Fig. 3 Comparative examples of textual watermark: a Code-128 representation, b Arnold's transform of Code-128, c pseudo-barcoding representation, d Arnold's transform of pseudo-barcoding
Arnold’s transform. Each block is transformed with a different number of iterations using MD5 hash-function applied to secret key. Comparative examples of textual watermark with different types of recoding are depicted in Fig. 3. As one can see, the pseudo-barcode is more compact compared to the Code-128 representation and at the same time has a larger numbers of encoded characters.
3.2 Fragile Watermark

Embedding a fragile watermark refers to Level 3 from Table 1. Its main purpose is to show whether attacks have been applied to a watermarked image or not. An additional purpose is to demonstrate authorization if the fragile watermark is visible. The simplest DHT transform is usually used for embedding a fragile watermark [14]. A general mathematical model for embedding/extraction of a fragile watermark with scaling factor α as the degree of visibility is written in the form of Eqs. 1 and 2, respectively, where I is the host image, I′ is the watermarked image, and W is the fragile image watermark.

I′ = I + αW    (1)

W = (I′ − I) / α    (2)
Equations 1 and 2 are applied to the luminance components Y of the host image and the fragile watermark, converted from RGB to YCbCr color space, after the DHT. The scaling factor α is determined by a function of the sigmoid family. In [15], it was shown that the arctg function is smoother and has a more uniform rise than the sigmoid function, which allows a more accurate adjustment of the degree of visibility. Thus, the scaling factor α is calculated as

α = (1/π) · (arctg(μ(I)) + π/2) · (1 / 10^m),    (3)
where m is the controlling parameter and μ(I) is the mean value defined by Eq. 4, with M and N being the sizes of the host image:

μ(I) = (1 / (M · N)) · Σ_{i=1..M} Σ_{j=1..N} I(i, j).    (4)
If m = 0, then the watermark prevails in the host image. If m = 1, then the visibility of the watermark is good without destroying the host image (visible watermarking scheme). If m = 2, then the watermark becomes invisible without degrading the quality of the host image (invisible watermarking scheme).
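A compact sketch of the fragile embedding following Eqs. 1–3 is shown below; the 8 × 8 patch size, the Hadamard normalization and the use of scipy are our assumptions for illustration, not a verbatim transcription of the authors' implementation.

```python
import numpy as np
from scipy.linalg import hadamard

H8 = hadamard(8) / np.sqrt(8)            # orthonormal 8 x 8 Hadamard matrix

def scaling_factor(y_host: np.ndarray, m: int) -> float:
    """Eq. 3: alpha from the mean luminance and the controlling parameter m."""
    mu = y_host.mean()
    return (np.arctan(mu) + np.pi / 2) / np.pi * 10.0 ** (-m)

def embed_fragile(y_host: np.ndarray, y_mark: np.ndarray, m: int = 1) -> np.ndarray:
    """Eq. 1 in the DHT domain, applied patch-wise to the fragile region."""
    alpha = scaling_factor(y_host, m)
    out = y_host.astype(float).copy()
    for i in range(0, y_host.shape[0] - 7, 8):
        for j in range(0, y_host.shape[1] - 7, 8):
            f_host = H8 @ out[i:i + 8, j:j + 8] @ H8.T
            f_mark = H8 @ y_mark[i:i + 8, j:j + 8] @ H8.T
            out[i:i + 8, j:j + 8] = H8.T @ (f_host + alpha * f_mark) @ H8
    return out
```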
3.3 Multilevel Embedding and Extraction Schemes

The embedding and extraction schemes are as follows. Embedding of two different watermarks, fragile and hidden, can be done sequentially or in parallel. We present a sequential way for better understanding. Let us consider the embedding scheme.

Input data: host image or set of frames, fragile watermark, textual watermark and secret word for the Arnold's transform.

Step 1. Convert the host image/frame and fragile watermark from RGB to YCbCr color space.
Step 2. Transform the textual watermark using the proposed pseudo-barcoding. Apply the Arnold's transform (Sect. 3.1). Define the number of iterations of the Arnold's transform for each block with a secret key.
Step 3. Divide the host image/frame into two regions, for embedding fragile and hidden watermarks, called fragile region and textual region, respectively. Put the coordinates of these regions in the secret key. Each region is represented by a set of 8 × 8 patches, and further the required transform is applied to each patch in the cycle.
Step 4. Embed the fragile watermark.
  Step 4.1. Apply direct DHT to the fragile region and fragile watermark.
  Step 4.2. Calculate the scaling factor using Eq. 3.
  Step 4.3. Embed the fragile watermark into the fragile region of the host image/frame using frequency components (Eq. 1).
  Step 4.4. Apply inverse DHT for the fragile region.
Step 5. Embed the textual watermark.
  Step 5.1. Apply direct 2-level DWT.
  Step 5.2. Embed textual watermark using a modified Koch–Zhao algorithm [3]. The original Koch–Zhao algorithm is represented in [16].
  Step 5.3. Apply inverse DWT.
Step 6. Compose the watermarked image/frame.
Step 7. Convert the watermarked image/frame from YCbCr to RGB color space.
Step 8. Apply Steps 1–7 to each frame in the case of a video sequence.

Output data: the watermarked image/watermarked frames.

The extraction scheme involves the following steps.

Input data: the transmitted watermarked image/watermarked frames and secret key.

Step 1. Convert the watermarked image/watermarked frame from RGB to YCbCr color space.
Step 2. Extract the fragile watermark.
  Step 2.1. Apply direct DHT for the fragile region using information from the secret key.
  Step 2.2. Extract the fragile watermark from the fragile region using frequency components and scaling factor (Eq. 2).
  Step 2.3. Apply inverse DHT for the extracted fragile watermark.
Step 3. If the fragile watermark is distorted, then compensate common image processing and geometric attacks in the image/frame [17].
Step 4. Extract the textual watermark.
  Step 4.1. Apply direct DWT and extract the watermark bits from frequency areas HL1, HH1 and LH1.
  Step 4.2. Create a complete representation of the pseudo-barcode considering the number of repetitions in the image/frame.
  Step 4.3. Transform the extracted textual watermark in blocks, apply the Arnold's transform using the required number of iterations, improve vertical strips and decode using Windows-1251 codes.
  Step 4.4. Apply inverse DWT to the extracted textual watermark.
Step 5. Convert the extracted fragile and textual watermarks from YCbCr to RGB color space.
Step 6. Apply Steps 1–5 to each frame in the case of a video sequence.

Output data: extracted fragile and textual watermarks.
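To tie the steps together, a high-level skeleton of the embedding pipeline is sketched below; the color conversion and DWT calls use OpenCV and PyWavelets, embed_fragile and pseudo_barcode_bits refer to the sketches given earlier, and embed_textual stands in for the Koch–Zhao-style coefficient modification as a purely hypothetical placeholder.

```python
import cv2
import numpy as np
import pywt

def embed_frame(frame_bgr, fragile_logo_y, text, secret_key, m=2):
    """Sequential embedding of the fragile and textual watermarks (sketch only)."""
    ycrcb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2YCrCb).astype(float)
    y = ycrcb[:, :, 0]

    # Step 4: fragile watermark in the DHT domain (see embed_fragile above).
    h, w = fragile_logo_y.shape
    y[:h, :w] = embed_fragile(y[:h, :w], fragile_logo_y, m)

    # Step 5: textual watermark via 2-level DWT and a Koch-Zhao-style rule.
    coeffs = pywt.wavedec2(y, "haar", level=2)
    bits = pseudo_barcode_bits(text)                   # scrambled with arnold_scramble(...)
    coeffs = embed_textual(coeffs, bits, secret_key)   # hypothetical helper
    ycrcb[:, :, 0] = pywt.waverec2(coeffs, "haar")[:y.shape[0], :y.shape[1]]

    return cv2.cvtColor(np.clip(ycrcb, 0, 255).astype(np.uint8),
                        cv2.COLOR_YCrCb2BGR)
```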
4 Experimental Results

For the experiments, we used our own dataset obtained with a smartphone's camera. The shooting was conducted in urban and rural territories. Examples of images from the dataset are depicted in Fig. 4. The dataset includes high-resolution images (3840 × 2160 pixels) with latitude, longitude, date, time and alias information. We have estimated the amount of textual information that can be embedded in images of this resolution.
Multilevel Watermarking Scheme Based on Pseudo-barcoding …
a
b
c
193
d
Fig. 4 Examples of dataset images: a Forest_3, b Street_67, c City_7, d Yard_23
of embedded information achieves 129,600 bits, which can be represented as an image with sizes 480 × 270 pixels. Our pseudo-barcode has 5 or 6 service symbols whereas each symbol occupies 16 × 8 pixels that allows to embed 1004 or 1005 symbols (excluding the fragile watermark), respectively. In order to estimate the robustness of the proposed scheme, the following types of attacks were simulated: • JPEG compression with quality levels 90, 80, 70, 60 and 50%. • Grid cropping of random patches of 16 × 16 pixels with a total amount 5, 10, 15, 20, 25 and 30%, depending on the total image size. • Brightness attack with the brightness changes −15, −10, −5, 5, 10 and 15% from the original brightness values. • Noise attacks adding random white (Gaussian) noise into the watermarked image with changes of color components in the range [−30, 30] for 5, 10 and 15% of the image size. Examples of attack simulation on image Forest_3 (this image was also used in Figs. 1 and 2) are depicted in Fig. 5. During the experiments, the Bit Error Rate (BER) and Pearson Correlation Coefficient (PCC) metrics were measured. The worst values (BER = 0.248, PCC = 0.724) were obtained for the JPEG 50% attack, but even in this case textual message was completely reconstructed.
5 Conclusions In this study, we propose pseudo-barcoding to represent a textual watermark in order to reduce the computational resources at an algorithmic level. We have chosen the simple multilevel embedding/extraction schemes that are robust to attacks. The proposed pseudo-barcodes make it possible to expand the set of coded national symbols. In addition, the proposed method allows to embed the textual watermarks in Unicode with reducing the available amount of the embedded symbols by half. The experiments show the high robustness to many types of Internet attacks with completely reconstructed textual watermarks.
194
M. N. Favorskaya and A. V. Proskurin
a
b
c
d
e
f
g
h
i
j
k
l
Fig. 5 Examples of attack simulation on image Forest_3 and extracted fragile and textual watermarks: a attacked image by JPEG 70%, b fragile watermark, c textual watermark, d attacked image by GridCropping 20%, e fragile watermark, f textual watermark, g attacked image by Brightness 15%, h fragile watermark, i textual watermark, j attacked image by Noise 15%, k fragile watermark, l textual watermark Acknowledgements The reported study was funded by the Russian Fund for Basic Researches according to the research project no. 19-07-00047.
References 1. Shih, F.Y.: Digital Watermarking and Steganography: Fundamentals and Techniques, 2nd edn. CRC Press (2020) 2. Digimarc: https://www.digimarc.com/. Last accessed 2020/11/15 3. Favorskaya, M., Zotin, A.: Robust textual watermarking for high resolution videos based on Code-128 barcoding and DWT. Procedia Comput. Sci. 176, 1261–1270 (2020) 4. Kejariwal, A., Gupta, S., Nicolau, A., Dutt, N.D., Gupta, R.K.: Proxy-based task partitioning of watermarking algorithms for reducing energy consumption in mobile devices. In: 41st Annual Design Automation Conference, pp. 556–561. ACM, San Diego, CA, USA (2004)
Multilevel Watermarking Scheme Based on Pseudo-barcoding …
195
5. Jeedella, J., Al-Ahmad, H.: An algorithm for watermarking mobile phone color images using BCH code. In: 2011 IEEE GCC Conference and Exhibition, pp. 303–306. IEEE, Dubai, United Arab Emirates (2011) 6. Al-Otum, H., Al-Shalabi, N.E.: Copyright protection of color images for Android-based smartphones using watermarking with quick-response code. Multim. Tools Appl. 77, 15625–15655 (2018) 7. Venugopala, P.S., Sarojadevi, H., Chiplunkar, N.N.: An approach to embed image in video as watermark using a mobile device. Sustain Comput: Inf Syst 15, 82–87 (2017) 8. The information hiding homepage: https://www.petitcolas.net/watermarking/stirmark/. Last accessed 2020/12/05 9. Miao, N., He, Y., Dong, J.: HymnMark: Towards efficient digital watermarking on Android smartphones. In: Proceedings of the International Conference on Wireless Networks, pp. 348– 356. IEEE Computer Society, Washington, DC, USA (2012) 10. Pizzolante, R., Carpentieri, B.: Copyright protection for digital images on portable devices. Comput. Inf. 35, 1189–1209 (2016) 11. Favorskaya, M.N., Jain, L.C., Savchina, E.I.: Perceptually tuned watermarking using nonsubsampled shearlet transform. In: Favorskaya MN, Jain LC (eds.) Computer Vision in Control Systems-3, ISRL, vol. 136, pp. 41–69. Springer International Publishing Switzerland (2018) 12. Favorskaya, M.N., Buryachenko, V.V.: Authentication and copyright protection of videos under transmitting specifications. In: Favorskaya, M.N., Jain, L.C. (eds.) Computer Vision in Control Systems–5, ISRL, vol. 175, pp. 119–160. Springer, Cham (2020) 13. Zotin, A., Favorskaya, M., Proskurin, A., Pakhirka, A.: Study of digital textual watermarking distortions under Internet attacks in high resolution videos. Procedia Comput. Sci. 176, 1633– 1642 (2020) 14. Santhi, V., Arulmozhivarman, P.: Hadamard transform based adaptive visible/invisible watermarking scheme for digital images. J. Inf. Secur. Appl. 18(4), 167–179 (2013) 15. Favorskaya, M., Savchina, E., Popov, A.: Adaptive visible image watermarking based on Hadamard transform. IOP Conf. Ser.: Mater. Sci. Eng. 450(5), 052003.1–052003.6 (2018) 16. Koch, E., Zhao, J.: Towards robust and hidden image copyright labeling. In: IEEE Workshop on Nonlinear Signal and Image Processing, pp. 452–455. IEEE, Neos Marmaras, Greece (1995) 17. Favorskaya, M., Savchina, E., Gusev, K.: Feature-based synchronization correction for multilevel watermarking of medical images. Procedia Comput. Sci. 159, 1267–1276 (2019)
Multi-quasi-periodic Cylindrical and Circular Images Victor R. Krasheninnikov, Yuliya E. Kuvayskova, and Alexey U. Subbotin
Abstract Nowadays, image processing problems are becoming increasingly important due to development of the aerospace Earth monitoring systems, medical devices, etc. But most of the image processing works deal with images defined on rectangular two-dimensional grids or grids of higher dimension. In some practical situations, images are set on a cylinder or on a circle. The peculiarity of such images requires its consideration in their models. In the early works of the authors, several models of such images were proposed. These models can also describe quasi-periodic processes. However, these models are single-period ones. Some images and processes can have several quasi-periods. For example, seasonal within annuals, annual rings on a tree cut, etc. In the present paper, autoregressive models of multi-quasi-periodic images and processes are considered. The expressions of their covariance functions are given. Keywords Cylindrical and circular images · Quasi-periodic process · Autoregressive model · Correlation function · Multi-periodic
1 Introduction In recent decades, numerous studies have been carried out on image modeling and processing, for example, [1–3]. This is due to numerous applications in environmental monitoring of the Earth, navigation, computer vision, medicine, etc. In this regard, the problems of identification [4, 5], forecasting and filtering [6–8], estimation of object movement [9], voice commands recognition [10], etc. are considered. The overwhelming majority of known image models is defined on rectangular grids. In some practical situations, images are set on a cylinder (pipelines, blood vessel, etc.) or on a circle (facies of dried biological fluids, an eye, cut of a tree trunk, flower, etc.). These images require special models, which is necessary for the formulation and solution of problems of processing such images. Several models of such images were proposed in [11–13]. These models can also describe quasi-periodic processes, V. R. Krasheninnikov (B) · Y. E. Kuvayskova · A. U. Subbotin Ulyanovsk State Technical University, 32, Severny Venetz str., Ulyanovsk 432027, Russian Federation © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 I. Czarnowski et al. (eds.), Intelligent Decision Technologies, Smart Innovation, Systems and Technologies 238, https://doi.org/10.1007/978-981-16-2765-1_16
197
198
V. R. Krasheninnikov et al.
in particular, speech sounds [14]. In addition, the use of these models makes it possible to obtain more accurate predictions of the state of technical objects [15] subjected to various kinds of vibrations (engines, turbines, etc.). However, these models can take into account only one quasi-period. Some images and processes can have several quasi-periods. For example, seasonal within annuals, annual rings on a tree cut, etc. In the present paper, autoregressive models of multi-periodic images and processes are considered. Cylindrical images are presented using autoregression along a cylindrical spiral. Circular images are determined by autoregression along the Archimedes spiral. The multi-periodicity in these models is achieved by complicating the autoregressive equation or weighted summation of single-period processes. The expressions of covariance functions of these images and processes are given.
2 The Models of Cylindrical Images with One Quasi-period We first consider the well-known autoregressive Habibi model (Eq. 1) of a flat image [6], where i is the number of the row; j is the number of the column, ξi j are independent standard random variables. xi j = a xi, j−1 + b xi−1, j − ab xi−1, j−1 + q ξi j
(1)
The generated image has zero mean and the covariance function (CF) given by Eq. 2, the graph of which is shown in Fig. 1. V (m, n) = M[xi j xi+m, j+n ] =
q 2 a |m| b|n| (1 − a 2 )(1 − b2 )
(2)
Correlation of image elements decreases along rows and columns. Therefore, the elements of this image located at the beginning and end of the row are weakly dependent. When connecting the rectangular image into the cylinder, there will be a sharp jumps in brightness at the junction, that is not characteristic of the image on a cylinder. The adjacent rows of the rectangular image with b ≈ 1 have a high correlation. Therefore, when combining the rows into a sequence, we can obtain a model of a quasi-periodic process. However, the beginning and end of each row, being at a considerable distance from each other, are practically independent of each other, so there will be sharp jumps at the junction of the quasi-periods of the process, which are unusual for relatively smooth processes. Thus, rectangular images do not
Fig. 1 The CF graph of the model in Eq. 1
Multi-quasi-periodic Cylindrical and Circular Images
199
Fig. 2 The grids on a cylinder: a spiral grid, b circular grid
give acceptable representations of cylindrical images and quasi-periodic processes. In this paper, for this purpose, we use images defined on a cylinder, the values of which along the spiral do not have undesirable sharp jumps. Let us consider a spiral grid on the cylinder (Fig. 2a). Rows of this grid are turns of a cylindrical spiral. The turns of this image can also be considered as closed circles on the cylinder with the same numbering (Fig. 2b). To describe the image defined on a cylindrical grid, we use an analog [14] of the autoregressive model in Eq. 1, where i is a spiral turn number and j is a node number ( j = 0, . . . , T − 1) in the turn, T is the period, i.e., the number of points in one turn. This model can be represented as a scan of the image along a spiral (Eq. 3), where n = kT + l is end-to-end image point number. xn = a xn−1 + b xn−T − ab xn−T −1 + q ξn
(3)
The characteristic equation of this model is given by Eq. 4, where z is a complex variable [16]. (z − a)(z T − b) = 0
(4)
The CF of this model has the form of Eq. 5, where z k are the simple roots of Eq. 4 and ck are some constants depending on the joint distribution of the first values x1 , x2 , . . . , x T +1 . V (n) = M[xi xi+n ] =
T +1
ck z kn
(5)
k=1
In the stationary case (or for large enough i values in Eq. 5), the CF can be calculated according to Eq. 6. V (n) =
q2 2πi
|z|=1
z n−1 dz (z − a)(z T − b)(z −1 − a)(z −T − b)
(6)
200
V. R. Krasheninnikov et al.
Let us represent Eq. 6 in the form of Eq. 7. q2 V (n) = 2πi
|z|=1
z n+T dz (z − a)(z T − b)(1 − az)(1 − bz T )
(7)
The integrand in Eq. 7 has singular points z k = b1/T exp (i 2π k/T ) and s = a T inside the contour of integration |z| = 1. Using residues [17], we obtain V (n) = q
2
T −1 1 zk s n n z + a . (1 − b2 )T k=0 (1 − az k )(z k − a) k (1 − a 2 )(1 − bs)(s − b) (8)
In particular, V (mT ) =
q2 (1 − s 2 )bm+1 − (1 − b2 )s m+1 (9) 2 2 (1 − a )(1 − b )(1 − bs)(b − s)
and the variance σx2 = V (0) =
q 2 (1 + bs) . (1 − a 2 )(1 − b2 )(1 − bs)
(10)
The typical graph of such CF is shown in Fig. 3. The characteristic feature of this CF is continuity at the junction of periods, in contrast to Fig. 1. The image in Fig. 4a is also continuous along the cut line, which is noticeable in the first few columns (Fig. 4b) attached to this image. As a result, the process described by model in Eq. 4, that is, the scan of the cylindrical image along a spiral, does not have sharp jumps at the junction of periods (Fig. 5). The quasi-periods are the rows of image in Fig. 4a. The examples of simulating cylindrical images at various values of this model parameters are shown in Fig. 6.
Fig. 3 The graph of CF in Eq. 7
Multi-quasi-periodic Cylindrical and Circular Images
201
Fig. 4 Cylindrical image representation: a section of a cylindrical image, b the first columns
Fig. 5 Schedule of a simulation of a process by the model in Eq. 3
Fig. 6 The examples of simulating cylindrical images at various values of model parameters: a a = 0.95, b = 0.99, b a = 0.99, b = 0.95, c a = b = 0.95
3 The Models of Circular Images with One Quasi-period A polar coordinate system (r, ϕ) is convenient for circular images representation. To do this, we will consider the turns of the cylindrical spiral of model in Eq. 3 as turns of a circle. In other words, index i is converted into a polar radius, and index j into a polar angle. Thus, the value xi j in the pixel (i, j) of the cylindrical image is converted to the pixel (ir, jϕ) of the circular image (Fig. 7a). It is also convenient to use Archimedes spiral grid (Fig. 7b), similar to the cylindrical spiral in Fig. 2a. The parameters a and b of model in Eq. 3 set the degree of correlation in the radial and circular directions. If the autoregressive coefficient a of the previous pixel is relatively large, then the image will have a high circular correlation (Fig. 8). If the autoregressive coefficients b from the previous turn of the spiral prevail, the image
202
V. R. Krasheninnikov et al.
Fig. 7 Grids on a circle: a circular, b spiral
Fig. 8 The image with circular correlation
will be more correlated in the radial direction (Fig. 9). The quasi-periods are the turns of images in these Figures. Fig. 9 The image with radial correlation
Multi-quasi-periodic Cylindrical and Circular Images
203
4 The Models of Cylindrical and Circular Images with Several Quasi-periods Some images and processes can have several quasi-periods. For example, daily temperature fluctuations within annuals, annelids, annual rings on a tree cut, etc. The process defined by Eq. 3 has quasi-period T in view of factor (z T − b) in Eq. 4. It is obvious that adding such factors to this equation will give new quasi-periods. Let us add one factor (z C − c), then we obtain the characteristic Eq. 11 with the corresponding autoregressive Eq. 12. (z − a)(z T − b)(z C − c) = 0
(11)
xn = a xn−1 + b xn−T + c xn−C − ab xn−T −1 − ac xn−C−1 − bc xn−T −C + abc xn−T −C−1 + q ξn
(12)
If the roots of Eq. 11 are simple, then the CF has the form of Eq. 5 and expression of Eq. 13, where u k = b1/T exp(i 2π k/T ) and vk = c1/C exp(i 2π k/C). If T = mC, where m is an integral number, then m small quasi-periods are inside large ones. A typical CF graph at m = 3 is shown in Fig. 10, where large periods are marked with vertical lines. If m is not an integer, then small periods are superimposed on large ones as in Fig. 11. ⎡
a T +C
(1−a 2 )(1−bs)(s−b)(a C −c)(1−ca C )
⎢
⎢ 1 2 ⎢ + (1−b2 )T
V (n) = q ⎢ ⎢ ⎣
+ (1−c12 )C
T −1
k=0 C−1 k=0
an
⎤ ⎥ ⎥ ⎥ ⎥ ⎥ ⎦ n
u C+1 k un (1−au k )(u k −a)(u Ck −c)(1−cu Ck ) k vkT +1 (1−avk )(vk −a) vkT −b
(
)(1−bvkT )
(13)
vk
Models of multi-period circular images are similar to single-period ones. Namely, instead of a cylindrical spiral, an Archimedes spiral is used. An example of imitation
Fig. 10 The CF graph with T = 3C
Fig. 11 The CF graph with T = 2.3C
204
V. R. Krasheninnikov et al.
of a cylindrical image for T = 3C is shown in Fig. 12. Three quasi-periods of brightness are noticeable on each horizon row (turn). Figure 13 shows examples of simulating circular images with m = 15. There are 15 small quasi-periods on each circle. The type of correlation in Fig. 10 is reflected in the correlation of the process. Figure 14 shows an example of simulating a process with two quasi-periods at T = 3C. Large periods are separated by long vertical lines and small periods are separated by short vertical lines. There is a double correlation of the process. The first one is correlation of adjacent small periods. At the same time, there is a high correlation of small periods at a distance of T = 3C, caused by the correlation of a large periods. A larger number of quasi-periods in processes and images can be obtained by adding factors like z T − b in Eq. 11. Thus, we obtain characteristic Eq. 14 and CF in Eq. 15: (z − a)
M
(z Tm − bm ) = 0
m=1
Fig. 12 The cylinder image with 3 small periods
Fig. 13 The examples of circular images with 15 small periods
(14)
Multi-quasi-periodic Cylindrical and Circular Images
205
Fig. 14 The graph of the process with T = 3C
⎡ ⎢ V (n) = q 2 ⎣
+
M m=1
1 (1−bm2 )Tm
aT an (1−a 2 )P(a) T T −Tm +1 m −1 z m,k zn (1−az m,k )(z m,k −a)Pm (z m,k ) m,k k=0
⎤ ⎥ ⎦,
(15)
where P(z) =
M
(z Tm − bm )(1 − bm z Tm ), T = T1 + · · · + TM ,
m=1
Pm (z) = P(z)/[(z Tm − bm )(1 − bm z Tm )], z m,k = bm1/Tm exp (i2π k/Tm ). Let us note that there is also another way to represent multi-period images and processes, namely, in the form of weighted sums of single-period processes. Let x1 (n), x2 (n), . . . , xm (n) be independent single-period processes given by the model Eq. 3 with the parameters Tk , ak , bk , qk and CF Vk (n), k = 1, 2, . . . , m. Then for any constants c1 , c2 , . . . , cm the process x(n) =
m
ck xk (n)
k=1
has the CF V (n) =
m
ck2 Vk (n).
k=1
5 Conclusions In the present paper, some autoregressive models of images and processes with several quasi-periods are proposed. Cylindrical image is represented as an autoregressive model along a cylindrical spiral. Circular images are determined by autoregression along the Archimedes spiral. The multi-periodicity in these models is achieved by complicating the autoregressive equation or weighted summation of single-period processes. The expressions of covariance functions of these images and processes
206
V. R. Krasheninnikov et al.
are given for any number of quasi-periods. The examples of images and processes with two quasi-periods are given for illustration. The variation of models parameters gives the possibility to represent wide class of quasi-periodic images and processes. In particular, these models can be applied to describe speech sounds, while multiperiodic models are more euphonious options. In addition, the use of these models makes it possible to obtain more accurate predictions of the state of technical objects (engines, turbines, etc.) subjected to various kinds of vibrations. Acknowledgements This study was funded by the RFBR, project number 20-01-00613.
References 1. Jähne, B.: Digital Image Processing, 6th edn. Springer, Berlin Heidelberg (2005) 2. Soifer, V.A. (ed.): Computer Image Processing. Part I: Basic Concepts and Theory. VDM Verlag Dr. Muller E.K. (2010) 3. Gonzalez, R.C., Woods, R.E.: Digital Image Processing, 4th edn. Pearson Prentice-Hall, New York (2017) 4. Vizilter, Y.V., Pyt’ev, Y.P., Chulichkov, A.I., Mestetskiy, L.M.: Morphological image analysis for computer vision applications. In: Favorskaya, M.N., Jain, L.C. (eds.) Computer Vision in Control Systems-1, ISRL, vol. 73, pp. 9–58. Springer International Publishing, Switzerland (2015) 5. Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification, 2nd edn. Wiley-Interscience, New York (2000) 6. Habibi, A.: Two-dimensional Bayesian estimate of images. Proc. IEEE 60(7), 878–883 (1972) 7. Woods, J.W.: Two-dimensional Kalman filtering. In: Huang, T.S. (ed.) Two-Dimensional Digital Signal Processing I: Linear Filters. TAP, vol. 42, pp. 155–205. Springer, Berlin Heidelberg New York (1981) 8. Andriyanov, N.A., Dementiev, V.E., Vasil’ev, K.K.: Developing a filtering algorithm for doubly stochastic images based on models with multiple roots of characteristic equations. Pattern Recogn. Image Anal. 29(1), 10–20 (2019) 9. Favorskaya, M.N.: Motion estimation for objects analysis and detection in videos. In: Kontchnev, R., Nakamatsu, K. (eds.) Advances in Reasoning-Based Image Processing Intelligent Systems, ISRL, vol. 29, pp. 211–253. Springer, Berlin Heidelberg (2012) 10. Krasheninnikov, V.R., Armer, A.I., Kuznetsov, V.V., Lebedeva, E.Y.: Cross-correlation portraits of voice signals in the problem of recognizing voice commands according to patterns. Pattern Recogn. Image Anal. 21(2), 192–194 (2011) 11. Krasheninnikov, V.R., Vasil’ev, K.K.: Multidimensional image models and processing. In: Favorskaya, M.N., Jain, L.C. (eds.) Computer Vision in Control Systems-3, ISRL, vol. 135, pp. 11–64. Springer International Publishing, Switzerland AG (2018) 12. Dement’iev, V.E., Krasheninnikov, V.R., Vasil’ev, K.K.: Representation and processing of spatially heterogeneous images and image sequences. In: Favorskaya, M.N., Jain, L.C. (eds) Computer Vision in Control Systems-5, ISRL, vol. 175, pp. 53–97. Springer International Publishing, Switzerland AG (2020) 13. Krasheninnikov, V.R., Malenova, O.E., Subbotin, A.U.: The identification of doubly stochastic circular image model. In: 2020 24th International Conference KES-2020 on Knowledge-Based and Intelligent Information & Engineering Systems, pp. 1839–1847 (2020) 14. Krasheninnikov, V.R., Kalinov, D.V., Pankratov, Yu.G.: Spiral autoregressive model of a quasiperiodic signal. Pattern Recogn. Image Anal. 8(1), 211–213 (2001)
Multi-quasi-periodic Cylindrical and Circular Images
207
15. Krasheninnikov, V.R., Kuvayskova, Yu.E.: Modelling and forecasting of quasi-periodic processes in technical objects based on cylindrical image models. CEUR Workshop Proc. 2416, 387–393 (2019) 16. Anderson, T.W.: The Statistical Analysis of Time Series. Wiley, New York (2019) 17. Brown, J.W., Churchill, R.V.: Complex Variables and Applications. McGraw-Hill, New York (2009)
On the Importance of Capturing a Sufficient Diversity of Perspective for the Classification of Micro-PCBs Adam Byerly, Tatiana Kalganova, and Anthony J. Grichnik
Abstract We present a dataset consisting of high-resolution images of 13 microPCBs captured in various rotations and perspectives relative to the camera, with each sample labeled for PCB type, rotation category, and perspective categories. We then present the design and results of experimentation on combinations of rotations and perspectives used during training and the resulting impact on test accuracy. We then show when and how well data augmentation techniques are capable of simulating rotations versus perspectives not present in the training data. We perform all experiments using CNNs with and without homogeneous vector capsules (HVCs) and investigate and show the capsules’ ability to better encode the equivariance of the sub-components of the micro-PCBs. The results of our experiments lead us to conclude that training a neural network equipped with HVCs, capable of modeling equivariance among sub-components, coupled with training on a diversity of perspectives, achieves the greatest classification accuracy on micro-PCB data. Keywords Printed circuit boards (PCBs) · Convolutional neural network (CNN) · Homogeneous vector capsules (HVCs) · Capsule · Data augmentation
1 Introduction Early methods for attempting to classify objects in varying 2D projections of the 3D world used classic computer vision methods for identifying edges and contours. For example, in [1], the authors measured the similarity of objects after identifying A. Byerly · T. Kalganova Department of Electronic and Electrical Engineering, Brunel University London, Uxbridge UB8 3PH, UK A. Byerly (B) Department of Computer Science and Information Systems, Bradley University, Peoria, IL 1615, USA e-mail: [email protected] A. J. Grichnik Blue Roof Labs, 759 N. Main St., Eureka, IL 61530–1035, USA © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 I. Czarnowski et al. (eds.), Intelligent Decision Technologies, Smart Innovation, Systems and Technologies 238, https://doi.org/10.1007/978-981-16-2765-1_17
209
210
A. Byerly et al.
correspondences between images and then using those correspondences to construct an estimated aligning transform. They were able to achieve an impressive 2.4% error rate on the COIL-20 dataset [2] when trained from an average of just 4 of the 2D projections of each 3D object. Several years later, researchers generated a larger more difficult dataset, the NORB dataset (NYU Object Recognition Benchmark) [3]. This dataset included higher resolution images than COIL-20 and contained not just variations in the objects’ rotations, but also their perspectives relative to the camera. In [4], the authors applied capsules (vector-valued neurons) to the smallNORB dataset (which is a subset of NORB) and achieved a new state-of-the-art test error rate. Also, worthy of note is the multi-PIE dataset [5], which is composed, not of objects, but of faces, from 337 different subjects from 15 different viewpoints and having 6 different expressions, for the purposes of facial recognition. Recognizing a specific face among a group of faces, as opposed to recognizing a four-legged animal among cars and airplanes, is a significant challenge because, assuming a lack of deformity, all faces are made up of the same sub-parts in similar locations (i.e., two eyes above a nose, which is above a mouth). Printed circuit boards (PCBs) are like the human face in that, for a given PCB, the individual elements (such as capacitors, resistors, and integrated circuits (ICs)) are not present invariantly relative to one another, but rather at very specific locations relative to one another. Compared to the human face, even on small PCBs, there are a greater number of features with greater similarity to one another (some modern small surface-mounted capacitors and resistors are nearly indistinguishable from one another, whereas eyes, noses, and mouths are quite distinctive from one another). Our contribution is as follows: 1. We present a dataset consisting of high-resolution images of 13 micro-PCBs captured in various rotations and perspectives relative to the camera, with each sample labeled for PCB type, rotation category, and perspective category. This dataset is unique relative to the other available PCB datasets (such as [6–9]) in that the micro-PCBs in our dataset are (1) readily available and inexpensive models, (2) captured many more times each, and (3) captured from a variety of labeled rotations and perspectives. 2. We conduct experiments on the dataset using CNNs with and without capsules and investigate and show the capsules’ ability to better encode the equivariance of the sub-components of the micro-PCBs. 3. We conduct experiments on the dataset investigating the ability of data augmentation techniques to improve classification accuracy on novel rotations and perspectives of the micro-PCBs when using CNNs with and without capsules. 4. The results of our experiments show that, as rotations can be more effectively and accurately augmented, image acquisition efforts should prioritize the capturing of a diversity of perspectives representative of the perspectives for which accurate inference is important.
On the Importance of Capturing a Sufficient Diversity of Perspective . . .
211
2 Image Acquisition We captured a total of 8125 images of the 13 micro-PCBs in our dataset using a Sony SLT-A35, 16 Megapixel DSLR Camera. After cropping the excess area around the micro-PCBs in each image, the average size of all images is 1949 × 2126 (width × height). The micro-PCBs were captured in 25 different positions relative to the camera under ideal lighting conditions. In each position, each micro-PCB was captured in five different rotations. This creates 125 unique orientations of each micro-PCB relative to the camera. Each unique orientation was captured four times and coded for training, and then, another micro-PCB of the same make and model was captured once and coded for testing. Thus, no micro-PCB that is used in training is the same that is used in testing. Although the micro-PCBs coded for training are nearly identical to those coded for testing, very subtle differences exist in some cases. In total, each micro-PCB in the dataset has 500 training images and 125 test images, creating an overall train/test split of 6500/1625. The micro-PCBs being placed in 25 different positions in the capture surface results in the creation of 25 unique perspectives of each micro-PCB relative to the camera. We refer to the position directly under the camera as the neutral perspective, the eight positions directly adjacent to the neutral position as “near” perspectives and the outer 16 positions as “far” perspectives. To fully distinguish the 25 perspectives, when looking down from the camera’s position to the capture surface, we refer to those that are to the left or above the camera as “negative” and those that are to the right or below the camera as “positive.” In each perspective, each micro-PCB was rotated across five rotations manually without attempting to place them in any exact angle. Instead, we placed each micro-PCB (1) straight, which we refer to as the neutral rotation, (2–3) rotated slightly to the left and right, which we refer to as the left shallow and right shallow rotations, respectively, and (4–5) rotated further to the left and right, which we refer to as the left wide and right wide rotations, respectively. After image acquisition, we used an edge detection algorithm to detect the left and right edges of each image. Using the left edge, we computed the angle of each micro-PCB relative to an ideal neutral. We then measured the distance of the left edge to the right edge at the bottom and top of each image and created a ratio between the two distances. This ratio is representative of the true perspective along the yaxis. The presence of various connectors on the top edge of the micro-PCBs made algorithmically determining an accurate top edge of the micro-PCBs impossible, so we do not present ratios to be representative of the true perspective along the x-axis. However, a reasonable estimate can be calculated using the corresponding ratio for the y-axis multiplied by the ratio of an image’s width to its height.
212
A. Byerly et al.
3 Experimental Design and Results In [10], the authors experimented with a simple monolithic CNN. In their experiments, they compared a baseline model that used the common method of flattening the final convolution operation and classifying through a layer of fully connected scalar neurons with a variety of configurations of homogeneous vector capsules (HVCs). The authors proposed and demonstrated that classifying through HVCs is superior to classifying through a layer of fully connected scalar neurons on three different datasets with differing difficulty profiles. In this paper, we extend that work to include this micro-PCB dataset. In all experiments performed, we compare classifying through a fully connected layer of scalar neurons to the best performing HVC configuration for the simple monolithic CNN in [10]. In our experiments, we label the fully connected network M1 and the network using HVCs M2. In addition to investigating the impact of HVCs on this dataset, we investigated (1) the ability of the networks to accurately predict novel rotations and perspectives of the micro-PCBs by excluding training samples with similar rotations and perspectives and (2) the ability of data augmentation techniques to mimic the excluded training samples. Table 1 shows rotations and perspectives that were used during training for experiments E1–E9. Testing always included all images from all rotations and perspectives. For these experiments, data augmentation techniques were not used to simulate the rotations and perspectives that were excluded during training. The results of those experiments are given in Table 2. Table 3 shows, for experiments A1–A16, both which rotations and perspectives were used during training, as well as whether data augmentation techniques were used to simulate the excluded rotations, excluded perspectives, or both. Again, testing always included all images from all rotations and perspectives. The results of those experiments is shown in Table 4.
Table 1 Experimental design for experiments excluding rotations and perspectives Experiment Train rotations Train perspectives Left Left Right Right Neg. far Neg. Pos. wide shallow shallow wide near near E1 E2 E3 E4 E5 E6 E7 E8 E9
Pos. far
On the Importance of Capturing a Sufficient Diversity of Perspective . . .
213
Table 2 Results of experiments excluding rotations and perspectives. In all cases, model M2 achieved a higher mean accuracy. Five trials of each of experiments E1–E5, E7, and E9 were conducted. Ten trials of E6 were conducted in order to establish statistical significance. After ten trials of experiment E8, the higher mean of accuracy of model M2 was not shown to be statistically significant Experiment M1 (Fully connected) M2 (Capsules) p-value Mean (%) Max (%) S.D. Mean (%) Max (%) S.D. E1 E2
93.28 92.30
96.90 92.30
0.02025 0
99.45 98.84
100.00 99.14
0.01117 0.00244
E3
9.78
11.76
0.01415
39.45
42.55
0.02703
E4
92.30
92.30
98.92
99.45
0.00533
E5
25.14
35.10
2.58 × 10−8 0.06826
89.74
91.63
0.02087
E6 E7
18.10 22.89
71.67 41.01
0.19831 0.10565
34.29 78.62
42.73 81.77
0.05567 0.02368
E8 E9
44.98 10.10
100.00 13.05
0.44777 0.01831
46.48 27.61
50.37 36.45
0.03487 0.05395
0.00033 6.68 × 10−12 2.11 × 10−8 3.05 × 10−9 3.71 × 10−8 0.02304 2.95 × 10−6 0.91692 0.00013
The bold indicates which model achieved superior accuracy for the experiment. Note that each line has results for two models (M1 and M2) and in each case only one (the superior one) is bolded
Table 6 shows the results of a final set of experiments, wherein we included all training samples, and for each training sample, we applied a range of rotation and perspective warp augmentations based on the PCBs’ coded labels for rotation and perspectives consistent with the distribution for each specific rotation and perspective. For all experiments, including those in which data augmentation techniques were not used to simulate the rotations and perspectives that were excluded, a small amount of random translation was applied during training in order to encourage translational invariance. This translation was limited to no more than 5% in either or both of the x and y directions.
4 Discussion 4.1 Experiments E1–E9 These experiments showed that model M2 (using HVCs) is superior to model M1 (using a fully connected layer) for all 9 experiments, though this was statistically significant for only 8 of the 9 experiments. Both models M1 and M2 were better able to cope with excluded rotations than excluded perspectives. This is not especially surprising given that rotation is an affine transformation, whereas perspective changes are not. For both M1 and M2, accuracy was especially poor when excluding all non-
214
A. Byerly et al.
Table 3 Experimental design for experiments using data augmentation to simulate the excluded rotations and perspectives Experiment Augment the excluded Train rotations Train perspectives Rotations Perspectives Left Left Right Right Neg. Neg. Pos. Pos. wide shal- shal- wide far near near far low low A1 A2 A3 A4 A5 A6 A7 A8 A9 A10 A11 A12 A13 A14 A15 A16
neutral perspectives (experiments E3, E6, and E9) to the point that the accuracy for model M1 for experiments E3 (including all rotation variants) and E9 (including no rotation variants) was equivalent to that of random guessing. A surprising result is that model M1 for experiment E6 (which included only the near rotation variants) had a mean accuracy that was approximately twice as accurate as either E3 or E9 (for model M1).
4.2 Experiments A1–A16 These experiments showed that model M2 (using HVCs) is superior to model M1 (using a fully connected layer) for 11 out of 16 experiments, though this was statistically significant for only 8 of the 16 experiments. Model M1 is superior twice with statistical significance. The lack of statistical significance in 8 out of the 16 experiments is due to the high variance across trials for the the experiments using model M1.
On the Importance of Capturing a Sufficient Diversity of Perspective . . .
215
Table 4 Results of experiments using data augmentation to simulate the excluded rotations and perspectives. In 11 out of 16 experiments, model M2 achieved a higher mean accuracy. Five trials of all experiments were conducted. In all but four experiments, there was greater variance across trials with model M1 Experiment M1 (Fully connected) M2 (Capsules) p-value Mean (%) Max (%) S.D. Mean (%) Max (%) S.D. A1
92.30
92.30
0
99.27
99.69
0.00357
A2 A3
35.73 90.76
100.00 92.30
48.90 99.40
58.56 99.82
0.06991 0.00284
A4
22.73
26.17
0.39620 3.44 × 10−2 0.03215
96.37
97.48
0.00785
A5
23.82
31.40
0.05243
94.52
95.75
0.02143
A6
20.87
29.62
0.05037
92.51
93.97
0.01614
A7 A8 A9
83.44 58.13 10.00
100.00 100.00 11.45
0.15976 0.37197 0.00899
36.44 40.87 45.04
44.58 44.95 53.51
0.06076 0.03404 0.05463
A10
24.14
29.43
0.04385
93.85
94.70
0.01037
A11
98.03
100.00
0.04406
49.62
62.87
0.10311
A12 A13 A14 A15 A16
72.77 23.72 21.01 18.88 40.46
100.00 51.54 61.82 53.63 100.00
0.39731 0.20104 0.22826 0.19435 0.42248
58.00 49.68 35.07 34.29 36.11
69.77 57.27 39.96 41.56 47.48
0.08962 0.08771 0.05824 0.05613 0.06581
8.44 × 10−11 0.48483 0.00052 2.95 × 10−11 2.93 × 10−9 1.53 × 10−9 0.00027 0.33187 6.05 × 10−7 5.33 × 10−10 1.10 × 10−5 0.44100 0.02941 0.21863 0.12698 0.82585
The bold indicates which model achieved superior accuracy for the experiment. Note that each line has results for two models (M1 and M2) and in each case only one (the superior one) is bolded
4.3 Comparing Experiments E1–E9 with Experiments A1–A16 Table 6 shows a comparison of mean accuracies achieved by the trials of experiments E1–E9 with those of experiments A1–A16, grouping the experiments together by the rotations and perspectives that were excluded during training. Not surprisingly, in all cases, the experiments that included data augmentation to replace some or all of the excluded samples achieved higher mean accuracy. In 9 out of the 16 comparisons, the superiority was statistically significant. Of those that did not produce a statistically significant difference, (1) experiments E2 and A1 each excluded only the far perspectives, (2) experiments E4 and A3 each excluded only the wide rotations, (3)
216
A. Byerly et al.
Table 5 Comparing results with and without augmentation of excluded rotations and perspectives. Horizontal lines in this table are used to group together experiments E1–E9 with their counterpart experiments A1–A16 based on the samples that were used during training. For example, E5 and A4-A6 were all trained on the subset of training samples that excluded the wide rotations and the far perspectives Experiment M1 (%) M2 (%) Experiment M1 (%) M2 (%) p-value E2 E3 E4 E5
92.30 9.78 92.30 25.14
98.84 39.45 98.92 89.74
34.29
A1 A2 A3 A4 A5 A6 A7
92.30 35.73 90.76 22.73 23.82 20.87 83.44
99.27 48.90 99.40 96.37 94.52 92.51 36.44
E6
18.10
A8 A9 A10
58.13 10.00 24.14
40.87 45.04 93.85
A11 A12 A13 A14 A15 A16
98.03 72.77 23.72 21.01 18.88 40.46
49.62 58.00 49.68 35.07 34.29 36.11
E7
22.89
78.62
E8
44.98
46.48
E9
10.10
27.61
0.056562 0.022433 0.112934 0.00016 0.00727 0.04678 2.46 × 10−5 0.01604 0.00358 1.05 × 10−6 0.02226 0.26219 0.32177 0.31777 0.09152 0.05605
The bold indicates which model achieved superior accuracy for the experiment. Note that each line has results for two models (M1 and M2) and in each case only one (the superior one) is bolded
experiments E8, A12, and A13 excluded all rotations and the far perspectives, and (4) experiments E9, A14, A15, and A16 excluded all rotations and all perspectives. In 6 out of 7 these comparisons, the data augmentation was attempting to synthesize excluded perspectives. As perspective warp is a non-affine transformation, it makes sense that synthesizing excluded perspectives with it meets with limited (but not no) success.
4.4 Regarding Synthesizing Alternate Perspectives As our experiments demonstrate, using data augmentation to supply excluded and/or greater variations of rotations works better than data augmentation to supply excluded and/or greater variations of perspective. Indeed, rotation is generally considered a staple data augmentation technique irrespective of the subject matter. As mentioned earlier, this is because rotation is an affine transformation and as such, rotating the image produces the same result as rotating the capturing apparatus would have. Per-
On the Importance of Capturing a Sufficient Diversity of Perspective . . .
217
Table 6 Results of experiments with no exclusions and using data augmentation. We conducted ten trials for each model. M2, using homogeneous vector capsules, is shown to be superior with a p-value of 5.52 × 10−5 . Model Mean (%) Max (%) S.D. M1 (Fully connected) M2 (Capsules)
94.52 99.06
96.90 100.00
0.02009 0.01862
spective warp of captured 3D subject matter is not affine. Moving the capturing apparatus to a different perspective relative to 3D subject matter will produce larger or smaller patches of components that extend into the third dimension. While our simulation of perspective differences did improve upon accuracy when not using such, our results show that, when possible, capturing a variety of perspectives during training is the best avenue for generating higher accuracy during subsequent evaluations that could include analogous perspective varieties.
4.5 Experiments Including All Rotations and Perspectives During Training and Using Data Augmentation Augmentation to Synthesize Variations of Those Rotations and Perspectives For our final set of experiments, we both trained with the full training set and applied rotational and perspective warp data augmentation throughout training. For each training sample, we rotated it by a random rotation drawn from a normal distribution derived from the training images for that rotational label. Perspective warp transformations were applied using the same procedure. As is shown in Table 6, model M1, using a fully connected layer, achieved a mean accuracy of 94.52%, surpassing 15 out of 16 of the experiments detailed in Table 4 (for model M1). Experiment A11 performed slightly better, but it should be noted that that experiment also used data augmentation for both rotation and perspective warp, with the difference being that experiment A11 excluded all of the rotated training samples, and the far perspectives. Model M2, using HVCs, achieved a mean accuracy of 99.06%, surpassing 14 out of 16 of the experiments detailed in Table 4 (for model M2). Experiment A1 and A3 performed slightly better, but once again, it should be noted that those experiments also used data augmentation that covered the rotations and perspectives of the training samples that were excluded. Model M2’s maximum accuracy was 100% surpassing the maximum accuracy of all experiments (for model M2) A1–A11.
218
A. Byerly et al.
5 Conclusion In this paper, we have presented a dataset consisting of high-resolution images of 13 micro-PCBs captured in various rotations and perspectives relative to the camera, with each sample labeled for PCB type, rotation category, and perspective category. We have shown, experimentally, that (1) classification of these micro-PCBs from novel rotations and perspectives is possible, but, in terms of perspectives, better accuracy is achieved when networks have been trained on representative examples of the perspectives that will be evaluated. (2) That, even though perspective warp is non-affine, using it as a data augmentation technique in the absence of training samples from actually different perspectives is still effective and improves accuracy. (3) And that using homogeneous vector capsules (HVCs) is superior to using fully connected layers in convolutional neural networks, especially when the subject matter has many sub-components that vary equivariantly (as is the case with micro-PCBs), and when using the full training dataset and applying rotational and perspective warp data augmentation the mean accuracy of the network using HVCs is 4.8% more accurate then when using a fully connected layer for classification. The dataset used for all experiments is publicly available on Kaggle at: https://www.kaggle.com/frettapper/micropcb-images The code used for all experiments is publicly available on GitHub at: https://github.com/AdamByerly/micro-pcb-analysis
References 1. Belongie, S., Malik, J., Puzicha, J.: Matching shapes. In: Proceedings of the Eighth IEEE International Conference on Computer Vision. ICCV 2001, Vancouver, BC, Canada, 2001, vol. 1 pp. 454–461 (2001). https://doi.org/10.1109/ICCV.2001.937552 2. Murase, H., Nayar, S.K.: Visual learning and recognition of 3-d objects from appearance. Int J Comput Vis 14, 5–24 (1995). https://doi.org/10.1007/BF01421486 3. LeCun, Y., Huang, F. J., Bottou, L.: Learning methods for generic object recognition with invariance to pose and lighting. In: Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2004. Washington, DC, USA, vol. 2, pp. 97–104 (2004). https://doi.org/10.1109/CVPR.2004.1315150 4. Hinton, G.E., Sabour, S., Frosst, N.: Matrix capsules with EM routing. In: The Sixth International Conference on Learning Representations. ICLR 2018 (2018) 5. Gross, R., Matthews, I., Cohn, J., Kanade, T., Baker, S.: Multi-PIE. In: 8th IEEE International Conference on Automatic Face & Gesture Recognition, Amsterdam, pp. 1–8 (2008). https:// doi.org/10.1109/AFGR.2008.4813399 6. Pramerdorfer, C., Kampel, M.: PCB recognition using local features for recycling purposes. In: Proceedings of the 10th International Conference on Computer Vision Theory and Applications. VISIGRAPP 2015, vol. 1, pp. 71–78 (2015). https://doi.org/10.5220/0005289200710078 7. Lu, H., Mehta, D., Paradis, O., Asadizanjani, N., Tehranipoor, M., Woodard, D. L.: FICSPCB: a multi-modal image dataset for automated printed circuit board visual inspection. In: Cryptology ePrintArchive, Report 2020/366 (2020). https://eprint.iacr.org/2020/366 8. Pramerdorfer, C., Kampel, M.: A dataset for computer-vision-based PCB analysis. In: Proceedings of the 14th IAPR International Conference on Machine Vision Applications. MVA 2015, pp. 378–381 (2015). https://doi.org/10.1109/MVA.2015.7153209
On the Importance of Capturing a Sufficient Diversity of Perspective . . .
219
9. Mahalingam, G., Gay, K.M., Ricanek, K.: PCB-METAL: a PCB image dataset for advanced computer vision machine learning component analysis. In: Proceedings of the 16th International Conference on Machine Vision Applications. MVA 2019 (2019). https://doi.org/10.23919/ MVA.2019.8757928 10. Byerly, A., Kalganova, T.: Homogeneous vector capsules enable adaptive gradient descent in convolutional neural networks. arXiv:1906.08676 [cs.CV] (2019)
Robust Visual Vocabulary Based On Grid Clustering Achref Ouni, Eric Royer, Marc Chevaldonné, and Michel Dhome
Abstract Content-based image retrieval (CBIR) is the task of finding the images in the dataset that are considered similar to an input query based on their visual content. Many methods based on visual description try to solve the CBIR problem. In particular, bag of visual words (BoVW) is one of the most algorithm used to image classification and recognition. But, even with the discriminative power of BoVW, this problem is still a challenge in computer vision. We propose in this paper an efficient CBIR approach based on bag of visual words model (BoVW). Our aim here is to improve the image representation by transforming the BoVW model to the bag of visual phrase (BoVP) based on grid clustering approach. We show experimentally that the proposed model leads to the increase of accuracy of CBIR results. We study the performance of the proposed approach on four different datasets (Corel 1 K, UKB, Holidays, MSRC v1) and two descriptors (SURF, KAZE). Keywords Image retrieval · BoVW · Descriptors · Classification
1 Introduction CBIR is an important task in computer vision and a key step for many applications such as pose estimation, virtual reality, medical diagnosis, remote sensing, crime detection, video analysis and military surveillance. CBIR is the task of retrieving the images similar to the input query from the dataset based from their contents. A CBIR system (see Fig. 1) is often based on three main steps: (1) features extraction, (2) signature construction and (3) retrieved images. The first step starts by detecting then extracting the descriptors from the interest points. This step can be applied using local descriptors like SIFT, SURF, etc. Then, encoding the primitive image such as color, texture and shape in a vector of numeric values denoted “image signature”. State of the art mentions two main contributions used for building the image signature: BoVW [1] and convolutional neural network descriptors (CNN) [2]. For retrieval, A. Ouni (B) · E. Royer · M. Chevaldonné · M. Dhome Université Clermont Auvergne, CNRS, SIGMA Clermont, Institut Pascal, 63000 Clermont-Ferrand, France e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 I. Czarnowski et al. (eds.), Intelligent Decision Technologies, Smart Innovation, Systems and Technologies 238, https://doi.org/10.1007/978-981-16-2765-1_18
221
222
A. Ouni et al.
Fig. 1 Flowchart of CBIR system
images must be represented as numeric values. Both contributions represent images as vector of valued features. Finally, computing the distance between the query and the dataset using a specific metric (Euclidean, Hamming, Jaccard, etc.). As result, we obtain for each couple of image a score that indicates if the query image and the candidate are similar or not. The couple of images with low score are considered as similar candidates. Many works have been proposed [3] [4] [5] inspired from BoVW with improvements that make the description of an image more relevant. We focus in this work to improve the bag of visual words model. Inspired by previous research, we propose in this paper a robust image signature that incorporates both bag of visual words model and grid clustering approach in an effective and efficient way. The output of this combination produces an improved version of bag of visual words model named “Bag of visual phrase based grid clustering”. We test this approach on three different datasets and two different visual descriptors (SURF, KAZE). We show experimentally that the proposed approach achieve a better results in terms of accuracy compared to the state-of-the-art methods. This article is structured as follows: We provide a brief overview of BoVW-related works in Sect. 2. We explain our proposals in Sect. 3. We present the experimental part on four different datasets and discuss the results of our work in Sect. 4. Section 5 conclusion.
2 Related Work We first discuss the state of the art in both of our contribution fields: BoVW model and grid clustering approach. Many CBIR systems have been proposed during the last years. In the following sections, we will explain the main contributions for retrieving the images by similarity.
2.1 Bag of Visual Words Model BoVW model proposed by [1] is one of the most models used to classify the images by content. This approach is composed of three main steps (see Fig. 2): (i) detection and feature extraction, (ii) codebook construction and (iii) vector quantization. Detection
Robust Visual Vocabulary Based On Grid Clustering
223
Fig. 2 Bag of visual words model [1]
and extraction features in an image can be performed using extractor algorithms. Many descriptors have been proposed to encode the images into a vector. Scaleinvariant feature transform (SIFT) [6] and speed-up robust feature (SURF) [7] are the most used descriptors in CBIR. Interesting work from Arandjelovic and Zisserman [8] introduced an improvement by upgrading SIFT to RootSift. In other side, binary descriptors have proven useful [9] and ORB (oriented FAST and rotated BRIEF) is proposed to speed up the search. An other work [10] combined two aspects: precision and speed thanks to binary robust invariant scalable keypoints (BRISK) descriptor. Iakovidou et al [11] presented a discriminative descriptor for image similarity based on combining contour and color information. Due the limit of BoVW model, many improvements have been proposed for more precision. BoVP [5] is a high-level description using a more than word for representing an image. It formed phrases using a sequence of n-consecutive words regrouped by L2 metric. Ren et al [12] build an initial graph and then split the graph into a xed number of sub-graphs using the NCut algorithm. Each histogram of visual words in a sub-graph forms a visual phrases. Chen et al [13] grouped the visual words in pairs using the neighborhood of each point of interest. The pairs words are chosen as visual phrases. In [14], the authors presented a bag of visual phrase model based on CAH algorithm. In [15], the authors present descriptors both color and edge information. Jubouri et al [16] presented two signatures to integrate both color and texture visual information to represent images. Perronnin and Dance [3] applies Fisher kernels to visual words represented by means of a Gaussian mixture model (GMM). Similar approach introduced a simplification for Fisher kernel. Similar to BoVW, vector of locally aggregated descriptors (VLAD) [4] assigned to each feature or keypoint its nearest visual word and accumulate this difference for each visual word. Mehmood et al [17] proposed a framework between local and global histograms of visual words.
2.2 Grids Clustering Grid clustering process is by cutting the data representation space into a set of cells (e.g., hyper-cubes, hyper-rectangles). As a result, these methods focus on the processing of spatial data. The result of such a method is a partition of the data via a partition of the data representation space. The clusters formed correspond to a set of
224
A. Ouni et al.
Fig. 3 2D hierarchical structure generated by STING [18]
dense and connected cells. The main difficulty of these methods concerns the search for an appropriate size for the cells built (granularity problem). Too small cells would lead to a “on-partitioning” constraint of too large cells which lead to result in a “under partitioning”. A hierarchical grid structure allowing fast analysis of the data was proposed in [18]. Each node of the hierarchy, other than the root or a leaf, has a xed number of child nodes for which each corresponding cell is a quadrant of the parent cell. In Fig. 3, we present a hierarchical structure for STING clustering, in the twodimensional case. Sheikholeslami et al [19] used a transformation method for signal processing on synthesized data by splitting the grid attribute space. Hopkins et al [20] merged dense and connected cells. An inconvenience common with the previous three methods is their low performances in the case of data described in a space with big dimension. Indeed, the quantity of cells increases exponentially with the dimension of the space of the data.
3 Bag-of-Visual-Phrase-Based Grid Clustering: GBoVP Inspired from BoVW model, we propose a BoVP based on grid clustering approach. In the literature, a visual phrase is a set of visual words linked together by specific criteria. This linkage aims to improve the image representation. So, our key idea is to present an image as a set of clusters which each cluster contains a set of visual words. In this case, each cluster present a set of visual phrases if the size >2. Next, from the obtained clusters, we combined the visual words inside each cluster to build a discriminative signature denoted “bag of visual phrase”. In Fig. 4, we present our global framework and the their different steps. In the first part, we nd the detection and extraction for an input query. After, using the visual words or vocabulary constructed in o ine, we assigned for each keypoint detected
Fig. 4 Global framework
in the image the corresponding visual word using the L2 metric. Here, we apply the discriminative power of the grid clustering approach to link together the visual words in the image description. The grid-based technique is fast and has low computational complexity, which speeds up the construction of the image signature. So, based on the general concept of grid clustering, we cluster the set of keypoints detected in the image in order to obtain a set of clusters, each containing a set of keypoints. As shown in Fig. 5, after obtaining all constructed clusters, we assign a visual word to each keypoint inside the clusters. Then, we link the visual words
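The assignment of detected keypoints to their nearest visual words with the L2 metric, followed by grouping them by cluster, can be sketched as follows; the vocabulary, descriptors and cluster labels are placeholder arrays, not the authors' actual data.

```python
import numpy as np

def assign_visual_words(descriptors, vocabulary):
    """Assign each local descriptor to the index of its nearest visual word (L2)."""
    # (n, 1, d) - (1, k, d) -> pairwise squared L2 distances of shape (n, k)
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def group_words_by_cluster(word_ids, cluster_labels):
    """Collect the visual words falling inside each spatial cluster of keypoints."""
    groups = {}
    for w, c in zip(word_ids, cluster_labels):
        groups.setdefault(int(c), []).append(int(w))
    return groups

# Toy usage: 300 SURF-like descriptors, a vocabulary of 100 words, 5 spatial clusters
rng = np.random.default_rng(1)
desc = rng.normal(size=(300, 64))
vocab = rng.normal(size=(100, 64))
labels = rng.integers(0, 5, size=300)      # labels coming from the grid clustering step
words = assign_visual_words(desc, vocab)
print({c: len(ws) for c, ws in group_words_by_cluster(words, labels).items()})
```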
Fig. 5 Clustering of features (each color represents a cluster) with the aim of creating visual phrases. Each cluster yields one or more visual phrases
Fig. 6 Ten categories of Corel-1 K dataset
together to create the visual phrases. The image signature is a matrix of size M × M, where M is the number of visual words. We initialize the matrix to zero and then fill it with the indices of the visual phrases. For example, if a pair of visual words is VW5 and VW30, then we increment the element at index (5, 30). Finally, the similarity is computed as the Euclidean distance between the query matrix and the matrix of a dataset image.
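A minimal sketch of the M × M phrase signature described above: pairs of visual words that fall in the same cluster increment the corresponding matrix cell, and two signatures are compared with the Euclidean distance. Function and variable names are illustrative, not taken from the paper's implementation.

```python
import numpy as np
from itertools import combinations

def phrase_signature(clusters, num_words):
    """Build the M x M bag-of-visual-phrases matrix from per-cluster word lists."""
    sig = np.zeros((num_words, num_words), dtype=np.float32)
    for words in clusters.values():
        for w1, w2 in combinations(words, 2):   # every word pair inside a cluster
            sig[w1, w2] += 1.0
    return sig

def signature_distance(sig_a, sig_b):
    """Euclidean distance between two flattened phrase signatures."""
    return float(np.linalg.norm(sig_a - sig_b))

# Toy example with a 100-word vocabulary
clusters_query = {0: [5, 30, 12], 1: [7, 7, 44]}
clusters_db = {0: [5, 30], 1: [44, 60, 61]}
sq = phrase_signature(clusters_query, 100)
sd = phrase_signature(clusters_db, 100)
print(signature_distance(sq, sd))
```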
4 Results In this section, we present the potential of our approach on large datasets. Our goal is to increase the CBIR accuracy and reduce the execution time. To evaluate our proposition, we test on the following datasets. Corel 1K, or Wang [21], is a dataset of 1000 images (Fig. 6) divided into 10 categories, each containing 100 images; the evaluation is the average precision of the first 100 nearest neighbors among 1000. The University of Kentucky Benchmark, proposed by Nister et al., is referred to as UKB to simplify the reading. UKB contains 10200 images (Fig. 7) divided into 2550 groups; each group consists of four images of the same object under different conditions (rotated, blurred, etc.); the score is the mean precision over all images for the four nearest neighbors. INRIA Holidays, referred to as Holidays, is a collection of 1491 images (Fig. 8); 500 of them are query images, and the remaining 991 images are the corresponding relevant images; the evaluation on Holidays is based on the mean average precision (mAP) score. MSRC v1 (Microsoft Research in Cambridge, https://pgram.com/dataset/msrc-v1/) has been proposed by the Microsoft Research team; it contains 241 images divided into nine categories, and the evaluation on MSRC v1 is based on the mean average precision (MAP) score. Some images from this dataset are shown in Fig. 9.
Fig. 7 Example of images from UKB dataset
In this section, we present the experiments with our grid-clustering-based approach. In order to test the efficiency of the proposed methods, we conducted the experimentation on four retrieval datasets (MSRC v1, UKB, Holidays, Wang) and two different descriptors (SURF, KAZE). In Tables 1 and 2, we present the MAP scores of our global framework. Two distinct grid clustering algorithms have been applied to the BoVW model to obtain the improved version: the STING algorithm [18] and the wavelet algorithm [19]. The results obtained using the STING algorithm are better than those obtained using wavelets. We further improve the results by concatenating GBoVPsurf and GBoVPkaze. Finally, we compare our approach against several state-of-the-art methods in Table 3. As indicated, our proposal shows good performance on all datasets.
5 Conclusion In this paper, we present an efficient bag of visual phrase model based on grid clustering approach. We show that the use of the grid clustering approach combined with BoVW model increases the CBIR accuracy. Using two different descriptors (KAZE, SURF), our approach achieves a better result in terms of accuracy compared to the state-of-the-art methods.
Fig. 8 Example of images from holidays dataset
Fig. 9 Example of images from MSRC v1 dataset

Table 1 MAP evaluation scores for bag-of-visual-phrase-based STING [18] algorithm
Datasets / Descriptors | GBoVPsurf | GBoVPkaze | GBoVPsurf · GBoVPkaze
MSRC v1 | 0.56 | 0.56 | 0.67
UKB | 3.12 | 3.08 | 3.57
Holidays | 0.58 | 0.55 | 0.69
Wang | 0.51 | 0.53 | 0.62
Table 2 MAP evaluation scores for bag-of-visual-phrase-based wavelet [19] algorithm
Datasets / Descriptors | GBoVPsurf | GBoVPkaze | GBoVPsurf · GBoVPkaze
MSRC v1 | 0.54 | 0.53 | 0.62
UKB | 3.09 | 3.05 | 3.51
Holidays | 0.53 | 0.52 | 0.64
Wang | 0.45 | 0.50 | 0.55
Table 3 Comparison of the accuracy of our approach with methods from the state of the art
Methods / Datasets | MSRC v1 | Wang | Holidays | UKB
BoVW [1] | 0.48 | 0.41 | 0.50 | 2.95
n-BoVW [14] | 0.58 | 0.57 | 0.57 | 3.50
VLAD [4] | – | – | 0.53 | 3.17
N-Gram [5] | – | 0.34 | – | –
Fisher [3] | – | – | 0.69 | 3.07
SaCoCo [11] | – | 0.51 | 0.76 | 3.33
CEDD [15] | – | 0.54 | 0.72 | 3.24
DGLCM [16] | – | 0.64 | – | –
Gray GLCM [16] | – | 0.48 | – | –
Ours (best) | 0.67 | 0.62 | 0.64 | 3.57
References
1. Csurka, G., Dance, C., Fan, L., Willamowski, J., Bray, C.: Visual categorization with bags of keypoints. In: Workshop on Statistical Learning in Computer Vision, ECCV, vol. 1(1–22), pp. 1–2 (May 2004)
2. Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 25, 1097–1105 (2012)
3. Perronnin, F., Dance, C.: Fisher kernels on visual vocabularies for image categorization. In: 2007 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8. IEEE (June 2007)
4. Jegou, H., Douze, M., Schmid, C., Perez, P.: Aggregating local descriptors into a compact image representation. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 3304–3311. IEEE (June 2010)
5. Pedrosa, G.V., Traina, A.J.: From bag-of-visual-words to bag-of-visual-phrases using n-grams. In: 2013 XXVI Conference on Graphics, Patterns and Images, pp. 304–311. IEEE (Aug 2013)
6. Lindeberg, T.: Scale invariant feature transform (2012)
7. Bay, H., Tuytelaars, T., Van Gool, L.: SURF: Speeded up robust features. In: European Conference on Computer Vision, pp. 404–417. Springer, Berlin, Heidelberg (May 2006)
8. Arandjelovic, R., Zisserman, A.: All about VLAD. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1578–1585 (2013)
9. Rublee, E., Rabaud, V., Konolige, K., Bradski, G.: ORB: An efficient alternative to SIFT or SURF. In: 2011 International Conference on Computer Vision, pp. 2564–2571. IEEE (Nov 2011)
10. Leutenegger, S., Chli, M., Siegwart, R.Y.: BRISK: Binary robust invariant scalable keypoints. In: 2011 International Conference on Computer Vision, pp. 2548–2555. IEEE (Nov 2011)
11. Iakovidou, C., Anagnostopoulos, N., Lux, M., Christodoulou, K., Boutalis, Y., Chatzichristofis, S.A.: Composite description based on salient contours and color information for CBIR tasks. IEEE Trans. Image Process. 28(6), 3115–3129 (2019)
12. Ren, Y., Bugeau, A., Benois-Pineau, J.: Visual object retrieval by graph features (2013)
13. Chen, T., Yap, K.H., Zhang, D.: Discriminative soft bag-of-visual phrase for mobile landmark recognition. IEEE Trans. Multimedia 16(3), 612–622 (2014)
14. Ouni, A., Urruty, T., Visani, M.: A robust CBIR framework in between bags of visual words and phrases models for specific image datasets. Multimedia Tools Appl. 77(20), 26173–26189 (2018)
15. Chatzichristofis, S.A., Boutalis, Y.S.: CEDD: Color and edge directivity descriptor: a compact descriptor for image indexing and retrieval. In: International Conference on Computer Vision Systems, pp. 312–322. Springer, Berlin, Heidelberg (May 2008)
16. Al-Jubouri, H.A.: Integration colour and texture features for content-based image retrieval. Int. J. Mod. Educ. Comput. Sci. 12(2) (2020)
17. Mehmood, Z., Anwar, S.M., Ali, N., Habib, H.A., Rashid, M.: A novel image retrieval based on a combination of local and global histograms of visual words. Math. Prob. Eng. (2016)
18. Wang, W., Yang, J., Muntz, R.: STING: a statistical information grid approach to spatial data mining. In: VLDB, vol. 97, pp. 186–195 (Aug 1997)
19. Sheikholeslami, S., Chatterjee, S., Zhang, A.: A multi-resolution clustering approach for very large spatial databases. In: Proceedings of the 24th International Conference on Very Large Data Bases (VLDB) (1998)
20. Hopkins, P.F.: GIZMO: multi-method magneto-hydrodynamics + gravity code. Astrophysics Source Code Library, ascl-1410 (2014)
21. Wang, J.Z., Li, J., Wiederhold, G.: SIMPLIcity: semantics-sensitive integrated matching for picture libraries. IEEE Trans. Pattern Anal. Mach. Intell. 23(9), 947–963 (2001)
Study of Anisotropy of Seismic Response from Fractured Media Alena Favorskaya
and Vasily Golubev
Abstract Investigating the anisotropy of response from fractured media through computational experiments plays an important role in seismic exploration for oil and gas. This paper shows the possibility of studying the anisotropy of the seismic response from a fractured zone using a continuum model with slippage and delamination. We varied four parameters of this model and obtained conclusions about the anisotropy of the seismic response depending on the direction of the fractures in the fractured zone. For the calculation, we used the grid-characteristic method on structured regular computational grids. To assess the anisotropy of the seismic response, we have introduced and used 12 anisotropy parameters based on the norms L1 and L∞ . Keywords Fractured zones · Anisotropy · Elastic wave phenomena · Wave investigation · Numerical modeling · Computer simulation · Grid-characteristic method · Continuum model with slippage and delamination
1 Introduction Many scientists around the world are engaged in numerical modeling of the seismic response from fractured media. Various averaged models [1, 2], explicit approaches to fracture description [3, 4], and infinitely thin fracture models [5, 6] are used. Various types of computational grids are used: regular [7], triangular [8], and tetrahedral [9]. Finite difference methods [10, 11], including the grid-characteristic method [3, 5, 9], the discontinuous Galerkin method [12], and the method of spectral elements [13], are used to simulate the propagation of seismic waves in geological environments containing fractured zones. In this work, we use the grid-characteristic method [3] and the averaged continuum model of slip and delamination [14–17]. This paper is organized as follows. Section 2 presents the mathematical model used. The formulae to calculate anisotropy parameters are presented in Sect. 3.
There are results of computational experiments, i.e., wave patterns and graphs of anisotropy parameters in Sect. 4. Section 5 concludes the paper.
2 Mathematical Model In order to take seismic responses from fractured zones into account, we solve the boundary value problem of the elastic wave equation using the grid-characteristic method [3]. In recent years, this method has been successfully used to solve problems of seismic exploration [3, 5, 9], to simulate elastic waves in composite materials [18] and porous materials [19, 20], and to investigate exact solutions of the boundary value problem of the elastic wave equation [21]. The considered geological model is presented in Fig. 1. The dashed line marks the position of geological fractures that are inclined relative to the vertical by the angle γ. We consider a center-symmetrical sensor system to compare the anisotropy of the blue and red seismic responses. In order to take the fractured zone into account, we use the continuum slip and delamination model [14–17]. Let us consider the case of the normal n along OY. If $\sigma_{YY}^{n+1,E} \le 0$ (compressed layers), then the following equations hold:

$$\sigma_{XX}^{n+1} = \sigma_{XX}^{n+1,E}, \qquad \sigma_{YY}^{n+1} = \sigma_{YY}^{n+1,E} \qquad (1)$$

$$\sigma_{XY}^{n+1} = \begin{cases} q\left|\sigma_{YY}^{n+1,E}\right|\dfrac{1+\delta\left|\sigma_{XY}^{n+1,E}\right|}{1+\delta q\left|\sigma_{YY}^{n+1,E}\right|}\operatorname{sign}\left(\sigma_{XY}^{n+1,E}\right), & \text{if } \left|\sigma_{XY}^{n+1,E}\right| \ge q\left|\sigma_{YY}^{n+1,E}\right| \\ \sigma_{XY}^{n+1,E}, & \text{if } \left|\sigma_{XY}^{n+1,E}\right| < q\left|\sigma_{YY}^{n+1,E}\right| \end{cases} \qquad (2)$$

Fig. 1 Scheme of geological model

If $\sigma_{YY}^{n+1,E} > 0$ (stretched layers), then the following equations hold:
$$\sigma_{YY}^{n+1} = \frac{\beta\,\sigma_{YY}^{n} + \sigma_{YY}^{n+1,E}}{1+\beta}, \qquad \sigma_{XX}^{n+1} = \sigma_{XX}^{n+1,E} - \frac{1}{1+\beta}\,\frac{\lambda}{\lambda+2\mu}\left(\sigma_{YY}^{n+1,E} - \sigma_{YY}^{n}\right) \qquad (3)$$

$$\sigma_{XY}^{n+1} = \frac{\alpha\,\sigma_{XY}^{n} + \sigma_{XY}^{n+1,E}}{1+\alpha} \qquad (4)$$
Here, $\sigma_{XX}^{n+1}$, $\sigma_{XY}^{n+1}$, and $\sigma_{YY}^{n+1}$ are the components of the final symmetric Cauchy stress tensor; $\sigma_{XX}^{n+1,E}$, $\sigma_{XY}^{n+1,E}$, and $\sigma_{YY}^{n+1,E}$ are the components of the Cauchy stress tensor calculated using the grid-characteristic method; $\sigma_{XX}^{n}$, $\sigma_{XY}^{n}$, and $\sigma_{YY}^{n}$ are the components of the Cauchy stress tensor at the previous time layer; δ is the viscous regularizer, q is the dry friction coefficient, α and β are the shear and weak stretch coefficients, and λ and μ are the Lamé parameters. In the calculations we used a time step of 0.4 ms, a coordinate step of 2 m, a P-wave speed of 2000 m/s, an S-wave speed of 1000 m/s, and a density of 2000 kg/m³.
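To make the update rule in Eqs. 1–4 concrete, the following sketch applies the compressed-layer and stretched-layer corrections to elastically predicted stresses at a single grid node. It is a minimal illustration under the stated assumptions (scalar inputs, the parameter names δ, q, α, β, λ, μ as defined above); it is not the authors' solver.

```python
import numpy as np

def correct_stresses(s_xx_e, s_xy_e, s_yy_e, s_xx_p, s_xy_p, s_yy_p,
                     q=0.1, delta=0.01, alpha=0.01, beta=0.01,
                     lam=2.0e9, mu=1.0e9):
    """Apply the slip/delamination correction (Eqs. 1-4) at one node.

    *_e : elastic prediction at time layer n+1 (grid-characteristic step)
    *_p : corrected stresses from the previous time layer n
    """
    if s_yy_e <= 0.0:                          # compressed layers, Eqs. 1-2
        s_xx, s_yy = s_xx_e, s_yy_e
        bound = q * abs(s_yy_e)                # dry-friction bound on shear
        if abs(s_xy_e) >= bound:
            s_xy = bound * (1.0 + delta * abs(s_xy_e)) \
                   / (1.0 + delta * bound) * np.sign(s_xy_e)
        else:
            s_xy = s_xy_e
    else:                                      # stretched layers, Eqs. 3-4
        s_yy = (beta * s_yy_p + s_yy_e) / (1.0 + beta)
        s_xx = s_xx_e - lam / (lam + 2.0 * mu) * (s_yy_e - s_yy_p) / (1.0 + beta)
        s_xy = (alpha * s_xy_p + s_xy_e) / (1.0 + alpha)
    return s_xx, s_xy, s_yy

print(correct_stresses(1.0e5, 4.0e4, -2.0e5, 0.9e5, 3.5e4, -1.8e5))
```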
3 Anisotropy Parameters We have introduced the following 12 values in order to numerically quantify the anisotropy of the seismic responses. They are based on the norms L1 and L∞ and are given by Eqs. 5–16; all maxima and sums are taken over $i \in [1, N_T]$, $j \in [1, N_R]$.

$$L_\infty\{v_X\} = 2\cdot\frac{\max_{i,j}\left|v_X^{R,i,j}-V_X^{R,i,j}\right| - \max_{i,j}\left|v_X^{L,i,j}-V_X^{L,i,j}\right|}{\max_{i,j}\left|v_X^{R,i,j}-V_X^{R,i,j}\right| + \max_{i,j}\left|v_X^{L,i,j}-V_X^{L,i,j}\right|} \qquad (5)$$

$$L_\infty\{v_Y\} = 2\cdot\frac{\max_{i,j}\left|v_Y^{R,i,j}-V_Y^{R,i,j}\right| - \max_{i,j}\left|v_Y^{L,i,j}-V_Y^{L,i,j}\right|}{\max_{i,j}\left|v_Y^{R,i,j}-V_Y^{R,i,j}\right| + \max_{i,j}\left|v_Y^{L,i,j}-V_Y^{L,i,j}\right|} \qquad (6)$$

$$L_\infty\{|v|\} = 2\cdot\frac{M_R\{|v|\} - M_L\{|v|\}}{M_R\{|v|\} + M_L\{|v|\}} \qquad (7)$$

$$L_1\{v_X\} = 2\cdot\frac{\sum_{i,j}\left|v_X^{R,i,j}-V_X^{R,i,j}\right| - \sum_{i,j}\left|v_X^{L,i,j}-V_X^{L,i,j}\right|}{\sum_{i,j}\left|v_X^{R,i,j}-V_X^{R,i,j}\right| + \sum_{i,j}\left|v_X^{L,i,j}-V_X^{L,i,j}\right|} \qquad (8)$$

$$L_1\{v_Y\} = 2\cdot\frac{\sum_{i,j}\left|v_Y^{R,i,j}-V_Y^{R,i,j}\right| - \sum_{i,j}\left|v_Y^{L,i,j}-V_Y^{L,i,j}\right|}{\sum_{i,j}\left|v_Y^{R,i,j}-V_Y^{R,i,j}\right| + \sum_{i,j}\left|v_Y^{L,i,j}-V_Y^{L,i,j}\right|} \qquad (9)$$

$$L_1\{|v|\} = 2\cdot\frac{S_R\{|v|\} - S_L\{|v|\}}{S_R\{|v|\} + S_L\{|v|\}} \qquad (10)$$

$$L_\infty\{d_X\} = 2\cdot\frac{\max_{i,j}\left|\left(v_X^{R,i,j}-V_X^{R,i,j}\right) + \left(v_X^{L,i,j}-V_X^{L,i,j}\right)\right|}{\max_{i,j}\left|v_X^{R,i,j}-V_X^{R,i,j}\right| + \max_{i,j}\left|v_X^{L,i,j}-V_X^{L,i,j}\right|} \qquad (11)$$

$$L_\infty\{d_Y\} = 2\cdot\frac{\max_{i,j}\left|\left(v_Y^{R,i,j}-V_Y^{R,i,j}\right) - \left(v_Y^{L,i,j}-V_Y^{L,i,j}\right)\right|}{\max_{i,j}\left|v_Y^{R,i,j}-V_Y^{R,i,j}\right| + \max_{i,j}\left|v_Y^{L,i,j}-V_Y^{L,i,j}\right|} \qquad (12)$$

$$L_\infty\{|d|\} = 2\cdot\frac{M\{|d|\}}{M_R\{|v|\} + M_L\{|v|\}} \qquad (13)$$

$$L_1\{d_X\} = 2\cdot\frac{\sum_{i,j}\left|\left(v_X^{R,i,j}-V_X^{R,i,j}\right) + \left(v_X^{L,i,j}-V_X^{L,i,j}\right)\right|}{\sum_{i,j}\left|v_X^{R,i,j}-V_X^{R,i,j}\right| + \sum_{i,j}\left|v_X^{L,i,j}-V_X^{L,i,j}\right|} \qquad (14)$$

$$L_1\{d_Y\} = 2\cdot\frac{\sum_{i,j}\left|\left(v_Y^{R,i,j}-V_Y^{R,i,j}\right) - \left(v_Y^{L,i,j}-V_Y^{L,i,j}\right)\right|}{\sum_{i,j}\left|v_Y^{R,i,j}-V_Y^{R,i,j}\right| + \sum_{i,j}\left|v_Y^{L,i,j}-V_Y^{L,i,j}\right|} \qquad (15)$$

$$L_1\{|d|\} = 2\cdot\frac{S\{|d|\}}{S_R\{|v|\} + S_L\{|v|\}} \qquad (16)$$

In Eqs. 7 and 13, $M_R\{|v|\}$ and $M_L\{|v|\}$ are given by the following equations:

$$M_R\{|v|\} = \max_{i,j}\sqrt{\left(v_X^{R,i,j}-V_X^{R,i,j}\right)^2 + \left(v_Y^{R,i,j}-V_Y^{R,i,j}\right)^2}, \qquad (17)$$

$$M_L\{|v|\} = \max_{i,j}\sqrt{\left(v_X^{L,i,j}-V_X^{L,i,j}\right)^2 + \left(v_Y^{L,i,j}-V_Y^{L,i,j}\right)^2}. \qquad (18)$$

In Eqs. 10 and 16, $S_R\{|v|\}$ and $S_L\{|v|\}$ are given by the following equations:

$$S_R\{|v|\} = \sum_{i=1}^{N_T}\sum_{j=1}^{N_R}\sqrt{\left(v_X^{R,i,j}-V_X^{R,i,j}\right)^2 + \left(v_Y^{R,i,j}-V_Y^{R,i,j}\right)^2}, \qquad (19)$$

$$S_L\{|v|\} = \sum_{i=1}^{N_T}\sum_{j=1}^{N_R}\sqrt{\left(v_X^{L,i,j}-V_X^{L,i,j}\right)^2 + \left(v_Y^{L,i,j}-V_Y^{L,i,j}\right)^2}. \qquad (20)$$

In Eq. 13, $M\{|d|\}$ is given by the following equation:

$$M\{|d|\} = \max_{i,j}\left[\left(\left(v_X^{R,i,j}-V_X^{R,i,j}\right)+\left(v_X^{L,i,j}-V_X^{L,i,j}\right)\right)^2 + \left(\left(v_Y^{R,i,j}-V_Y^{R,i,j}\right)-\left(v_Y^{L,i,j}-V_Y^{L,i,j}\right)\right)^2\right]^{1/2}. \qquad (21)$$

In Eq. 16, $S\{|d|\}$ is given by the following equation:

$$S\{|d|\} = \sum_{i=1}^{N_T}\sum_{j=1}^{N_R}\left[\left(\left(v_X^{R,i,j}-V_X^{R,i,j}\right)+\left(v_X^{L,i,j}-V_X^{L,i,j}\right)\right)^2 + \left(\left(v_Y^{R,i,j}-V_Y^{R,i,j}\right)-\left(v_Y^{L,i,j}-V_Y^{L,i,j}\right)\right)^2\right]^{1/2}. \qquad (22)$$

In Eqs. 5–22, $v_X^{R,i,j}$ and $v_Y^{R,i,j}$ are the velocity components at the right j-th receiver at the time moment $t_i$, $V_X^{R,i,j}$ and $V_Y^{R,i,j}$ are the background (without the fractured inclusion) velocity components at the right j-th receiver at the time moment $t_i$, $v_X^{L,i,j}$ and $v_Y^{L,i,j}$ are the velocity components at the left j-th receiver at the time moment $t_i$, $V_X^{L,i,j}$ and $V_Y^{L,i,j}$ are the background (without the fractured inclusion) velocity components at the left j-th receiver at the time moment $t_i$, $N_T$ is the number of time steps, equal to 2501, and $N_R$ is the number of receivers in each of the left and right receiver systems, equal to 26.
4 Results of Computational Experiments Some wave patterns of the velocity modulus for different parameters are pictured in Fig. 2. Graphs of the 12 anisotropy parameters as functions of the angle γ for all considered cases with different model characteristics (shear and weak stretch coefficients α and β, viscous regularizer δ, and dry friction coefficient q) are presented in Fig. 3. The angles of maximum anisotropy for the different cases are summarized in Tables 1 and 2.
Fig. 2 Wave patterns at time moment 0.3 s, distance and depth in km: a α = 0.01, β = 0.01, δ = 0.01, q = 0.1, γ = 15º; b α = 0.01, β = 0.01, δ = 0.01, q = 0.4, γ = 20º; c α = 0.01, β = 10, δ = 10, q = 0.1, γ = 25º; d α = 10, β = 0.01, δ = 0.01, q = 0.1, γ = 15º; e α = 10, β = 0.01, δ = 0.01, q = 0.4, γ = 50º; f α = 10, β = 10, δ = 0.01, q = 0.1, γ = 70º
5 Conclusions The following conclusions can be drawn:
• There are three types of dependence of the anisotropy on the normal's rotation angle γ, i.e., a symmetrical pattern, one peak shifted toward small rotation angles of the normal, and two peaks.
• At low α and high β, the normal's rotation angle γ corresponding to the maximum anisotropy does not change.
• The normal's rotation angle γ corresponding to the maximum anisotropy is mainly affected by the change in q.
In general, it can be concluded that the continuum model of slip and delamination allows one to study the anisotropy of the seismic response from fractured zones.
Fig. 3 Seismic response anisotropy for different parameters in dependence on the normal’s rotation angle γ: a α = 0.01, β = 0.01, δ = 0.01, q = 0.1; b α = 0.01, β = 0.01, δ = 0.01, q = 0.4; c α = 0.01, β = 0.01, δ = 10, q = 0.1; d α = 0.01, β = 0.01, δ = 10, q = 0.4; e α = 0.01, β = 10, δ = 0.01, q = 0.1; f α = 0.01, β = 10, δ = 0.01, q = 0.4; g α = 0.01, β = 10, δ = 10, q = 0.1; h α = 0.01, β = 10, δ = 10, q = 0.4; i α = 10, β = 0.01, δ = 0.01, q = 0.1; j α = 10, β = 0.01, δ = 0.01, q = 0.4; k α = 10, β = 0.01, δ = 10, q = 0.1; l α = 10, β = 0.01, δ = 10, q = 0.4; m α = 10, β = 10, δ = 0.01, q = 0.1; n α = 10, β = 10, δ = 0.01, q = 0.4; o α = 10, β = 10, δ = 10, q = 0.1; p α = 10, β = 10, δ = 10, q = 0.4
Table 1 Angles with maximum anisotropy for different parameters of fractured media
γ | 15º | 20º | 15º | 20º | 25º | 25º | 25º | 25º
α | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01
β | 0.01 | 0.01 | 0.01 | 0.01 | 10 | 10 | 10 | 10
δ | 0.01 | 0.01 | 10 | 10 | 0.01 | 0.01 | 10 | 10
q | 0.1 | 0.4 | 0.1 | 0.4 | 0.1 | 0.4 | 0.1 | 0.4
Table 2 Angles with maximum anisotropy for different parameters of fractured media
γ | 15º | 50º | 15º | 45º | 70º | 20º | 65º | 25º
α | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10
β | 0.01 | 0.01 | 0.01 | 0.01 | 10 | 10 | 10 | 10
δ | 0.01 | 0.01 | 10 | 10 | 0.01 | 0.01 | 10 | 10
q | 0.1 | 0.4 | 0.1 | 0.4 | 0.1 | 0.4 | 0.1 | 0.4
Acknowledgements The reported study was funded by RFBR, project number 20-01-00261. This work has been carried out using computing resources of the federal collective usage center Complex for Simulation and Data Processing for Mega-science Facilities at NRC “Kurchatov Institute”, http:// ckp.nrcki.ru/.
References 1. Schoenberg, M., Douma, J.: Elastic wave propagation in media with parallel fractures and aligned cracks. Geophys. Prospect. 36(6), 571–590 (1988) 2. Shearer, P.M.: Introduction to seismology. Cambridge University Press (2019) 3. Favorskaya, A.V., Zhdanov, M.S., Khokhlov, N.I., Petrov, I.B.: Modelling the wave phenomena in acoustic and elastic media with sharp variations of physical properties using the gridcharacteristic method. Geophys. Prospect. 66(8), 1485–1502 (2018) 4. Lan, H.Q., Zhang, Z.J.: Seismic wavefield modeling in media with fluid-filled fractures and surface topography. Appl. Geophys. 9(3), 301–312 (2012) 5. Favorskaya, A., Petrov, I., Grinevskiy, A.: Numerical simulation of fracturing in geological medium. Proc. Comput. Sci. 112, 1216–1224 (2017) 6. Zhang, J.: Elastic wave modeling in fractured media with an explicit approach. Geophysics 70(5), T75–T85 (2005) 7. Guo, J., Shuai, D., Wei, J., Ding, P., Gurevich, B.: P-wave dispersion and attenuation due to scattering by aligned fluid saturated fractures with finite thickness: theory and experiment. Geophys. J. Int. 215(3), 2114–2133 (2018) 8. Cho, Y., Gibson, R.L., Vasilyeva, M., Efendiev, Y.: Generalized multiscale finite elements for simulation of elastic-wave propagation in fractured media. Geophysics 83(1), WA9–WA20 (2018) 9. Petrov, I.B., Favorskaya, A.V., Muratov, M.V., Biryukov, V.A., Sannikov, A.V.: Gridcharacteristic method on unstructured tetrahedral grids. Dokl. Math. 90(3), 781–783 (2014) 10. Novikov, M.A., Lisitsa, V.V., Kozyaev, A.A.: Numerical modeling of wave processes in fractured porous fluid-saturated media. Numer. Methods Program. 19, 130–149 (2018)
11. Wang, K., Peng, S., Lu, Y., Cui, X.: The velocity-stress finite-difference method with a rotated staggered grid applied to seismic wave propagation in a fractured medium. Geophysics 85(2), T89–T100 (2020) 12. Vamaraju, J., Sen, M.K., De. Basabe, J., Wheeler, M.: Enriched Galerkin finite element approximation for elastic wave propagation in fractured media. J. Comput. Phys. 372, 726–747 (2018) 13. Hou, X., Liu, N., Chen, K., Zhuang, M., Liu, Q.H.: The efficient hybrid mixed spectral element method with surface current boundary condition for modeling 2.5-D fractures and faults. IEEE Access 8, 135339–135346 (2020) 14. Burago, N.G., Zhuravlev, A.B., Nikitin, I.S.: Continuum model and method of calculating for dynamics of inelastic layered medium. Math. Models Comput. Simul. 11(3), 488–498 (2019) 15. Burago, N.G., Nikitin, I.S.: Improved model of a layered medium with slip on the contact boundaries. J. Appl. Math. Mech. 80(2), 164–172 (2016) 16. Nikitin, I.S., Burago, N.G., Golubev, V.I., Nikitin, A.D.: Methods for calculating the dynamics of layered and block media with nonlinear contact conditions. Smart Innovation Syst. Technol. 173, 171–183 (2020) 17. Nikitin, I.S., Burago, N.G., Golubev, V.I., Nikitin, A.D.: Continual models of layered and block media with slippage and delamination. Procedia Struct. Integrity 23, 125–130 (2019) 18. Petrov, I., Vasyukov, A., Beklemysheva, K., Ermakov, A., Favorskaya, A.: Numerical modeling of non-destructive testing of composites. Procedia Comput. Sci. 96, 930–938 (2016) 19. Golubev, V., Shevchenko, A., Petrov, I.: Simulation of seismic wave propagation in a multicomponent oil deposit model. Int. J. Appl. Mech. 2050084.1–2050084.18 (2020) 20. Petrov, I.B., Golubev, V.I., Shevchenko, A.V.: Problem of acoustic diagnostics of a damaged zone. Dokl. Math. 101(3), 250–253 (2020) 21. Favorskaya, A., Petrov, I.: A novel method for investigation of acoustic and elastic wave phenomena using numerical experiments. Theor. Appl. Mech. Lett. 10(5), 307–314 (2020)
Synchronization Correction Enforced by JPEG Compression in Image Watermarking Scheme for Handheld Mobile Devices Margarita N. Favorskaya
and Vladimir V. Buryachenko
Abstract Unauthorized Internet attacks against the watermarked images lead to the impossibility of a blind watermark extraction if the transmitted image is not normalized to its original geometric state. The aim of this study is to detect the regions in the watermarked image robust to rotation, scaling and translation (RST) attacks, enforced by additional immunity to JPEG lossy compression. Our algorithm is based on the simulation of JPEG lossy compression with following extraction of feature points using handcrafted and deep learning approaches. We incorporate JSNet as a JPEG simulator and Key.Net as an enforced way to find feature points invariant to RST attacks. As a result, we select the best anchor points, the coordinates of which are put into the secret key. Applying the same procedure to the transmitted image, we detect and recalculate the positions of the anchor points in the transmitted watermarked image, geometrically synchronizing with the original watermarked image and successfully extract the watermark. To address the mobile aspect, we exploited the simplest deep network architectures as far as the problem allowed with other clarifications. Keywords Image watermarking scheme · Deep learning · RST attacks · Compression · Handheld mobile device
1 Introduction As is well known, watermarking schemes are divided into the spatial and transform domains, with the latter being the preferred implementation due to its reliability and robustness. Nevertheless, both approaches are non-invariant to RST attacks against the watermarked images. This makes it impossible to extract the watermarks
correctly. The existing methods robust to RST attacks are classified into four types: the exhaustive search, embedding a watermark in invariant domains, embedding synchronization templates and employing feature points’ detection [1]. The problem is complicated by the practically infinite values of geometric distortions caused by a single attack or multiple attacks. The main approach to solve the problem is to find small robust regions in the image, usually based on extraction of feature points, with their following involvement in the watermark embedding process. Some algorithms proposed to embed a watermark in such regions by adding invariance to rotation using moment transform [2], while other algorithms used these regions as a type of landmarks to find the affine parameters of illegal global distortions based on feature points’ matching [3]. In [4], an intermediate approach was proposed when the coordinates of feature points of the host image were embedded in the robust to RST attacks regions of the watermarked image. Our contribution is to develop a preprocessing algorithm that is not only invariant to global RST attacks, but also immune to JPEG lossy compression and can be implemented in handheld mobile devices. We consider a detection of regions invariant to the geometric attacks tightly bounded with simulating of noise distortions in the watermarked image in order to find regions that are truly robust to RST attacks. The fact is that any image is compressed, when transmitted over the Internet, and distortions caused by JPEG or another codec are an inherent property of such transmission. The remainder of the paper is organized as follows. Section 2 gives a brief review of conventional feature-based synchronization corrections in the watermarking schemes, as well as feature points detection and matching using deep learning techniques. Section 3 addresses a proposed method for watermarked image normalization using convolutional neural network (CNN) architectures. Section 4 provides the experimental results. Conclusions are finally given in Sect. 5.
2 Related Work Even very small geometric distortions prevent the watermark detection, especially if a watermark contains textual information in the form of a sequence of binary codes. O’Ruanaidh and Pun [5] were the first who outlined the theory of integral transform invariants and applied the discrete Fourier transform of the host image with the following Fourier–Mellin transform. The watermark was embedded in the magnitudes using the Fourier–Mellin transform, which is equivalent to the log-polar mapping. Another strategy was to identify the affine parameters using additional template [6]. Both approaches were developed by the following researchers aimed to overcome the obvious disadvantages. During the last decade, the development of frequency-based watermarking schemes invariant to RST attacks has become very popular. In [7], it was shown that the representation of the multi-scale Harris detector and wavelet moments in the local region provided robustness to some geometric and common image processing attacks. The quaternion Fourier transform was employed in watermarking algorithm
for color images [8]. Numerous watermarking schemes have been developed using different moment families of moments, such as Zernike and pseudo-Zernike moments [9], Fourier–Mellin moments [10], complex moments [2], among others. Moment invariants are also widely proposed due to the invariance of their kernel functions to RST attacks and intensity variations. The magnitudes of exponent moments (EMs) are invariant to the geometric transformation. A robust image watermarking scheme, which used EMs invariants in a non-subsampled contourlet transform domain, was introduced in [11]. A reversible scenario based on the proposed non-unit mapped radial moments and reversible interval phase modulation was suggested in [12] as an algorithm robust to the common signal processing and global/local geometrical attacks, such as the random bending, high-frequency bending and random jitter attacks. However, these theoretically interesting algorithms do not say anything about the capacity of the embedded information in such small regions and how to distribute the relatively large watermark between these regions. On the other hand, most of the rotation attacks are applied to the whole watermarked image (rather than local regions of the image) due to their simplest implementation in graphic software tools. Thus, global rotation attacks prevail, and there is no need to find the angle evaluation into each robust region. The rapid development of deep learning techniques has led to some CNN-based watermarking studies. One of the first works in this direction was presented by Kandi et al. [13], where the rotation attacks were simulated with angles from − 0.5° to 0.5°. It should be noted that most studies are based on non-blind techniques [14, 15], which are less commonly used in practice. WMNet, which implemented two methods—the detector network with backpropagation and autoencoder, was presented in [16]. Robustness to attacks was achieved by inserting a message into the cover image. Robustness testing against common image processing attacks and geometric attacks was conducted. The cropping ratio, rotation angle and rescaling ratio were set to 0.8, 10 and 0.6, respectively. HiDDeN is the first end-to-end trainable framework for data hiding, which can be applied to both steganography and watermarking [17], where the message was encoded as a bit string. The network was trained to reconstruct hidden information with the presence of Gaussian blurring, pixel wise dropout, cropping and JPEG lossy compression. Geometric attacks were not examined. Recently, neural network architecture, including the stego-image generator, watermark extractor, stego-image discriminator and attack simulator, was proposed in [18]. The rotation attack and JPEG lossy compression were simulated by the rotation layer and additive noise layer, respectively. Such procedures became available by a special stego-image generator included in the composed CNN-GAN architecture. A brief literature review shows that most of watermarking methods, conventional or CNN-based, are robust to the common image processing attacks, while a robustness to geometric attacks remains an open issue. This is, in part, due to the fact that CNNs are not invariant to scaling and rotation.
3 The Proposed Method for Synchronization Correction of Watermarked Image Synchronization correction after rotation and scaling attacks provides approximate results, with a few exceptions, for example, when the rotation angle is 90°, 180°, 270°, 360° or the scaling factor is 1. Usually, after calculating the coordinates of the distorted image, linear interpolation is used since bilinear interpolation is computationally expensive. Interpolation fundamentally degrades the extracted watermarks, regardless of which algorithm is executed—conventional recalculation at the preprocessing stage or a "rotation" or "scaling" layer of a CNN. Objectively, conventional recalculation is the more general approach, because the special CNN layers have to be trained in an object-based way, depending on the host image and watermark representation. Various approaches for detecting robust regions in the host image are discussed in Sect. 3.1. Section 3.2 provides details of the proposed image normalization based on JSNet and Key.Net. We follow the idea that the selection of robust regions should be made immune to JPEG lossy compression, since such compression is an integral part of image transmission over the Internet, and is performed in the host image. All of these steps can be implemented by different methods; the choice was made due to an external condition—implementation on handheld mobile devices.
3.1 Detection of Robust Regions in the Host Image The original idea, in terms of watermarking paradigms, is to combine a JPEG lossy compression simulator with the extraction of feature points. Both procedures can be done by handcrafted or deep network implementations. The aim of this study is to obtain estimates and formulate recommendations for synchronization correction after global geometric attacks. Feature points are points that can be reliably extracted and matched across images. There is a huge number of publications related to the extraction of feature points in computer vision, for tasks such as image matching, camera calibration, simultaneous localization and mapping, as well as structure from motion. We are interested in the three approaches depicted in Fig. 1. There are two different goals for the detection of feature points in the host image. One goal is to find robust regions for watermark embedding; in this case, the selection of non-overlapping regions that are robust to RST attacks and have neighborhoods of maximum area prevails. Another goal is to find robust regions for synchronization correction of the watermarked image after possible RST attacks. A productive idea is to increase the resistance capability of such regions by simulating various attacks [19]. Many computer vision problems have been solved using conventional detectors of feature points, such as the scale-invariant feature transform (SIFT), speeded-up
Fig. 1 Pipelines of three approaches for detecting robust regions in a host image: a conventional approach, b conventional approach enhanced by handcrafted imposed noise, c proposed approach enhanced by JPEG lossy compression and RST enforced feature points’ extraction
robust features (SURF), features from accelerated segment test (FAST), FAST-enhanced repeatability (FASTER), and KAZE, among others. Recently, new feature descriptors, such as binary robust-independent elementary features (BRIEF), oriented FAST and rotated BRIEF (ORB), binary robust-invariant scalable keypoints (BRISK) and aggregated local Haar (ALOHA), have been reported. Experiments show that there is no fundamental difference in which RST-invariant descriptor is used to detect the robust regions in still images for the problem depicted in Fig. 1a. The more important question is how to select the number and locations of such regions in the host image. We suggest partitioning the image according to its resolution, preventing a local excessive accumulation of robust regions. It should be noted that for experimental data evaluation, the partitioning of test host images was performed in a similar manner. An intermediate solution is to impose noise simulating raw JPEG lossy compression, but with a low computational cost (Fig. 1b); in this case, about 10–15% of robust regions on average can be lost. Our proposed approach, enhanced by simulation of JPEG lossy compression and extraction of feature points robust to RST transforms, is depicted in Fig. 1c and described in detail in Sect. 3.2.
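To illustrate the partitioning idea above, the following sketch keeps at most one feature point (the strongest) per grid cell, which prevents a local excessive accumulation of robust regions. The keypoint list and the cell size are hypothetical inputs, not the paper's exact configuration.

```python
import random

def select_anchor_points(keypoints, cell=192):
    """keypoints: list of (x, y, response); keep the strongest point per grid cell."""
    best = {}
    for x, y, response in keypoints:
        key = (int(x // cell), int(y // cell))
        if key not in best or response > best[key][2]:
            best[key] = (x, y, response)
    return sorted(best.values(), key=lambda p: -p[2])

# Toy usage on a 2048 x 1350 image partitioned into 192 x 192 cells
random.seed(0)
kps = [(random.uniform(0, 2048), random.uniform(0, 1350), random.random())
       for _ in range(500)]
anchors = select_anchor_points(kps)
print(len(anchors), anchors[0])
```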
3.2 The Proposed Watermarked Image Normalization With the rapid development of deep learning, it is natural to formulate this problem as a large-scale supervised machine learning problem and train CNN to detect feature points or to detect and match them. However, not all known features can be implemented using CNN. Thus, an approach represented in Fig. 1c was realized using the recently proposed JPEG lossy compression simulator based on JSNet [20] and keypoint detection by handcrafted and learned CNN filters based on Key.Net [21].
The JPEG simulation network called JSNet [20] includes a discrete cosine transform (DCT)-based encoder and decoder, which simulate the compression and restoration of images in the YCbCr color space, respectively. The encoder includes the color space conversion from RGB to YCbCr, sampling in 4:1:1 mode, DCT, quantization and entropy encoding. The decoder provides the reverse processing in the form of entropy decoding, dequantization, inverse DCT, inverse sampling and inverse color space conversion. To simulate 4:1:1 sampling, a 2 × 2 max pooling layer is applied to the Cb and Cr components, while the Y component remains unchanged. The quantization step used to discard high-frequency information after the DCT is the fundamentally lossy step. There are two types of quantization table: one for the Y component and another for both the Cb and Cr chrominance components. The rounding operation is simulated by imposing a 3D noise mask on each color component with a disturbance range of [−0.5, 0.5]. Such masks provide floating-point values that are convenient for network calculation and backpropagation. The 3D noise mask rounding simulation is given by Eq. 1, where $O_{DCT}$ is the output of the DCT transform layer, $\tilde{O}_{DCT}$ is the quantized version of $O_{DCT}$, ε is the 3D noise mask, and Q is the quantization table for a specified quality factor.

$$\tilde{O}_{DCT} = O_{DCT} + \varepsilon \cdot Q \qquad (1)$$
The parameter Q is produced by Eq. 2, where $Q_Y$ is the standard quantization table reflecting the sensitivity of human vision to different frequencies, and QF is the quality factor.

$$Q = \frac{50 + S \times Q_Y}{100}, \qquad S = \begin{cases} 200 - 2\,QF, & \text{if } QF \ge 50 \\ 5000 / QF, & \text{if } QF < 50 \end{cases} \qquad (2)$$
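The following sketch reproduces the scaling of the quantization table and the noise-mask rounding simulation of Eqs. 1–2 in NumPy. The 8 × 8 luminance table values, the quality factor and the block handling are illustrative assumptions; the real JSNet operates on full DCT feature maps inside the network.

```python
import numpy as np

# Assumed standard JPEG luminance quantization table (ITU-T T.81 Annex K)
Q_Y = np.array([
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99]], dtype=np.float32)

def scaled_table(qf):
    """Eq. 2: scale the standard table by the quality factor QF."""
    s = 200.0 - 2.0 * qf if qf >= 50 else 5000.0 / qf
    return (50.0 + s * Q_Y) / 100.0

def simulate_rounding(dct_block, qf=60, seed=0):
    """Eq. 1: differentiable stand-in for rounding, using a noise mask in [-0.5, 0.5]."""
    rng = np.random.default_rng(seed)
    eps = rng.uniform(-0.5, 0.5, size=dct_block.shape)
    return dct_block + eps * scaled_table(qf)

block = np.random.default_rng(1).normal(scale=50.0, size=(8, 8))
print(simulate_rounding(block, qf=60).round(1))
```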
The loss function is the root mean square error between the reconstructed and target images. We applied the DCT-based encoder and decoder to train JSNet with a mean quality factor equal to 60. For our task, we only need to use the encoder. The next step is to find regions robust to RST attacks using feature points. A promising idea is to detect keypoints using handcrafted and learned CNN filters within a shallow multi-scale architecture; in this case, the handcrafted filters provide anchor structures for the learned filters. This is the basic approach of the recently proposed Key.Net architecture [21]. The Key.Net architecture is simple and applicable to handheld mobile devices. The handcrafted filters are inspired by the Harris and Hessian detectors. A convolutional layer with M filters, ReLU activation functions and a batch normalization layer form the learned block. Key.Net includes three scale levels of the input image, which is blurred and downsampled by a factor of 1.2. All three streams share the weights, forming a set of candidates for the final keypoints. The feature maps from all scale levels are then upsampled, concatenated and fed to the last convolutional layer generating the final response map. The host image
is partitioned into 192 × 192 input patches in the YCbCr color space representation to improve robustness to changes in illumination. As with the JPEG simulator, we used only a part of the entire Key.Net architecture after its training, because the image matching also provided by Key.Net is not suitable for our task. The detected robust feature points are considered as a set of robust points, the coordinates of which are transmitted as a part of the secret key. Due to this, feature points are extracted from regions of 192 × 192 pixels, but only one point represents one region. Their positions form a structure resembling a rectangle. Using the corner points, we check a limited number of combinations of feature points in order to find the affine coefficients a11, …, a32 in Eq. 3, where x and y are the coordinates of robust regions taken from the secret key and x′ and y′ are the coordinates of robust regions received from the transmitted watermarked image.

$$\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & a_{31} \\ a_{21} & a_{22} & a_{32} \\ 0 & 0 & 1 \end{bmatrix} \cdot \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} \qquad (3)$$
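A sketch of recovering the six affine coefficients from matched anchor points by least squares, followed by a simple consensus count that stands in for the voting procedure mentioned below. Names and thresholds are illustrative; the paper does not prescribe this exact routine.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares affine fit: dst ~ M @ [x, y, 1] for each matched pair."""
    src = np.asarray(src, float)
    dst = np.asarray(dst, float)
    X = np.hstack([src, np.ones((len(src), 1))])      # (n, 3)
    sol, *_ = np.linalg.lstsq(X, dst, rcond=None)     # (3, 2)
    return sol.T                                       # rows: [a11 a12 a31], [a21 a22 a32]

def count_inliers(M, src, dst, tol=3.0):
    pred = np.hstack([np.asarray(src, float), np.ones((len(src), 1))]) @ M.T
    return int((np.linalg.norm(pred - dst, axis=1) < tol).sum())

# Toy usage: anchors from the secret key vs. anchors found in the received image
key_pts = [(100, 120), (900, 140), (880, 1200), (120, 1180)]
theta = np.deg2rad(5.0)
R = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
recv_pts = (np.array(key_pts) @ R.T) * 0.9 + np.array([15.0, -7.0])
M = fit_affine(key_pts, recv_pts)
print(M.round(3), count_inliers(M, key_pts, recv_pts))
```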
The best coincidence is defined by the voting procedure.
4 Experimental Results The efficiency of the proposed algorithm for detecting the relevant regions for a watermark embedding was tested using the dataset as a part of the CLIC challenge [22]. This dataset contains several hundred professional photographs, including various foreground and natural objects, as well as the ground-truth feature points. Most of the images have a resolution of 2048 × 1350 pixels. To train the neural network, the dataset was divided into the training and test sets in a ratio 70% and 30%, respectively. The experiments were conducted using Key.Net [21]. Figure 2 depicts examples of images with different textural complexity, detecting feature points by Key.Net and selecting the anchor points distributed into the cells of the imposed grid. JPEG compression was simulated by JSNet with different values of a quality factor. Figure 3 demonstrates that the detection of feature points is robust to various parameters of JPEG compression caused by intentional attacks or transmissions over the Internet. About 95% of feature points remain at the same positions, and the changes concern to the feature strength. Moreover, the algorithm is quite robust to rotation. Experiments show that the set of anchor points remain practically unchanged with JPEG compression and affine transform. Therefore, the affine parameters can be recalculated based on the anchor points. Two metrics are usually used to assess the quality of the reconstructed image [23]. Peak signal-to-noise (PSNR) metric describes the similarity of two images in a logarithmic scale using Eq. 4, where I i, j is the intensity of the host image in pixel with coordinates (i, j).
Fig. 2 Detecting stable feature points in images with different textural complexity: a original images from CLIC dataset (from left to right—low-textured image 0a77e58.png, middle-textured image 180b64c225.png, high-textured image 88b4d08.png and high-textured image52c4ac714.png), b detection of feature points, c selection from 12 to 16 anchor points
Fig. 3 Comparison of feature points detected by Key.Net: a original image, b JPEG compression with a quality factor 0.7, c JPEG compression with a quality factor 0.3, d rotation attack (30°), e feature points in the original image, f feature points in the compressed image with a quality factor 0.7, g feature points in the compressed image with a quality factor 0.3, h feature points in the rotated image
$$\mathrm{PSNR} = 10 \log_{10} \frac{\max\left(I_{i,j}\right)^2}{\mathrm{MSE}} \qquad (4)$$
Mean square error (MSE) between the host and watermarked image is calculated by Eq. 5, where m and n are the width and height of an image, respectively, and I˜i, j is the intensity of the watermarked image in pixel with coordinates (i, j).
Table 1 Quality estimates
Image | Textural complexity | Key.Net PSNR, dB | Key.Net NCC | JSNet + Key.Net PSNR, dB | JSNet + Key.Net NCC
180b64c.png | High | 38.71 | 0.985 | 38.73 | 0.991
52c4ac714.png | High | 37.15 | 0.923 | 37.89 | 0.985
592a02e.png | Middle | 33.14 | 0.918 | 34.50 | 0.987
180b64c225.png | Middle | 36.79 | 0.976 | 37.15 | 0.980
7641f41.png | Low | 37.48 | 0.933 | 38.16 | 0.954
0a77e58.png | Low | 38.55 | 0.961 | 39.47 | 0.982
$$\mathrm{MSE} = \frac{1}{m \times n} \sum_{i=1}^{m} \sum_{j=1}^{n} \left(I_{i,j} - \tilde{I}_{i,j}\right)^2 \qquad (5)$$
The normalized correlation coefficient (NCC) values are calculated by Eq. 6, where $\mu_I$ and $\mu_{\tilde{I}}$ are the mean values of the host and reconstructed images, respectively.

$$\mathrm{NCC} = \frac{\sum_{i=1}^{m} \sum_{j=1}^{n} \left(I_{i,j} - \mu_I\right) \times \left(\tilde{I}_{i,j} - \mu_{\tilde{I}}\right)}{\sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} \left(I_{i,j} - \mu_I\right)^2 \times \sum_{i=1}^{m} \sum_{j=1}^{n} \left(\tilde{I}_{i,j} - \mu_{\tilde{I}}\right)^2}} \qquad (6)$$
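For completeness, a short NumPy implementation of the three quality measures in Eqs. 4–6, assuming 8-bit grayscale arrays of equal size; it mirrors the formulas above rather than any particular library routine.

```python
import numpy as np

def mse(host, marked):
    host, marked = host.astype(np.float64), marked.astype(np.float64)
    return np.mean((host - marked) ** 2)                                   # Eq. 5

def psnr(host, marked):
    peak = float(host.max())
    return 10.0 * np.log10(peak ** 2 / mse(host, marked))                  # Eq. 4

def ncc(host, marked):
    a = host.astype(np.float64) - host.mean()
    b = marked.astype(np.float64) - marked.mean()
    return (a * b).sum() / np.sqrt((a ** 2).sum() * (b ** 2).sum())        # Eq. 6

rng = np.random.default_rng(0)
I = rng.integers(0, 256, (256, 256)).astype(np.uint8)
I_marked = np.clip(I.astype(int) + rng.integers(-2, 3, I.shape), 0, 255).astype(np.uint8)
print(round(psnr(I, I_marked), 2), round(ncc(I, I_marked), 4))
```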
NCC values close to 1 indicate good quality of the watermark embedding algorithm, whereas values close to 0 indicate the opposite. Table 1 shows the results of evaluating the quality of watermarked images. Simulation of JPEG compression using JSNet increases the PSNR estimates by 1.4 dB and the NCC estimates by more than 10%, with an average value of 0.985.
5 Conclusions Robust watermark extraction from images after unauthorized RST attacks is impossible without synchronization correction. We offer a procedure to detect the robust regions in the partitioned watermarked image. Moreover, we have enforced this approach by JPEG lossy compression simulation. Due to the limited mobile resources, we chose JSNet and Key.Net with simple architectures under proposition of affine distortions. Our experiments allowed us to compare several approaches for synchronization correction of images from dataset “Professional” of the CLIC challenge. The best statistical approximation was obtained using deep learning and achieved a repeatability on level of 84%, while the computational costs increased
by 20–25% (in relative operations) compared to conventional extraction of feature points. Acknowledgements The reported study was funded by the Russian Fund for Basic Researches according to the research project no. 19-07-00047.
References 1. Cox, I., Miller, M., Bloom, J., Fridrich, J., Kalker, T.: Digital watermarking and steganography, 2nd edn. Elsevier, Amsterdam, The Netherlands (2007) 2. Zhu, H., Liu, M., Li, Y.: The RST invariant digital image watermarking using Radon transforms and complex moments. Dig. Sig. Process. 20, 1612–1628 (2010) 3. Zhang, Y., Wang, C., Xiao Zhou, X.: RST resilient watermarking scheme based on DWT-SVD and scale-invariant feature transform. Algorithms 10, 41.1–41.21 (2017) 4. Favorskaya, M., Savchina, E., Gusev, K.: Feature-based synchronization correction for multilevel watermarking of medical images. Proc. Comput. Sci. 159, 1267–1276 (2019) 5. O’Ruanaidh, J., Pun, T.: Rotation, scale, and translation invariant digital image watermarking. Sig. Process. 66(3), 303–317 (1998) 6. Pereira, S., Pun, T.: Robust template matching for affine resistant image watermarks. IEEE Trans. Image Process. 9(6), 1123–1129 (2000) 7. Wang, X.-Y., Yang, Y.-P., Yang, H.-Y.: Invariant image watermarking using multi-scale Harris detector and wavelet moments. Comput. Electr. Eng. 36(1), 31–44 (2010) 8. Ouyang, J., Coatrieux, G., Chen, B., Shu, H.: Color image watermarking based on quaternion Fourier transform and improved uniform log-polar mapping. Comput. Electr. Eng. 46, 419–432 (2015) 9. Singhal, N., Lee, Y.-Y., Kim, C.-S., Lee, S.-U.: Robust image watermarking using local Zernike moments. J. Vis. Commun. Image Represent. 20(6), 408–419 (2009) 10. Shao, Z., Shang, Y., Zhang, Y., Liu, X., Guo, G.: Robust watermarking using orthogonal Fourier-Mellin moments and chaotic map for double images. Sig. Process. 120, 522–531 (2016) 11. Wang, X.-Y., Wang, A.-L., Yang, H.-Y., Zhang, Y., Wang, C.-P.: A new robust digital watermarking based on exponent moments invariants in nonsubsampled contourlet transform domain. Comput. Electr. Eng. 40(3), 942–955 (2014) 12. Golabi, S., Helfroush, M.S., Danyali, H.: Non-unit mapped radial moments platform for robust, geometric invariant image watermarking and reversible data hiding. Inf. Sci. 447, 104–116 (2018) 13. Kandi, H., Mishra, D., Sai Gorthi, S.R.K. Exploring the learning capabilities of convolutional neural networks for robust image watermarking. Comput. Secur. 65, 247–268 (2017) 14. Yen, C.T., Huang, Y.J.: Frequency domain digital watermark recognition using image code sequences with a back-propagation neural network. Multimed. Tools Appl. 75(16), 9745–9755 (2016) 15. Sun, L., Xu, J., Liu, S., Zhang, S., Li, Y., Shen, C.: A robust image watermarking scheme using Arnold transform and BP neural network. Neural Comput. Applic. 30, 2425–2440 (2018) 16. Mun, S.-M., Nam, S.-H., Jang, H., Kim, D., Lee, H.-K.: Finding robust domain from attacks: a learning framework for blind watermarking. Neurocomputing 337, 191–202 (2019) 17. Zhu, J., Kaplan, R., Johnson, J., Li Fei-Fei, L.: HiDDeN: hiding data with deep networks. CoRR arXiv preprint, arXiv:1807.09937 (2018) 18. Hamamoto, I., Kawamura, M.: Neural watermarking method including an attack simulator against rotation and compression attacks. IEICE Trans. Inf. Syst. E103–D(1), 33–41 (2020) 19. Tsai, J.S., Huang, W.B., Kuo, Y.H.: On the selection of optimal feature region set for robust digital image watermarking. IEEE Trans. Image Process. 20(3), 735–743 (2011)
20. Chen, B., Wu, Y., Coatrieux, G., Chen, X., Zheng, Y.: JSNet: a simulation network of JPEG lossy compression and restoration for robust image watermarking against JPEG attack. Comput. Vis. Image Underst. 197–198, 103015.1–103015.9 (2020) 21. Barroso-Laguna, A., Riba, E., Ponsa, D., Mikolajczyk, K.: Key.Net: keypoint detection by handcrafted and learned CNN filters. IEEE Int. Conf. Computer Vision, pp. 5836–5844. Seoul, Korea (2019) 22. CLIC Challenge, https://www.compression.cc/challenge/, last accessed 2020/12/25 23. Favorskaya, M.N., Buryachenko, V.V.: Authentication and copyright protection of videos under transmitting specifications. In: Favorskaya, M.N., Jain, L.C. (eds.) Computer Vision in Advanced Control Systems-5. ISRL, vol. 175, pp. 119–160. Springer, Cham (2020)
Tracking of Objects in Video Sequences Nikita Andriyanov , Vitaly Dementiev , and Dmitry Kondratiev
Abstract The paper considers the issues of trajectory tracking of a large number of objects on video images in modes close to real time. To implement such support, a combination of algorithms based on the YOLO v3 convolutional neural network (CNN), doubly stochastic filters and pseudo-gradient procedures for aligning image fragments is proposed. The obtained numerical performance characteristics show the consistency of such a combination and the possibility of its application in real video processing systems. Keywords Trajectory tracking · Neural networks · Pseudo-gradient algorithms · Nonlinear filtering
1 Introduction Today, computer vision (CV) systems are one of the most important and prominent representatives of solutions in the field of artificial intelligence (AI). CV can be defined as the science of computers and software systems that can recognize and understand images and scenes. Among the many tasks of CV, such as filtering, recognition, segmentation and imitation of images, a special place is occupied by the task of detecting objects of various kinds in images [1–5]. This problem is encountered in a significant number of practical applications. Object detection is widely used for face detection, vehicles in traffic, pedestrian counting, web images, security systems, etc. The most interesting such applications are systems for detecting and tracking moving objects on video sequences. The complexity of creating such systems is due to the need to solve several problems at once, namely highlighting a useful object against a certain background, identifying the type of this object and testing the hypothesis about its correspondence to the objects detected in previous
images. Usually, such trajectory tracking systems are associated with the use of a gating mechanism, where an object in a subsequent image is searched for in a small area centered at the point of its previous location [6–8]. This raises a number of problems associated with determining the shape and size of such an area, as well as with separating a large number of moving objects in video sequences. The problem of primary detection of an object is also not trivial. Classical algorithms for detecting an object in an image are usually based on building some statistics in a given area and comparing these statistics with threshold values [4, 5]. However, the optimal choice of such statistics when detecting real objects is an extremely difficult task associated with constructing accurate probability distributions of brightness for the background image and the detected object itself. Usually, such distributions are unknown, and their approximation by simple types of distributions (e.g., Gaussian) leads to significant errors. The solution of this problem may be the application of algorithms based on deep learning with a large training sample. Examples of such algorithms are modern high-precision procedures such as RCNN, Fast-RCNN, Faster-RCNN, RetinaNet, as well as SSD and YOLO. However, generalizing these algorithms to processing a video sequence within the trajectory tracking problem leads to insurmountable difficulties associated with the need to expand the training sample by orders of magnitude. Under these conditions, it seems necessary to search for new algorithms that combine the high efficiency of neural network procedures in the primary detection of objects, a priori information about the nature of the movement of moving objects, and the possibility of using the results of processing previous images in the sequence. In this paper, it is proposed to use a variant of such a combination based on the YOLO v3 CNN, doubly stochastic filters and pseudo-gradient procedures for aligning image fragments.
2 Object Detection in Video Sequences For the primary detection of objects in individual frames of a video sequence, it is proposed to use CNNs. An analysis of the available architectures of such networks showed that one of the most efficient neural networks in terms of the achieved accuracy characteristics is YOLOv3 [9, 10]. Despite the fact that YOLOv5 has appeared, YOLOv3 remains a well-known and widely recommended tool for detection tasks. This network consists of 106 convolutional layers and detects small objects better than its predecessor YOLOv2. The main feature of YOLOv3 is that the output has three layers, each of which is designed to detect objects of different sizes. At the output, the neural network gives a set of tensors with dimensions 13 × 13 × 75, 26 × 26 × 75 and 52 × 52 × 75, which contain all the information necessary to draw the windows framing the objects on the image. An important feature of this network is the ability to indirectly control the probability of a false alarm by selecting appropriate threshold values.
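A minimal sketch of the thresholding step discussed above: detections below a confidence threshold are discarded, and raising or lowering the threshold trades missed targets for false alarms. The detection list is a placeholder; no specific detector API is assumed.

```python
def filter_detections(detections, threshold=0.5):
    """detections: list of dicts {'box': (x, y, w, h), 'score': float, 'label': str}."""
    return [d for d in detections if d["score"] >= threshold]

detections = [
    {"box": (34, 50, 20, 12), "score": 0.91, "label": "car"},
    {"box": (120, 80, 18, 10), "score": 0.42, "label": "car"},
    {"box": (200, 33, 25, 14), "score": 0.27, "label": "car"},
]
for t in (0.2, 0.3, 0.4, 0.5):
    print(t, len(filter_detections(detections, t)))
```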
Fig. 1 Vehicle detection in aerial photography
Table 1 Dependence of true and false detection characteristics at different thresholds
Threshold | Number of true targets | True detection | False detection | Number of false targets
0.2 | 771 | 665 | 42 | 39
0.3 | 771 | 664 | 42 | 37
0.4 | 771 | 657 | 42 | 34
0.5 | 771 | 647 | 42 | 28
As an illustration, let us consider the process of detecting and identifying vehicles in aerial photography (Fig. 1). Table 1 shows the summary results of the YOLOv3 neural network obtained while processing a large number of such images. Thus, with a decrease in the detection threshold, the neural network correctly detects a larger number of objects, but this also increases the likelihood of false detection. In general, it is possible to achieve a correct detection probability of 91–92%, which is a good indicator for the processed images.
3 Trajectory Tracking Algorithm To build a tracking system for moving objects, it is not enough just to detect them; it is also necessary to match the positions of one and the same object across a sequence of frames and to build the trajectories of the objects. As the most preferable method for estimating coordinates and constructing trajectories, a method based on a nonlinear doubly stochastic filter [11, 12] was chosen. In this case, to simplify the calculations, it is possible to assume that an object can move only in the observed plane along two coordinates. To do this, assume that an object can move across the territory with variable and random acceleration. So, for example, when approaching an intersection,
a car intensively slows down, and after passing it, begins to accelerate up to the permitted speed. The algorithm then uses the following models for the unknown coordinates requiring an estimate $(x_{Ei}^{t}, y_{Ei}^{t})$ of the $i$-th object at the time moment $t$:

$$x_{Ei}^{t} = 2x_{Ei}^{t-1} - x_{Ei}^{t-2} + a_{Exi}^{t}\left(x_{Ei}^{t-1} - x_{Ei}^{t-2}\right), \quad y_{Ei}^{t} = 2y_{Ei}^{t-1} - y_{Ei}^{t-2} + a_{Eyi}^{t}\left(y_{Ei}^{t-1} - y_{Ei}^{t-2}\right),$$
$$a_{Exi}^{t} = r_{ax} a_{Exi}^{t-1} + \xi_{axi}^{t}, \quad a_{Eyi}^{t} = r_{ay} a_{Eyi}^{t-1} + \xi_{ayi}^{t}, \qquad (1)$$

where $r_{ax}$, $r_{ay}$ are scalar parameters that determine the rate of change of the accelerations $a_{Exi}^{t}$ and $a_{Eyi}^{t}$; $\xi_{axi}^{t}$, $\xi_{ayi}^{t}$ are independent normal random variables with zero mean and variance $\sigma_{\xi}^{2}$. Let us introduce the definitions $v_{Exi}^{t} = x_{Ei}^{t} - x_{Ei}^{t-1}$, $v_{Eyi}^{t} = y_{Ei}^{t} - y_{Ei}^{t-1}$ and $X_{Ei}^{t} = \left(x_{Ei}^{t}, v_{Exi}^{t}, a_{Exi}^{t}\right)^{T}$, $Y_{Ei}^{t} = \left(y_{Ei}^{t}, v_{Eyi}^{t}, a_{Eyi}^{t}\right)^{T}$. Then model (1) can be rewritten as follows:

$$X_{Ei}^{t} = \wp_{Exi}^{t} X_{Ei}^{t-1} + \xi_{xi}^{t}, \quad Y_{Ei}^{t} = \wp_{Eyi}^{t} Y_{Ei}^{t-1} + \xi_{yi}^{t},$$

where

$$\wp_{Exi}^{t} = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 + a_{Exi}^{t-1} & 0 \\ 0 & 0 & r_{ax} \end{pmatrix}, \quad \wp_{Eyi}^{t} = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 + a_{Eyi}^{t-1} & 0 \\ 0 & 0 & r_{ay} \end{pmatrix}, \quad \xi_{xi}^{t} = \begin{pmatrix} 0 \\ 0 \\ \xi_{axi}^{t} \end{pmatrix}, \quad \xi_{yi}^{t} = \begin{pmatrix} 0 \\ 0 \\ \xi_{ayi}^{t} \end{pmatrix}.$$

Another version of the presented models can be given by the following nonlinear vector stochastic expressions:

$$X_{Ei}^{t} = \varphi_{Exi}^{t}\left(X_{Ei}^{t-1}\right) + \xi_{xi}^{t}, \quad Y_{Ei}^{t} = \varphi_{Eyi}^{t}\left(Y_{Ei}^{t-1}\right) + \xi_{yi}^{t},$$

where

$$\varphi_{Exi}^{t}\left(X_{Ei}^{t-1}\right) = \begin{pmatrix} 1 & 1 & 0 \\ 0 & a_{Exi}^{t-1} & v_{Exi}^{t-1} \\ 0 & 0 & r_{ax} \end{pmatrix}, \quad \varphi_{Eyi}^{t}\left(Y_{Ei}^{t-1}\right) = \begin{pmatrix} 1 & 1 & 0 \\ 0 & a_{Eyi}^{t-1} & v_{Eyi}^{t-1} \\ 0 & 0 & r_{ay} \end{pmatrix}.$$

In this case, the observation model can be rewritten as follows:

$$z_{Exi}^{t} = C_{x} X_{Ei}^{t} + n_{Exi}^{t}, \quad z_{Eyi}^{t} = C_{y} Y_{Ei}^{t} + n_{Eyi}^{t}, \quad i = 1, 2, \ldots, N_{0}, \; t = 1, 2, \ldots, T,$$

where $C_{x} = C_{y} = \left(1 \; 0 \; 0\right)$. The obtained designations allow applying doubly stochastic nonlinear filtering to filter the observations and to construct forecasts of object behavior [12]. Moreover, it is necessary to introduce $X_{extEi}^{t} = \varphi_{Exi}^{t}\left(X_{Ei}^{t-1}\right)$ and $Y_{extEi}^{t} = \varphi_{Eyi}^{t}\left(Y_{Ei}^{t-1}\right)$ as extrapolated
predictions of the object coordinates at the time moment $t$ using the previous observations $z_{Exi}^{t-1}$ and $z_{Eyi}^{t-1}$. The error covariance matrices of this extrapolation are as follows:

$$P_{extxi}^{t} = M\left\{\left(X_{extEi}^{t} - \bar{X}_{extEi}^{t}\right)\left(X_{extEi}^{t} - \bar{X}_{extEi}^{t}\right)^{T}\right\} = \varphi_{Exi}^{t}\left(\bar{X}_{Ei}^{t-1}\right) P_{xi}^{t-1} \varphi_{Exi}^{t}\left(\bar{X}_{Ei}^{t-1}\right)^{T} + V_{x\xi i},$$

$$P_{extyi}^{t} = M\left\{\left(Y_{extEi}^{t} - \bar{Y}_{extEi}^{t}\right)\left(Y_{extEi}^{t} - \bar{Y}_{extEi}^{t}\right)^{T}\right\} = \varphi_{Eyi}^{t}\left(\bar{Y}_{Ei}^{t-1}\right) P_{yi}^{t-1} \varphi_{Eyi}^{t}\left(\bar{Y}_{Ei}^{t-1}\right)^{T} + V_{y\xi i},$$

where $P_{xi}^{t-1}$, $P_{yi}^{t-1}$ are the filtering error covariance matrices at time $(t-1)$, and $V_{x\xi i} = M\left\{\xi_{xi}^{t} \xi_{xi}^{tT}\right\}$ and $V_{y\xi i} = M\left\{\xi_{yi}^{t} \xi_{yi}^{tT}\right\}$ are the diagonal covariance matrices of the random additions $\xi_{xi}^{t}$ and $\xi_{yi}^{t}$. Then it is possible to write the following relations for the doubly stochastic coordinate filters:

$$\hat{X}_{Ei}^{t} = \hat{X}_{extEi}^{t} + B_{xi}^{t}\left(z_{Exi}^{t} - \hat{x}_{extEi}^{t}\right), \quad \hat{Y}_{Ei}^{t} = \hat{Y}_{extEi}^{t} + B_{yi}^{t}\left(z_{Eyi}^{t} - \hat{y}_{extEi}^{t}\right), \qquad (2)$$

where $\hat{x}_{extEi}^{t}$, $\hat{y}_{extEi}^{t}$ are the first elements of the vectors $\hat{X}_{extEi}^{t}$ and $\hat{Y}_{extEi}^{t}$; $B_{xi}^{t} = P_{extxi}^{t} C_{x}^{T} D_{xi}^{-1}$; $B_{yi}^{t} = P_{extyi}^{t} C_{y}^{T} D_{yi}^{-1}$; $D_{xi}^{t} = C_{x} P_{extxi}^{t} C_{x}^{T} + \sigma_{n}^{2}$; $D_{yi}^{t} = C_{y} P_{extyi}^{t} C_{y}^{T} + \sigma_{n}^{2}$. The variance of the filtering error at each step is determined by the matrices $P_{xi}^{t} = \left(E - B_{xi}^{t} C_{x}\right) P_{extxi}^{t}$, $P_{yi}^{t} = \left(E - B_{yi}^{t} C_{y}\right) P_{extyi}^{t}$. When applying this algorithm, the question arises regarding the determination of the time sequence of observations corresponding to the coordinates of the object centers $\left(z_{Exi}^{t}, z_{Eyi}^{t}\right)$. To determine these observations, we use the following algorithm consisting of five steps.
1. Let us set the initial conditions, such as the maximum speed of the targets $V_{max}$; the time $t$ after which detection data is expected from the next frame; the maximum number of target skips $n$; the variance $\sigma$ of the error in determining the coordinates of an object when it is detected by the neural network; and the variance of the target maneuvering speed $\gamma$. If the number of skips in the route of an object is more than $n$, then calculations for such a route are no longer performed and it is not taken into account.
2. During detection of the $i$-th object on a separate $k$-th image, it is necessary to calculate the coordinates of the center of this object $\left(z_{Exi}^{k}, z_{Eyi}^{k}\right)$. Taking into account the fact that one of the results of the detector based on the YOLOv3 network is the framing rectangles, the calculation of the center of the object is a trivial task.
3. Then, for the first point of each trajectory, the area $O_{i}$ should be set with center coordinates $\left(\hat{x}_{extEi}^{k}, \hat{y}_{extEi}^{k}\right)$, composed of the first elements of the vectors $X_{extEi}^{t}$, $Y_{extEi}^{t}$, in which the next point on the route is supposed to hit. For the first point of the route, $\hat{x}_{extEi}^{k} = z_{Exi}^{k}$, $\hat{y}_{extEi}^{k} = z_{Eyi}^{k}$.
4. For the following $(k+1)$-th image, it is necessary to detect an object of the given type in the area $O_{i}$. In this case, using the YOLOv3 neural network, the minimum possible threshold of 0.1 is introduced, which provides the maximum probability of correct detection. Then the calculation of $\left(z_{Exi}^{k+1}, z_{Eyi}^{k+1}\right)$ is performed.
5. The next step is directed to the calculation of the estimates $\left(\hat{x}_{Exi}^{k+1}, \hat{y}_{Eyi}^{k+1}\right)$ as the first elements of the vectors $\hat{X}_{Ei}^{t}$, $\hat{Y}_{Ei}^{t}$ obtained from the nonlinear filter (2).
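The Python sketch below illustrates one extrapolation and update step of the per-coordinate filter used in step 5 (cf. Eq. (2)) for a single object and a single coordinate. It is a simplified illustration, not the authors' implementation: the state-dependent transition matrix corresponds to the linearized model with the entry $1 + a$, the numerical values of $r_{ax}$, $\sigma_{\xi}^{2}$ and $\sigma_{n}^{2}$ are arbitrary, and the function and variable names are ours.

```python
import numpy as np

def filter_step(x_prev, P_prev, z, r_a=0.9, var_xi=1e-2, var_n=1.0):
    """One extrapolation/update step of the per-coordinate filter (cf. Eq. 2).

    x_prev : previous state estimate (position, velocity, acceleration)
    P_prev : previous filtering error covariance (3x3)
    z      : current observed coordinate of the object center
    r_a, var_xi, var_n : illustrative values of r_ax, sigma_xi^2 and sigma_n^2
    """
    pos, vel, acc = x_prev
    # State-dependent transition matrix, as in the linearized model
    Phi = np.array([[1.0, 1.0, 0.0],
                    [0.0, 1.0 + acc, 0.0],
                    [0.0, 0.0, r_a]])
    V = np.diag([0.0, 0.0, var_xi])          # covariance of the random addition xi
    C = np.array([[1.0, 0.0, 0.0]])          # observation matrix C = (1 0 0)

    # Extrapolation (prediction) of the state and its error covariance
    x_ext = Phi @ x_prev
    P_ext = Phi @ P_prev @ Phi.T + V

    # Gain and update by the scalar observation z
    D = (C @ P_ext @ C.T).item() + var_n
    B = (P_ext @ C.T) / D
    x_new = x_ext + (B * (z - x_ext[0])).ravel()
    P_new = (np.eye(3) - B @ C) @ P_ext
    return x_new, P_new

# Example: track one coordinate of a single object over a few frames
x, P = np.array([100.0, 2.0, 0.0]), np.eye(3)
for z in [102.1, 104.4, 107.0, 110.1]:
    x, P = filter_step(x, P, z)
print(x)  # filtered (position, velocity, acceleration)
```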
Figure 2 shows the result of processing a video sequence of images in the form of trajectories of vehicles highlighted in different colors. Table 2 shows some research results characterizing the dependence of the quality of trajectory tracking on the size and shape of the region Oi . It is obvious that changing the size of the window allows us to increase the proportion of correctly detected trajectories, but changing the shape of the window does not give such a significant increase. At the same time, the analysis shows that a decrease in the number of correctly detected trajectories with an increase in the detection area is mainly due to an increase in the number of false detections of similar objects. Instead
Fig. 2 Recognition of trajectories of vehicles in dense traffic
Table 2 Percentage of correctly detected trajectories of objects when using doubly stochastic filtering

Detection area size   Square (%)   Circle (%)   Ellipse in the direction of movement of the object (%)
2σ                    94.6         96.1         96.3
3σ                    94.1         95.8         96.1
4σ                    91.3         92.0         92.3
5σ                    90.3         90.5         90.9
Table 3 Percentage of correctly detected trajectories of objects when using Kalman filtering

Detection area size   Square (%)   Circle (%)   Ellipse in the direction of movement of the object (%)
2σ                    91.2         91.3         91.4
3σ                    89.1         89.4         89.6
4σ                    87.1         87.5         87.8
5σ                    85.3         86.5         86.6
of a nonlinear doubly stochastic filter, it is possible to use a simpler version, the linear Kalman filter [5], which does not take into account the possibility of objects moving with variable acceleration. Table 3 shows the corresponding performance characteristics. Analysis of the presented data shows a significant deterioration in the characteristics of the trajectory tracking process. This is obviously due to the errors in predicting the movement of the object, which arise in the linear Kalman filter because it cannot take uniformly accelerated motion into account.
4 Using Pseudo-Gradient Procedures to Improve the Quality of Trajectory Tracking Another option to improve the quality of trajectory tracking is to search for an object selected in previous images in the subsequent frames. This approach can be substantiated by the small time intervals between the registrations of individual frames, which make it possible to expect a previously detected object in a certain area of the image. The problem of finding the center of this region can be significantly simplified using the already described doubly stochastic filter. In this case, such a problem can be interpreted as the problem of estimating the geometric deformation parameters of a previously selected fragment in a time sequence of images. Then, to estimate
such parameters, one can use high-speed identification-free pseudo-gradient (PG) procedures, for example, the pseudo-gradient identification method (PIM). PIM allows minimizing the metric distance between the image of the reference object and the image of the desired object in accordance with the accepted model of geometric deformations (GD). Within the framework of this study, it is assumed that possible deformations of the desired image in relation to the reference image can be reduced to an affine transformation. In PIM, the identification parameters α are estimated recursively [13–15]:
$$\hat{\alpha}_{t} = \hat{\alpha}_{t-1} - \Lambda_{t} \beta_{t},$$

where $\beta_{t}$ is the pseudo-gradient of the adopted quality measure and $\Lambda_{t}$ is a gain matrix. Within the framework of the trajectory tracking problem, taking into account the peculiarities of object registration (aerial photography), the identified parameters can be reduced to three numbers $\alpha = \left(s_{tx}, s_{ty}, \alpha_{t}\right)$, where $s_{tx}$ and $s_{ty}$ are the relative displacements over the time interval $(t-1, t)$ along the $x$ and $y$ axes, respectively, and $\alpha_{t}$ is the relative angular displacement over the same interval. The experiments carried out on the sequential alignment of the selected fragments show that these parameters can be estimated confidently. As an illustration, Fig. 3 shows the estimate of one of the parameters, $s_{tx}$, as a function of the iteration number.
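To illustrate the recursive correction of the estimate, the sketch below estimates a one-dimensional inter-frame shift by a sign-based pseudo-gradient procedure: at every iteration the pseudo-gradient of the mean squared inter-fragment difference is formed from a small random subset of samples, and the estimate is corrected against its sign. This is only a schematic one-dimensional analogue of the PIM procedures from [13–15]; the gain value, number of iterations and quality measure are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def pg_shift_estimate(ref, cur, n_iter=100, lam=0.1, n_samples=20):
    """Sign-based pseudo-gradient estimation of a 1-D shift between two fragments."""
    s = 0.0
    x = np.arange(len(ref))
    for _ in range(n_iter):
        idx = rng.integers(2, len(ref) - 2, size=n_samples)
        # current fragment resampled around x + s (linear interpolation)
        f_plus = np.interp(idx + s + 0.5, x, cur)
        f_minus = np.interp(idx + s - 0.5, x, cur)
        diff = np.interp(idx + s, x, cur) - ref[idx]
        beta = np.mean(diff * (f_plus - f_minus))  # pseudo-gradient of the distance
        s -= lam * np.sign(beta)                   # recursive correction of the estimate
    return s

x = np.arange(200)
ref = np.sin(0.1 * x)
cur = np.interp(x - 3.2, x, ref)   # the same fragment displaced by 3.2 samples
print(pg_shift_estimate(ref, cur)) # converges to a value near 3.2
```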
Fig. 3 Usual character of convergence in pseudo-gradient estimation of object displacement parameters
Table 4 Percentage of correctly detected trajectories with dense movement of objects

Detection area size   Square (%)   Circle (%)   Ellipse in the direction of movement of the object (%)
2σ                    95.7         96.5         96.9
3σ                    95.0         95.3         96.2
4σ                    93.4         93.1         93.6
5σ                    92.3         91.6         92.1
Table 4 shows the results of evaluating the quality of the trajectory tracking in the case of using pseudo-gradient estimation together with neural network processing in subsequent images. In this case, in item 4 of the presented trajectory tracking algorithm, the following relations are used:

$$z_{Exi}^{k+1} = z_{Exi}^{k} + s_{tx}, \quad z_{Eyi}^{k+1} = z_{Eyi}^{k} + s_{ty}.$$
Analysis of the information presented shows that the use of pseudo-gradient estimation in this problem is justified. The reason for this is, among other things, a very small difference between the aligned image fragments when assessing the local movement of objects.
5 Conclusions Thus, in this paper, a method for solving the problem of trajectory tracking of a large number of moving targets in video images was proposed. The method is based on a combination of a neural network detector, nonlinear doubly stochastic filters and pseudo-gradient procedures for aligning image fragments. The results obtained can be used in real systems for processing video images and in control systems for unmanned aerial vehicles. Acknowledgements The reported study was funded by RFBR grants, Projects No. 19-29-09048 and No. 18-47-730009.
References 1. Bouman, C.: Model Based Imaging Processing. Purdue University 414 (2013) 2. Jensen, J.: Introductory digital image processing: a remote sensing perspective. Pearson Educ. 659 (2015) 3. Basener, W., Ientilucci, E., Messinger, D.: Anomaly detection using topology. Algorithms Technol. Multispectral Hyperspectral Ultraspectral Imagery XIII SPIE 6565, 22–34 (2007) 4. Denisova, A., Myasnikov, V.: Detection of anomalies in hyperspectral images. Comput. Opt. 38(2), 287–296 (2013) 5. Krasheninnikov, V., Vasil’ev, K.: Multidimensional image models and processing. Intell. Syst. Ref. Lib. 135, 11–64 (2018) 6. Vasiliev, K., Pavlygin, E., Gutorov, A.: Multi-model algorithms of data processing of the mobile radar system. Autom. Control Processes 4, 6–13 (2013) 7. Andriyanov, N., Vasil’ev, K., Dement’ev, V.: Investigation of filtering and objects detection algorithms for a multizone image sequence. Int. Arch. Photogrammetry Remote Sens. Spat. Inf. Sci.—ISPRS Arch. 42, 7–10. (2019). doi: https://doi.org/10.5194/isprs-archives-XLII-2W12-7-2019 8. Kondrat’ev, D.: Primary detection of elements on the radar image. Radioelectronic Equipment 1(7), 135–136 (2015)
9. Redmon, J., Farhadi, A.: Yolo9000: better, faster, stronger. Computer Vision and Pattern Recognition (CVPR), 2017 IEEE Conference Proceedings, 1, 6517–6525 (2017) 10. Redmon, J., Farhadi, A.: Yolov3: an incremental improvement. arXiv, pp. 1–11 (2018) 11. Vasiliev, K., Dementyev, V.: Doubly stochastic filtration of spatially inhomogeneous images. J. Commun. Technol. Electron. 65(5), 524–531 (2020) 12. Andriyanov, N., Dementiev, V., Vasiliev, K.: Developing a filtering algorithm for doubly stochastic images based on models with multiple roots of characteristic equations. Patt. Recogn. Image Anal. 29(1), 10–20 (2019). https://doi.org/10.1134/S1054661819010048 13. Magdeev, R., Tashlinskii, A.: Efficiency of object identification for binary images. Comput. Opt. 2, 277–281 (2019) 14. Tashlinskii, A.: Pseudogradient Estimation of Digital Images Interframe Geometrical Deformations. Vision Syst.: Segmentation Pattern Recogn. 1, 465–494 (2007) 15. Krasheninnikov, V., Kuvayskova, Y.., Subbotin, A.: Pseudo-gradient algorithm for identification of doubly stochastic cylindrical image model. Knowledge-Based and Intelligent Information & Engineering Systems: Proceedings of the 24th International Conference KES-2020. Procedia Computer Science, vol. 176, pp. 1858–1867 (2020)
Multi-Criteria Decision-Analysis Methods—Theory and Their Applications
A New Approach to Identifying of the Optimal Preference Values in the MCDA Model: Cat Swarm Optimization Study Case Jakub Wie˛ckowski, Andrii Shekhovtsov, and Jarosław Wa˛tróbski
Abstract The randomness of data appears in many problems in various fields. Stochastic optimization methods are often used to solve such problems. However, a large number of methods developed makes it difficult to determine which method is the optimal choice for solving a given problem. In this paper, the cat swarm optimization (CSO) was used to find the optimal preference values of characteristic objects, which were then subjected to applying the characteristic objects method (COMET). The determined problem was solved using the randomly chosen training and testing sets, where both were subjected to two criteria. The study’s motivation was to analyze the effectiveness of the CSO algorithm compared to other stochastic methods in solving problems of a similar class. The obtained solution shows that the used algorithm can be effectively applied to the defined problem, noting much better results than previously tested methods. Keywords COMET · Cat swarm optimization · Multi-criteria decision analysis
1 Introduction Random processes are a significant part of the problems associated with solving today’s problems. The randomness influences the variability of the obtained results, making it difficult to determine whether the obtained result is meaningful. There are many methods for solving optimization problems in a given space, but some
give more acceptable results than others [6, 7, 13]. An important factor is also the repeatability of the results obtained and the effectiveness of the actions performed. Stochastic optimization methods, where the data’s randomness influences the result, are frequently used in optimization problems [19]. The main characteristic of the operation of these methods is the analysis of the state in which the algorithm is located, so that each successive step brings closer to obtaining an optimal solution. In complex problems, an important parameter is the time efficiency of selected methods, which should strive to minimize the algorithm’s working time while solving the problem, giving satisfactory results [3, 5]. Initially, algorithms from the family of stochastic optimization methods prioritized finding a correct solution to the problem. At a later stage, attempts were made to develop methods that would allow obtaining equally effective results, while operating much faster. Cat swarm optimization belongs to the family of stochastic optimization methods, and its main idea is to reflect the behavior of cats in spreading information. The technique was developed in 2006 by Shu-Chuan, Pei-wei and Jeng-Shyang, and the authors have shown that the method is more efficient than particle swarm optimization (PSO), among others [2]. The performance of the algorithm was developed based on two phases, namely seeking mode and tracing mode. Its performance was tested and found to be a suitable choice for solving complex problems. The basic CSO algorithm has also been extended with additional assumptions, such as binary cat swarm optimization (BCSO) [20] and integer cat swarm optimization (ICSO) [14], which shows that it is willingly used by experts solving optimization problems. Multi-criteria decision analysis (MCDA) methods are widely used in many problems [23]. They are applied, for example, in the selection of material suppliers [8, 24], in planning the location of industrial buildings [4], or in combination with stochastic optimization methods [10, 22]. The characteristic objects method (COMET) is one of the MCDA methods based on a rule-based approach to problem solving [12]. Based on the expert knowledge, comparisons are made between pairs of characteristic objects, and the result is a rule base used to calculate preference values for the analyzed alternatives [9]. The advantage of this method is that the phenomenon of ranking reversal does not occur when the number of alternatives in the considered set changes [11, 17]. In this paper, the cat swarm optimization was used to find the optimal characteristic objects preferences. The problem was considered in a two-dimensional space, for two criteria C1 and C2 . Then, with the obtained preference values and using the COMET method, the best possible representation of the initial problem space for both criteria was sought. The purpose of such a procedure was to examine the effectiveness of the cat swarm optimization algorithm in selecting parameters for the selected decision problem. The rest of the paper is organized as follows. In Sect. 1, the preliminaries of the fuzzy logic theory are presented. Section 2 includes the main assumptions of the COMET method and subsequent steps of its application. Section 3 is the introduction to the cat swarm optimization and contains the description of the algorithm. In Sect. 4, the study case is presented, in which cat swarm optimization was used to find characteristic objects preferences calculated with COMET method. 
Finally, the summary of the results is presented, and conclusions from the research are drawn in Sect. 5.
2 The Characteristic Objects Method The characteristic objects method (COMET) is based on the pairwise comparison of the characteristic objects created using cartesian products for predefined triangular fuzzy numbers (TFNs) [21] describing criteria taken into account in a given problem [15, 16]. Expert knowledge is of significant importance here, based on which the resulting pairs of objects are analyzed. An unquestionable advantage of this method is its resistance to the phenomenon of ranking reversal, which makes it the first method from the group of MCDA methods that is resistant to this phenomenon [18, 25]. The full description of this method could be found in [15, 16].
3 Cat Swarm Optimization Cat swarm optimization is a swarm intelligence algorithm inspired by observing the behavior of cats [2]. There are also several variants of this algorithm applicable in different fields [1]. Cat swarm optimization uses a model of a cat's behavior to solve optimization problems. Every cat has its position in an M-dimensional space, a velocity for each dimension, and a fitness value representing how good this cat is. The final solution is the best position reached by one of the cats at the end of the iterations. Every cat operates in one of two sub-modes: seeking mode and tracing mode. The first models a cat that is resting and looking for its next position to move to; the second models a cat that is tracing some target. A Boolean flag in every cat determines its current mode.
3.1 Seeking Mode
The seeking mode sub-model requires defining several parameters, such as:
• SMP (seeking memory pool), which defines the size of the seeking memory pool for each cat;
• SRD (seeking range of the selected dimension), which declares the mutative ratio for the selected dimensions;
• CDC (count of dimensions to change), which defines how many dimensions should be changed;
• SPC (self-position considering), which decides whether or not the cat can stay in place instead of moving.
Next, the seeking mode sub-model can be described in five steps as follows:
1. Make SMP copies of the present position of cat_k. If the value of SPC is true, make SMP − 1 copies instead and consider the present position as one of the candidates.
2. For each copy, according to CDC, randomly add or subtract SRD percent of the current values and replace the old ones.
3. Calculate the fitness value (FS) for each candidate point.
4. If all FS are not equal, calculate the selecting probability of each candidate using Formula (1):
$$P_{i} = \frac{\left|FS_{i} - FS_{max}\right|}{FS_{max} - FS_{min}}, \quad \text{where } 0 < i < j \qquad (1)$$
5. Randomly pick the candidate point to move to from the candidate points and replace the position of cat_k with the position of this candidate.
3.2 Tracing Mode
Tracing mode can be described in three steps as follows:
1. Update the velocities for every dimension according to (2):
$$v_{k,d} = v_{k,d} + r_{1} \cdot c_{1} \cdot \left(x_{best,d} - x_{k,d}\right), \qquad (2)$$
where $d = 1, 2, \ldots, M$; $x_{best,d}$ is the position of the cat with the best fitness value; $x_{k,d}$ is the position of cat_k; $c_{1}$ is a constant; and $r_{1}$ is a random value in the range [0, 1].
2. Check if the velocities are within the range of the maximum velocity. If a new velocity value is out of range, set it equal to the limit.
3. Update the position of cat_k according to Eq. (3):
$$x_{k,d} = x_{k,d} + v_{k,d} \qquad (3)$$
3.3 Cat Swarm Optimization As described in Sects. 3.1 and 3.2, the CSO algorithm consists of two sub-models. To combine these two modes, the mixture ratio (MR) parameter is used. The process of cat swarm optimization can be described in six steps as follows:
1. Create N cats.
2. Randomly put the cats into the M-dimensional solution space and randomly select velocity values. Then, randomly pick a number of cats and set them into tracing mode according to MR. The other cats should be in seeking mode.
3. Evaluate the fitness function value for each cat and remember the position x_best of the best cat.
4. Move the cats according to their modes. If cat_k is in tracing mode, apply the tracing mode process. Otherwise, apply the seeking mode process.
5. Randomly select a number of cats and put them into tracing mode according to MR; put the other cats into seeking mode.
6. Check if the termination conditions are fulfilled. If not, repeat steps 3 to 5.
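A compact Python sketch of the whole procedure is given below. It follows the six steps and the two sub-modes described above, but it is only an illustration: the parameter values (number of cats, SMP, SRD, MR, c1 and the velocity limit) are arbitrary defaults rather than the settings used in this study, the fitness is assumed to be minimized, and all function names are ours.

```python
import numpy as np

rng = np.random.default_rng(42)

def cso(fitness, dim, n_cats=20, n_iter=50, mr=0.3, smp=5, srd=0.2,
        cdc=None, spc=True, c1=2.0, v_max=1.0, lower=0.0, upper=1.0):
    """Minimal cat swarm optimization sketch (minimization)."""
    cdc = cdc if cdc is not None else dim
    pos = rng.uniform(lower, upper, size=(n_cats, dim))
    vel = rng.uniform(-v_max, v_max, size=(n_cats, dim))
    fit = np.apply_along_axis(fitness, 1, pos)
    best = pos[np.argmin(fit)].copy()

    for _ in range(n_iter):
        tracing = rng.random(n_cats) < mr          # MR decides the mode of each cat
        for k in range(n_cats):
            if tracing[k]:
                # Tracing mode: move toward the best cat (Eqs. 2-3)
                vel[k] += rng.random() * c1 * (best - pos[k])
                vel[k] = np.clip(vel[k], -v_max, v_max)
                pos[k] = np.clip(pos[k] + vel[k], lower, upper)
            else:
                # Seeking mode: evaluate SMP perturbed copies and pick one
                cands = np.repeat(pos[k][None, :], smp, axis=0)
                start = 1 if spc else 0            # keep the current position as one candidate
                for c in range(start, smp):
                    dims = rng.choice(dim, size=cdc, replace=False)
                    signs = rng.choice([-1.0, 1.0], size=cdc)
                    cands[c, dims] *= 1.0 + signs * srd
                cands = np.clip(cands, lower, upper)
                cfit = np.apply_along_axis(fitness, 1, cands)
                if np.ptp(cfit) > 0:
                    prob = np.abs(cfit - cfit.max()) / (cfit.max() - cfit.min())
                    prob = prob / prob.sum()       # selection probabilities, cf. Formula (1)
                else:
                    prob = np.full(smp, 1.0 / smp)
                pos[k] = cands[rng.choice(smp, p=prob)]
            if fitness(pos[k]) < fitness(best):
                best = pos[k].copy()
    return best

# Example: minimize a simple quadratic fitness in 9 dimensions
print(cso(lambda v: np.sum((v - 0.3) ** 2), dim=9))
```

In this sketch the mixture ratio MR directly controls the balance between exploitation (tracing mode) and exploration (seeking mode) at every iteration.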
4 Study Case This paper presents a model based on applying the cat swarm optimization method to select the preference values of the characteristic objects. Values were sought that provided the smallest possible sum of errors, calculated as the sum of the absolute differences between the initially defined preference values and those obtained using the COMET method. This procedure was aimed at mapping the initial problem surface as accurately as possible.

R1: IF C1 ~ 0.0 AND C2 ~ 0.0 THEN 0.0000
R2: IF C1 ~ 0.0 AND C2 ~ 0.5 THEN 0.0000
R3: IF C1 ~ 0.0 AND C2 ~ 1.0 THEN 0.2124
R4: IF C1 ~ 0.5 AND C2 ~ 0.0 THEN 0.0000
R5: IF C1 ~ 0.5 AND C2 ~ 0.5 THEN 0.2014
R6: IF C1 ~ 0.5 AND C2 ~ 1.0 THEN 0.0000
R7: IF C1 ~ 1.0 AND C2 ~ 0.0 THEN 0.5097
R8: IF C1 ~ 1.0 AND C2 ~ 0.5 THEN 0.4365
R9: IF C1 ~ 1.0 AND C2 ~ 1.0 THEN 1.0000
(4)

First, the initial problem space is defined, together with the distribution of the characteristic objects. The gathered data is shown in Fig. 1. A two-dimensional space was considered, designated for the criteria C1 and C2. The set of training alternatives and their preferences is given in Table 1, where the values of criteria C1 and C2 were initially normalized using the min-max method. By defining the distribution of points for the characteristic objects, it was possible to calculate the rule base necessary to perform further calculations using the COMET method. For the problem under consideration, nine rules describing the preference values for specific variables were created, and they are represented as (4). The cat swarm optimization method was used to obtain the preference values of the characteristic objects. Figure 2 shows how the fitness function changed with successive iterations of the optimization algorithm run. A significant decrease
Fig. 1 Surface of decision problem and preferences for characteristic objects

Table 1 Training set of alternatives and their preferences (P obtained; Pref reference)

Ai    C1      C2      Pref    P       Diff
A1    0.0592  0.7016  0.1222  0.0057  0.1166
A2    0.8824  0.4987  0.4040  0.0241  0.3800
A3    0.3303  0.8979  0.2386  0.0053  0.2333
A4    0.2674  0.2906  0.0559  0.0039  0.0519
A5    0.6210  0.6470  0.2705  0.0160  0.2545
A6    0.2648  0.0887  0.0368  0.0012  0.0356
A7    0.8811  0.2542  0.3742  0.0243  0.3499
A8    0.5280  0.5384  0.1982  0.0128  0.1854
A9    0.9299  0.3649  0.4221  0.0261  0.3961
A10   0.1932  0.4674  0.0718  0.0046  0.0672
A11   0.8424  0.2068  0.3416  0.0224  0.3192
A12   0.6683  0.8833  0.3636  0.0204  0.3432
A13   0.7867  0.0221  0.2941  0.0186  0.2755
A14   0.2966  0.6888  0.1550  0.0067  0.1483
A15   0.9374  0.6147  0.4635  0.0325  0.4310
A16   0.1286  0.2763  0.0271  0.0018  0.0253
A17   0.3281  0.6120  0.1407  0.0075  0.1332
A18   0.2803  0.8440  0.2053  0.0063  0.1991
A19   0.1573  0.8603  0.1897  0.0077  0.1820
A20   0.2033  0.0096  0.0206  0.0001  0.0205
Table 2 Testing set of alternatives and their preferences (P obtained; Pref reference)

Ai    C1      C2      Pref    P       Diff
A1    0.9164  0.1257  0.3939  0.0264  0.3676
A2    0.7812  0.9697  0.4508  0.0346  0.4162
A3    0.5542  0.0441  0.1502  0.0044  0.1457
A4    0.7029  0.2892  0.2531  0.0163  0.2367
A5    0.1921  0.0749  0.0198  0.0007  0.0191
A6    0.0570  0.2241  0.0141  0.0006  0.0135
A7    0.9098  0.5405  0.4307  0.0270  0.4036
A8    0.9140  0.1683  0.3940  0.0261  0.3679
A9    0.0123  0.5792  0.0829  0.0023  0.0805
A10   0.6414  0.6586  0.2843  0.0172  0.2671
A11   0.0991  0.6199  0.0991  0.0045  0.0946
A12   0.6133  0.8375  0.3218  0.0149  0.3069
A13   0.1786  0.3870  0.0526  0.0035  0.0491
A14   0.4931  0.5827  0.1930  0.0105  0.1825
A15   0.8577  0.1667  0.3506  0.0231  0.3274
A16   0.0460  0.9681  0.2264  0.0115  0.2149
A17   0.3062  0.9033  0.2347  0.0057  0.2290
A18   0.7496  0.4780  0.3096  0.0199  0.2897
A19   0.4493  0.5542  0.1677  0.0103  0.1573
A20   0.2162  0.8278  0.1861  0.0069  0.1792
in the value of the fitness function was recorded as early as the fourth iteration, after which the value varied over a small range over successive iterations. A test set was selected from the data obtained using the CSO algorithm application, which was then subjected to the COMET method to obtain object preference values. The selected data of the test set is presented in Table 2, where the values of the criteria C1 and C2 , the preference values, and the difference between them are included. Figure 3 shows the resulting problem surface obtained on the test set and the obtained characteristic objects’ positions. It can be seen that the obtained surfaces describing the problem when considering two criteria are comparable to each other. A satisfactory result was obtained already after four iterations, and subsequent iterations did not bring significant improvements in the obtained preference values, but only settled on a small range of values not influencing considerably the result. Compared to previous studies based on other stochastic optimization algorithms [26–28], the results were obtained much faster, which shows the CSO algorithm’s effectiveness in solving a specific class of problems.
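To make the optimization target concrete, the sketch below shows a simplified form of the fitness function minimized by CSO in this study: the preference of an alternative is obtained from the 3 × 3 grid of characteristic objects by weighting their preferences with products of triangular memberships, and the fitness is the sum of absolute differences with respect to the reference preferences. This is a schematic reading of the COMET inference (the exact formulation is given in [15, 16]); the helper names and the membership construction are assumptions made for illustration only.

```python
import numpy as np

CV = np.array([0.0, 0.5, 1.0])            # characteristic values for C1 and C2

def tri_memberships(c):
    """Triangular memberships of a normalized criterion value to the CVs."""
    m = np.maximum(0.0, 1.0 - np.abs(c - CV) / 0.5)
    return m / m.sum()

def comet_preference(c1, c2, p):
    """Preference of an alternative for a 3x3 grid of characteristic objects.

    p holds the 9 characteristic object preferences, ordered as in rule base (4):
    (C1, C2) = (0,0), (0,0.5), (0,1), (0.5,0), ..., (1,1).
    """
    w = np.outer(tri_memberships(c1), tri_memberships(c2)).ravel()
    return float(w @ p)

def fitness(p, alts, p_ref):
    """Sum of absolute errors minimized by the CSO algorithm."""
    model = np.array([comet_preference(c1, c2, p) for c1, c2 in alts])
    return float(np.sum(np.abs(model - p_ref)))
```

Under these assumptions, the search itself could be run with the CSO sketch from Sect. 3, e.g., as cso(lambda p: fitness(p, train_alts, train_pref), dim=9), where train_alts and train_pref are hypothetical arrays holding the criteria values and reference preferences from Table 1.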
Fig. 2 Diagram of matching the fitness function
Fig. 3 Surface created with COMET method and preferences for characteristic objects
5 Conclusions Stochastic methods find their application in problems where there is the randomness of data. The results obtained may differ between those obtained in successive runs of the algorithm, as this largely depends on the data’s randomness. However, when solving the problem with the chosen method, an important aspect is the repeatability of the results and efficiently finding the desired solution. To find the optimal preference values of characteristic objects, the cat swarm optimization was used. The study was performed on a training set and a test set of 20 alternatives in both cases, each considered in two criteria C1 and C2 . Using the CSO method, it was possible to obtain satisfactory results, with a fast running algorithm. Then, using the COMET method, preference values were calculated, based on which
the initial surface for the defined problem was mapped. Combining these methods allowed the problem to be solved efficiently, obtaining results as good as those of the hill climbing or simulated annealing methods, while finding the optimal solution much faster. For future directions, it is worth considering comparing more stochastic optimization methods to obtain more benchmarkable results. The determined problem space can be extended, and a greater number of criteria can be taken into account. Moreover, the input parameters of the cat swarm optimization could be changed to check whether they affect the quality of the final result. Acknowledgements The work was supported by the project financed within the framework of the program of the Minister of Science and Higher Education under the name "Regional Excellence Initiative" in the years 2019–2022, Project Number 001/RID/2018/19; the amount of financing: PLN 10.684.000,00 (J.W.).
References 1. Ahmed, A.M., Rashid, T.A., Saeed, S.A.M.: Cat swarm optimization algorithm: a survey and performance evaluation. Comput. Intell. Neurosci. 2020 (2020) 2. Chu, S. C., Tsai, P. W., Pan, J. S. Cat swarm optimization. In: Pacific Rim International Conference on Artificial Intelligence (pp. 854–858). Springer, Berlin, Heidelberg (2006) 3. Fouskakis, D., Draper, D.: Stochastic optimization: a review. Int. Stat. Rev. 70(3), 315–349 (2002) 4. Harper, M., Anderson, B., James, P., Bahaj, A.: Assessing socially acceptable locations for onshore wind energy using a GIS-MCDA approach. Int. J. Low-Carbon Technol. 14(2), 160– 169 (2019) 5. Heyman, D. P., Sobel, M. J. Stochastic Models in Operations Research: Stochastic Optimization, Vol. 2. Courier Corporation (2004) 6. Hu, X., Eberhart, R.: Solving constrained nonlinear optimization problems with particle swarm optimization. In: Proceedings of the Sixth World Multiconference on Systemics, Cybernetics and Informatics, Vol. 5, pp. 203–206. Citeseer (2002) 7. Karaboga, D., Basturk, B.: Artificial bee colony (ABC) optimization algorithm for solving constrained optimization problems. In: International fuzzy systems association world congress (pp. 789–798). Springer, Berlin, Heidelberg (2007) 8. Wa˛tróbski, J., Jankowski, J., Ziemba, P.: Multistage performance modelling in digital marketing management. Econ. Sociol. 9(2), 101 (2016) 9. Kizielewicz, B., Kołodziejczyk, J.: Effects of the selection of characteristic values on the accuracy of results in the COMET method. Procedia Comput. Sci. 176, 3581–3590 (2020) 10. Kizielewicz, B., Sałabun, W.: A new approach to identifying a multi-criteria decision model based on stochastic optimization techniques. Symmetry 12(9), 1551 (2020) 11. Kizielewicz, B., Wa˛tróbski, J., Sałabun, W.: Identification of relevant criteria set in the MCDA process-wind farm location case study. Energies 13(24), 6548 (2020) 12. Kizielewicz, B., Dobryakova, L.: MCDA based approach to sports players’ evaluation under incomplete knowledge. Procedia Comput. Sci. 176, 3524–3535 (2020) 13. Mahdavi, M., Fesanghary, M., Damangir, E.: An improved harmony search algorithm for solving optimization problems. Appl. Math. Comput. 188(2), 1567–1579 (2007) 14. Murtza, S.A., Ahmad, A., Shafique, J.: Integer cat swarm optimization algorithm for multiobjective integer problems. Soft. Comput. 24(3), 1927–1955 (2020)
15. Sałabun, W.: The characteristic objects method: a new distance-based approach to multicriteria decision-making problems. J. Multi-Criteria Decis. Anal. 22(1–2), 37–50 (2015) 16. Sałabun, W., Piegat, A.: Comparative analysis of MCDM methods for the assessment of mortality in patients with acute coronary syndrome. Artif. Intell. Rev. 48(4), 557–571 (2017) 17. Sałabun, W., Wa˛tróbski, J., Piegat, A.: Identification of a multi-criteria model of location assessment for renewable energy sources. In: International Conference on Artificial Intelligence and Soft Computing (pp. 321–332). Springer, Cham (2016) 18. Sałabun, W., Ziemba, P., Wa˛tróbski, J. The rank reversals paradox in management decisions: The comparison of the ahp and comet methods. In: International Conference on Intelligent Decision Technologies, pp. 181–191. Springer, Cham (2016) 19. Schneider, J., & Kirkpatrick, S.: Stochastic Optimization. Springer Science & Business Media (2007) 20. Sharafi, Y., Khanesar, M. A., Teshnehlab, M.: Discrete binary cat swarm optimization algorithm. In: 2013 3rd IEEE International Conference on Computer, Control and Communication (IC4), pp. 1–6. IEEE (2013) 21. Shekhovtsov, A., Kołodziejczyk, J., Sałabun, W.: Fuzzy model identification using monolithic and structured approaches in decision problems with partially incomplete data. Symmetry 12(9), 1541 (2020) 22. Qin, X.S., Huang, G.H., Sun, W., Chakma, A.: Optimization of remediation operations at petroleum-contaminated sites through a simulation-based stochastic-MCDA approach. Energy Sourc. Part A 30(14–15), 1300–1326 (2008) 23. Wa˛tróbski, J., Jankowski, J., Ziemba, P., Karczmarczyk, A., Zioło, M.: Generalised framework for multi-criteria method selection. Omega 86, 107–124 (2019) 24. Wa˛tróbski, J., ałabun, W. Green supplier selection framework based on multi-criteria decisionanalysis approach. In: International Conference on Sustainable Design and Manufacturing, pp. 361–371. Springer, Cham (2016) 25. Wa˛tróbski, J., Sałabun, W., Karczmarczyk, A., Wolski, W.: Sustainable decision-making using the COMET method: An empirical study of the ammonium nitrate transport management. In: 2017 Federated Conference on Computer Science and Information Systems (FedCSIS), pp. 949–958. IEEE (2017) 26. Wie˛ckowski, J., Kizielewicz, B., Kołodziejczyk, J.: The search of the optimal preference values of the characteristic objects by using particle swarm optimization in the uncertain environment. In: International Conference on Intelligent Decision Technologies, pp. 353–363. Springer, Singapore (2020) 27. Wie˛ckowski, J., Kizielewicz, B., Kołodziejczyk, J.: Application of hill climbing algorithm in determining the characteristic objects preferences based on the reference set of alternatives. In: International Conference on Intelligent Decision Technologies, pp. 341–351. Springer, Singapore (2020) 28. Wie˛ckowski, J., Kizielewicz, B., Kołodziejczyk, J. Finding an Approximate Global Optimum of Characteristic Objects Preferences by Using Simulated Annealing. In International Conference on Intelligent Decision Technologies, pp. 365–375. Springer, Singapore (2020)
A Study of Different Distance Metrics in the TOPSIS Method Bartłomiej Kizielewicz, Jakub Wie˛ckowski, and Jarosław Wa˛trobski
Abstract To improve the decision-making process, more and more systems are being developed based on a group of multi-criteria decision analysis (MCDA) methods. Each method is based on different approaches leading to a final result. It is possible to modify the default performance of these methods, but in this case, it is worth checking whether it affects the achieved results. In this paper, the technique for order preference by similarity to an ideal solution (TOPSIS) method was used to examine the chosen distance metric’s influence to obtained results. The Euclidean and Manhattan distances were compared, while obtained rankings were compared with the similarity coefficients to check their correlation. It shows that used distance metric has an impact on the results and they are significantly different. Keywords TOPSIS · MCDA · Distance metrics
1 Introduction Decision making arises in many problems. The reliable decision is usually influenced by numerous criteria that determine the attractiveness of an option [5, 16]. During the decision-making process, different approaches can be used to obtain a solution [14, 21]. However, these methods use other computational techniques to indicate the optimal choice [6, 26]. The various approaches to obtaining preference values for the considered set of alternatives make it difficult to determine which method to choose for solving a given B. Kizielewicz · J. Wie˛ckowski Research Team on Intelligent Decision Support Systems, Department of Artificial Intelligence and Applied Mathematics, Faculty of Computer Science and Information Technology, West ˙ Pomeranian University of Technology in Szczecin, ul. Zołnierska 49, Szczecin 71-210, Poland J. Wa˛trobski (B) ˙ Jana Pawła II 22A, Szczecin Institute of Management, University of Szczecin, Aleja PapieZa 71-210, Poland e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 I. Czarnowski et al. (eds.), Intelligent Decision Technologies, Smart Innovation, Systems and Technologies 238, https://doi.org/10.1007/978-981-16-2765-1_23
problem [9, 17]. Besides, in the case of selected methods for solving multi-criteria problems, it is possible to replace the elements used by the authors with others, which will also lead to a correct solution [4, 25]. However, in such a case, it should be determined whether the introduced modification affects the obtained results [20]. Multi-criteria decision analysis (MCDA) methods belong to a group of methods for solving complex decision-making problems [19]. Their primary assumption is the evaluation of the collected set of alternatives in terms of a specific group of criteria [22, 30]. By performing calculations on the decision matrix, the methods provide preference values of the alternatives, which are used to obtain a positional ranking for the analyzed set. The effectiveness of these methods has been examined many times, and they have been used to solve supplier evaluation problems [23, 29], the selection of industrial locations [8, 10], and problems in sports [28] or nutrition [27]. One of the MCDA methods is the technique for order preference by similarity to an ideal solution (TOPSIS) [1]. It uses a distance-based approach to calculate preference values. The default distance used in solving the problem is the Euclidean distance. However, other metrics, such as the Manhattan distance, can also be used during the calculation of preference values. In this paper, the TOPSIS method was used to calculate preference values for a considered set of ten alternatives. Two approaches were used to calculate the preference values: one of them used the Euclidean metric in the final stage of the method, while the other used the Manhattan metric. Examinations were carried out for different numbers of criteria to investigate the divergence of the results obtained for decision matrices of different sizes and the varying influence of the criteria on the alternatives' attractiveness. Moreover, the obtained rankings were compared using the WS similarity coefficient and the weighted Spearman correlation coefficient to examine their similarity [18]. The rest of the paper is organized as follows. In Sect. 2, the introduction to the TOPSIS method is presented. Section 3 contains the distance metrics' main assumptions, where the Euclidean and Manhattan distances are shown. Section 4 includes the study case, where the comparison of the results obtained using the two different distances is made. Finally, in Sect. 5, the summary of the results and conclusions from the research are presented.
2 TOPSIS Method Chen and Hwang developed the technique for order preference by similarity to an ideal solution (TOPSIS) method in 1992 [13, 15]. Their main concept was to solve the multi-criteria problem using the distance to the ideal solution to receive the preferences for the set of alternatives [1]. Subsequent steps require calculating the positive ideal solution (PIS) and negative ideal solution (NIS), which are used to determine the final preferences [24].
Step 1 To achieve the method’s proper performance, the weights vector for defined criteria should be determined, and the decision matrix should be normalized at the beginning (1). f ri j = Ji j 2 j=1 f i j (1) j = 1, . . . , J ; i = 1, . . . , n Step 2 Next step is to calculate the weighted normalized decision matrix with the following Formula (2): vi j = wi · ri j ,
j = 1, . . . , J ; i = 1, . . . , n
(2)
Step 3 Positive and negative ideal solutions for a defined decision-making problem should also be identified (3): A∗ = v1∗ , . . . , vn∗ = max j vi j |i ∈ I P , min j vi j |i ∈ I C A− = v1− , . . . , vn− = min j vi j |i ∈ I P , max j vi j |i ∈ I C
(3)
where I C stands for cost type criteria and I P for profit type. Step 4 Negative and positive distance from an ideal solution should be calculated using the n-dimensional Euclidean distance. To apply such calculations, formula presented below should be used (4):
2 n vi j − vi∗ , j = 1, . . . , J i=1 n − 2 , j = 1, . . . , J D −j = i=1 vi j − vi D ∗j =
(4)
Step 5 The last step is to calculate the relative closeness to the ideal solution (5): C ∗j =
D −j D ∗j +D −j
,
j = 1, . . . , J
(5)
3 Distance Metrics Distance methods are used to calculate the relative distance between selected points. This distance is described by the length of a path connecting these points [2]. Different methods can be used to determine this value, based on different calculations concerning each other. Two metrics, Euclidean distance and Manhattan distance, are presented below.
3.1 Euclidean Distance
The Euclidean distance calculates the distance between two points as the square root of the sum of the squared differences of their coordinates [3, 7]. The formula describing this relationship is shown below (6). It is used by default in the final step of the TOPSIS method.

$$d(a, b) = \sqrt{\sum_{i=1}^{n} \left(a_{i} - b_{i}\right)^{2}} \qquad (6)$$
3.2 Manhattan Distance
The Manhattan distance calculates the distance between two points as the sum of the absolute differences of their coordinates [11, 12]. This metric is calculated using the following Formula (7). It is an alternative to the Euclidean distance and can also be used in the final step of the TOPSIS method.

$$d(a, b) = \sum_{i=1}^{n} \left|a_{i} - b_{i}\right| \qquad (7)$$
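Assuming the TOPSIS sketch given after Sect. 2, both metrics can be plugged into the final step as follows; the equal weights, the profit-type criteria and the random decision matrix are illustrative assumptions mirroring the setup of the study case.

```python
import numpy as np

euclidean = lambda a, b: np.sqrt(np.sum((a - b) ** 2))   # Eq. (6)
manhattan = lambda a, b: np.sum(np.abs(a - b))           # Eq. (7)

rng = np.random.default_rng(1)
m = rng.random((10, 2))                     # random decision matrix, values in [0, 1]
w = np.array([0.5, 0.5])                    # assumed equal weights
t = np.array([1, 1])                        # both criteria of profit type
rank = lambda p: p.argsort().argsort() + 1  # positional ranking (1 = lowest preference)
print(rank(topsis(m, w, t, euclidean)))
print(rank(topsis(m, w, t, manhattan)))     # the two rankings may differ
```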
4 Study Case In this paper, the TOPSIS method was used to calculate the preferences for ten alternatives in three different cases: 2, 4, and 8 criteria were taken into consideration during the evaluation of the determined set. To create the decision matrix, the values were generated randomly in the range [0, 1]. Moreover, the profit type was selected for every criterion. In each case, two methods were used to calculate the distance between the alternatives and the ideal solution. The Euclidean distance is used by default in the TOPSIS method, and it was decided to use an additional method, namely the Manhattan distance, to examine whether this change affects the obtained results. Moreover, the obtained rankings were then compared with the WS similarity coefficient and the weighted Spearman correlation coefficient to check the similarity of the received results. The decision matrix for ten alternatives and two criteria is presented in Table 1. Based on the presented matrix, the calculations were applied for both distance methods, and the obtained results are presented in Table 2. For each alternative, the preference value and the position in the overall ranking are shown for the Manhattan and Euclidean distances, respectively. The ranking visualization is presented in Fig. 1. Comparison of the obtained results with similarity coefficients indicates that they are similar to some extent. However, it can be seen that switching to the Manhattan metric
Table 1 Set of ten alternatives and two criteria for the study of distance metrics in the TOPSIS method

Ai    C1      C2
A1    0.5663  0.0926
A2    0.2402  0.4782
A3    0.2190  0.5621
A4    0.0511  0.6050
A5    0.0594  0.0990
A6    0.6753  0.9571
A7    0.3277  0.3809
A8    0.0594  0.5891
A9    0.8553  0.9325
A10   0.5507  0.0939
Table 2 Rankings and preferences for two criteria at different distance metrics in the TOPSIS method

Ai    PMAN    RANKMAN   PEUC    RANKEUC
A1    0.3203  5         0.3761  7
A2    0.3405  7         0.3480  4
A3    0.3759  8         0.3890  8
A4    0.2963  3         0.3543  5
A5    0.0088  1         0.0089  1
A6    0.8880  9         0.8497  9
A7    0.3387  6         0.3387  2
A8    0.2923  2         0.3477  3
A9    0.9857  10        0.9799  10
A10   0.3114  4         0.3678  6

Fig. 1 Visualization of the rankings of the alternatives for two criteria with different distance metrics using the TOPSIS method
Table 3 Set of ten alternatives and four criteria for the study of distance metrics in the TOPSIS method

Ai    C1      C2      C3      C4
A1    0.7473  0.5599  0.6409  0.5304
A2    0.1234  0.2073  0.0521  0.7842
A3    0.2115  0.3017  0.3227  0.3942
A4    0.1483  0.7255  0.0275  0.0662
A5    0.6735  0.1346  0.3975  0.0605
A6    0.0050  0.4317  0.2508  0.9377
A7    0.9758  0.5349  0.6283  0.4664
A8    0.8253  0.4703  0.8973  0.0312
A9    0.4345  0.1323  0.5703  0.1242
A10   0.5418  0.4522  0.7490  0.3510
Table 4 Rankings and preferences for four criteria at different distance metrics in the TOPSIS method

Ai    PMAN    RANKMAN   PEUC    RANKEUC
A1    0.6853  9         0.6799  9
A2    0.2768  1         0.3491  4
A3    0.3095  5         0.3136  1
A4    0.2965  4         0.3832  5
A5    0.2875  2         0.3454  3
A6    0.4403  6         0.4615  6
A7    0.7123  10        0.6825  10
A8    0.6037  8         0.5649  8
A9    0.2922  3         0.3393  2
A10   0.5686  7         0.5614  7
provides rankings that differ from those obtained when the Euclidean distance was used. For the WS similarity coefficient, the obtained correlation equals 0.8045, while for the weighted Spearman correlation coefficient it equals 0.7333. Table 3 presents the decision matrix for the considered problem with ten alternatives and four criteria. Table 4 includes the preferences obtained for the set of alternatives with the Manhattan distance, the positional ranking for the Manhattan distance, the preferences obtained with the Euclidean distance, and the positional ranking for the Euclidean distance. Similarity coefficients were used to examine the correlation in this case as well, and the results show that the WS similarity coefficient (0.6912) indicates less similar results than the weighted Spearman correlation coefficient (0.7488). The visualization of the received rankings is shown in Fig. 2. The decision matrix for the case of ten alternatives and eight criteria is presented in Table 5. The obtained results are presented in Table 6, where
Fig. 2 Visualization of the rankings of the alternatives for four criteria with different distance metrics using the TOPSIS method
Table 5 Set of ten alternatives and eight criteria for the study of distance metrics in the TOPSIS method

Ai    C1      C2      C3      C4      C5      C6      C7      C8
A1    0.7774  0.2838  0.1776  0.5252  0.1674  0.2747  0.0258  0.6678
A2    0.7947  0.4297  0.5748  0.3415  0.0579  0.1380  0.4390  0.1465
A3    0.6028  0.7945  0.0383  0.3126  0.5417  0.5882  0.0455  0.8541
A4    0.0730  0.5762  0.0379  0.8694  0.1389  0.2333  0.7555  0.1600
A5    0.2478  0.9830  0.9237  0.0293  0.8776  0.0220  0.0520  0.1845
A6    0.3040  0.0804  0.3517  0.0931  0.1321  0.7537  0.7350  0.3088
A7    0.8740  0.0447  0.3524  0.2109  0.1360  0.5388  0.2257  0.8701
A8    0.8077  0.9979  0.9534  0.2889  0.5945  0.4951  0.3876  0.5346
A9    0.5492  0.4825  0.1430  0.3525  0.4103  0.2726  0.1204  0.6129
A10   0.3945  0.8256  0.7249  0.0589  0.7521  0.0658  0.5076  0.5819
Table 6 Rankings and preferences for eight criteria at different distance metrics in the TOPSIS method

Ai    PMAN    RANKMAN   PEUC    RANKEUC
A1    0.3840  4         0.4145  3
A2    0.3734  2         0.4070  2
A3    0.5193  8         0.5132  8
A4    0.3704  1         0.4229  5
A5    0.4072  6         0.4493  6
A6    0.3789  3         0.4219  4
A7    0.4544  7         0.4706  7
A8    0.6949  10        0.6550  10
A9    0.3874  5         0.4009  1
A10   0.5219  9         0.5160  9
Fig. 3 Visualization of the rankings of the alternatives for eight criteria with different distance metrics using the TOPSIS method
the preferences and positional rankings are included for the Manhattan and Euclidean metrics, respectively. The visualization of the results is shown in Fig. 3. Using the correlation coefficients for this case, a higher similarity of the compared rankings was obtained with the WS similarity coefficient than with the weighted Spearman correlation coefficient, with values of 0.7245 and 0.7014, respectively.
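For completeness, the two rank similarity measures used above can be computed as in the sketch below, which follows the commonly used formulations of the WS coefficient and the weighted Spearman coefficient r_w associated with [18]; the exact normalization should be treated as an assumption here. Applied to the two rankings of Table 6, with the Manhattan ranking taken as the reference, it yields values consistent with those reported in the text.

```python
import numpy as np

def ws_coefficient(x, y):
    """WS rank similarity coefficient; x is the reference ranking."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    return 1.0 - np.sum(2.0 ** (-x) * np.abs(x - y) /
                        np.maximum(np.abs(x - 1), np.abs(x - n)))

def weighted_spearman(x, y):
    """Weighted Spearman rank correlation coefficient r_w."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    num = 6.0 * np.sum((x - y) ** 2 * ((n - x + 1) + (n - y + 1)))
    return 1.0 - num / (n ** 4 + n ** 3 - n ** 2 - n)

rank_man = np.array([4, 2, 8, 1, 6, 3, 7, 10, 5, 9])   # Table 6, Manhattan ranking
rank_euc = np.array([3, 2, 8, 5, 6, 4, 7, 10, 1, 9])   # Table 6, Euclidean ranking
print(ws_coefficient(rank_man, rank_euc))               # about 0.7245
print(weighted_spearman(rank_man, rank_euc))            # about 0.7014
```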
5 Conclusions The existence of many methods from the group of MCDA methods causes new approaches to be created in solving multi-criteria problems. Each of them is based on different transformations leading to the final result. However, there are possibilities to modify the initial assumptions in the operation of some methods, which also allow obtaining the correct result, but it may differ compared to basic assumptions. To determine whether modifications to the method’s basic assumptions would produce different results, it was decided to use the TOPSIS method, which uses Euclidean distance to calculate preference values. To check the influence of the method used to calculate the distance, Manhattan distance was chosen, and the research was conducted for three sizes of the decision matrix, considering ten alternatives and two, four, and eight criteria, respectively. The results obtained were compared using selected similarity coefficients. It showed that the rankings in the analyzed cases differ significantly, making it difficult to determine which metric should be used in the operation of the TOPSIS method. For future directions, it is worth considering determining which metric should be applied when using the TOPSIS method. Despite using two selected distance metrics and receiving the comparative results, it is difficult to determine which metric is more
suitable for this application. It is also worth noting the possible use of other distance metrics to calculate the preference values. Acknowledgements The work was supported by the project financed within the framework of the program of the Minister of Science and Higher Education under the name "Regional Excellence Initiative" in the years 2019–2022, Project Number 001/RID/2018/19; the amount of financing: PLN 10.684.000,00 (J.W.).
References 1. Behzadian, M., Otaghsara, S.K., Yazdani, M., Ignatius, J.: A state-of the-art survey of TOPSIS applications. Expert Syst. Appl. 39(17), 13051–13069 (2012) 2. Chiu, W.Y., Yen, G.G., Juan, T.K.: Minimum Manhattan distance approach to multiple criteria decision making in multiobjective optimization problems. IEEE Trans. Evol. Comput. 20(6), 972–985 (2016) 3. Danielsson, P.E.: Euclidean distance mapping. Comput. Graph. Image Process. 14(3), 227–248 (1980) 4. Das, B., Pal, S.C.: Assessment of groundwater vulnerability to over-exploitation using MCDA, AHP, fuzzy logic and novel ensemble models: a case study of Goghat-I and II blocks of West Bengal, India. Environ. Earth Sci. 79(5), 1–16 5. De Montis, A., De Toro, P., Droste-Franke, B., Omann, I., Stagl, S.: Criteria for quality assessment of MCDA methods. In: 3rd Biennial Conference of the European Society for Ecological Economics, Vienna, pp. 3–6 (2000) 6. Dehe, B., Bamford, D.: Development, test and comparison of two multiple criteria decision analysis (MCDA) models: a case of healthcare infrastructure location. Expert Syst. Appl. 42(19), 6717–6727 (2015) 7. Fabbri, R., Costa, L.D.F., Torelli, J.C., Bruno, O.M.: 2D Euclidean distance transform algorithms: a comparative survey. ACM Comput. Surv. (CSUR) 40(1), 1–44 (2008) 8. Gbanie, S.P., Tengbe, P.B., Momoh, J.S., Medo, J., Kabba.: Modelling landfill location using geographic information systems (GIS) and multi-criteria decision analysis (MCDA): case study Bo, Southern Sierra Leone. Appl. Geogr. 36, 3–12. V. T. S (2013) 9. Guitouni, A., Martel, J.M.: Tentative guidelines to help choosing an appropriate MCDA method. Eur. J. Oper. Res. 109(2), 501–521 (1998) 10. Harper, M., Anderson, B., James, P., Bahaj, A.: Assessing socially acceptable locations for onshore wind energy using a GIS-MCDA approach. Int. J. Low-Carbon Technol. 14(2), 160– 169 (2019) 11. Hyde, K.M., Maier, H.R.: Distance-based and stochastic uncertainty analysis for multi-criteria decision analysis in excel using visual basic for applications. Environ. Modell. Softw. 21(12), 1695–1710 (2006) 12. Lavoie, T., Merlo, E.: An accurate estimation of the Levenshtein distance using metric trees and Manhattan distance. In: 2012 6th International Workshop on Software Clones (IWSC), pp. 1–7. IEEE (2012) 13. Mairiza, D., Zowghi, D., Gervasi, V.: Utilizing TOPSIS: a multi criteria decision analysis technique for non-functional requirements conflicts. In: Requirements Engineering, pp. 31–44. Springer, Berlin, Heidelberg (2014) 14. Nutt, D.J., Phillips, L.D., Balfour, D., Curran, H.V., Dockrell, M., Foulds, J., Sweanor, D.: Estimating the harms of nicotine-containing products using the MCDA approach. Eur. Add. Res. 20(5), 218–225 (2014) 15. Opricovic, S., Tzeng, G.H.: Compromise solution by MCDM methods: a comparative analysis of VIKOR and TOPSIS. Eur. J. Oper. Res. 156(2), 445–455 (2004)
16. Podinovski, V.V.: The quantitative importance of criteria for MCDA. J. Multi-Criteria Decis. Anal. 11(1), 1–15 (2002) 17. Podvezko, V.: The comparative analysis of MCDA methods SAW and COPRAS. Eng. Econ. 22(2), 134–146 (2011) 18. Sałabun, W., Urbaniak, K.: A new coefficient of rankings similarity in decision-making problems. In: International Conference on Computational Science, pp. 632–645. Springer, Cham (2020) 19. Sałabun, W., Wa˛trobski, J., Shekhovtsov, A.: Are MCDA methods benchmarkable? A comparative study of TOPSIS, VIKOR, COPRAS, and PROMETHEE II methods. Symmetry 12(9), 1549 (2020) 20. Shekhovtsov, A., Kołodziejczyk, J.: Do distance-based multi-criteria decision analysis methods create similar rankings? Procedia Comput. Sci. 176, 3718–3729 (2020) 21. Shekhovtsov, A., Kołodziejczyk, J., Sałabun, W.: Fuzzy model identification using monolithic and structured approaches in decision problems with partially incomplete data. Symmetry 12(9), 1541 (2020) 22. Shekhovtsov, A., Sałabun, W.: A comparative case study of the VIKOR and TOPSIS rankings similarity. Procedia Comput. Sci. 176, 3730–3740 (2020) 23. Shekhovtsov, A., Kozlov, V., Nosov, V., Sałabun, W.: Efficiency of methods for determining the relevance of criteria in sustainable transport problems: a comparative case study. Sustainability 12(19), 7915 (2020) 24. Shih, H.S., Shyur, H.J., Lee, E.S.: An extension of TOPSIS for group decision making. Math. Comput. Model. 45(7–8), 801–813 (2007) 25. Stewart, T.J.: Dealing with uncertainties in MCDA. In: Multiple Criteria Decision Analysis: state of the Art Surveys, pp. 445–466. Springer, New York (2005) 26. Thokala, P., Duenas, A.: Multiple criteria decision analysis for health technology assessment. Value Health 15(8), 1172–1181 (2012) 27. Toledo, R.Y., Alzahrani, A.A., Martínez, L.: A food recommender system considering nutritional information and user preferences. IEEE Access 7, 96695–96711 (2019) 28. Urbaniak, K., Wa˛trobski, J., Salabun„ W.: Identification of players ranking in e-sport. Appl. Sci. 10(19), 6768 (2020) 29. Wa˛trobski, J., Sałabun, W.: Green supplier selection framework based on multi-criteria decision-analysis approach. In: International Conference on Sustainable Design and Manufacturing, pp. 361–371. Springer, Cham (2016) 30. Wa˛trobski, J., Jankowski, J.: Knowledge management in MCDA domain. In: 2015 Federated Conference on Computer Science and Information Systems (FedCSIS) (pp. 1445–1450). IEEE (2015)
Assessment and Improvement of Intelligent Technology in Architectural Design Satisfactory Development Advantages Management Vivien Yi-Chun Chen, Jerry Chao-Lee Lin, Zheng Wu, Hui-Pain Lien, Pei-Feng Yang, and Gwo-Hshiung Tzeng Abstract In the Internet era, the competitiveness of global building design smart technologies continues to increase, inevitably forcing smart building design to reexamine the application of smart systems, the development and evaluation of indicators, including the evaluation of biometric systems, radio frequency technology systems and digital cryptography. The system is used to improve the overall satisfaction and competitiveness of intelligent technology in architectural design. However, the successful design of intelligent architectural depends on the development of corresponding systems and the management of advantages. Thus, we via literature review to frame construction developing sustainability strategy indicators for this research that were divided into three major dimensioned which were further subdivided into twelve sub criteria; the propose to use a new sustainable development strategy indicator evaluation and intelligent technology architectural design improvement MADM-DANP model. Unlike previous multiple attributes, decision-making (MADM) methods that assume the criteria are independent, we propose a hybrid model, combining a decision-making trial and evaluation laboratory (DEMATEL) and analytical network process (ANP) method called DANP, which addresses the dependent relationships between the various criteria to better reflect the real-world situation. In the final results, we can also address a gap in the development of sustainable development plans for the environment, taking into account comfort, convenience and safety in order to raise the standards in achieving human welfare expectations.
V. Y.-C. Chen (B) · Z. Wu · P.-F. Yang Department of Architecture, Fujian University of Technology, Fuzhou, China J. C.-L. Lin Department of Information Engineering, Feng Chia University, Taichung, Taiwan V. Y.-C. Chen · H.-P. Lien Department of Water Resources Engineering and Conservation, Feng Chia University, Taichung, Taiwan G.-H. Tzeng Graduate Institute of Urban Planning, National Taipei University, New Taipei City, Taiwan © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 I. Czarnowski et al. (eds.), Intelligent Decision Technologies, Smart Innovation, Systems and Technologies 238, https://doi.org/10.1007/978-981-16-2765-1_24
285
286
V. Y.-C. Chen et al.
Keywords Intelligent architectural design sustainability technology indicators · MADM-DANP model · DEMATE based on ANP “DANP” · Influential network relationship map (INRM) · Biometric recognition and wireless communication system · Cryptography system; aspiration level · Performance
1 Introduction In the past research, many different buildings have been labeled as smart. However, the application of intelligence in buildings has not yet played its true potential and the effect of intelligent system integration. The research direction of smart architecture that everyone is discussing is the application of smart construction (building), and there is no relevant research on the systematic integration of information technology indicators in the smart architecture design. This study is a cross—information technology, architectural design and management research, and this research is a crossinformation technology, smart buildings and management field research, as shown in Fig. 1, in order to formulate and improve the design indicators of the overall structure in the intelligent system success, corresponding to the sustainable development of intelligent technology and excellent management. So, we must effectively integrate and apply intelligent systems and give full play to them, in order to cope with the rapid development of technological innovations 5G networks around the world and the rise of artificial-led intelligent analysis (AI), to successfully integrate autonomous control and IoT together. It is inevitable to force the intelligent technology in architectural design to re-examine the design and application of the intelligent system, in order to improve architectural design in the three intelligent systems: (1) biometric recognition system, (2) wireless communication system and (3) cryptography system in terms of sustainable development strategy index development and follow-up evaluation to improve the competitiveness of intelligent technology and the corresponding system development technology and Fig. 1 Smart architecture design concept of evaluation framework
Assessment and Improvement of Intelligent Technology …
287
Table 1 Dimensions and criteria for intelligent technology indicators systems
Making improvement strategies for continuously creating the sustainability
Dimensions (three major systems)
Criteria (twelve sub-standards)
Biometric recognition system ( D1 )
1. Fingerprint recognition (C 1 ) 2. Face recognition (C 2 ) 3. Speech recognition (C 3 ) 4. Iris recognition (C 4 ) 5. Finger vein recognition (C 5 )
Wireless communication system ( D2 )
6. Bluetooth (C 6 ) 7. Wireless fidelity (Wi-Fi) (C 7 ) 8. Radio frequency identification (RFID) (C 8 ) 9. ZigBee (C 9 ) 10. Infrared (C 10 )
Cryptography system ( D3 )
11. Six Bit (C 11 ) 12. Eight bit (C 12 )
advantage management, to improve the overall satisfaction, safety and comfort of the quality of life is the purpose of this study, as shown in Table 1. Therefore, through literature review, we construct the system index of sustainable development strategy in this study, which is divided into three major systems: biometric recognition system, wireless communication system and cryptography system. These systems are subdivided into twelve sub-standards, as shown in Tables 1, 2 and 3. We use a hybrid multi-attribute decision-making (MADM) approach to solve the dependencies among the design attributes of intelligent architecture, and group of experts’ measure and evaluate three dimensions and twelve criteria to solve the problems of intelligent architecture design system. Unlike previous multi-attribute decision-making (MADM) models, which assumed that the criteria were independent, the intelligent architecture design combines the decisionmaking trial and evaluation laboratory (DEMATEL) and the analytic network process (ANP) [1] approach called DANP.
2 Literature Review What is a smart architecture design? It is a subject that combines intelligent information system, smart building and management, as shown in Fig. 1. In the past research, the smart building is mostly a smart construction system. It includes building and automation, energy management and information flow platform control, in order to improve the efficiency of automation control and reduce the construction cost and daily expenses, so that managers relaxed, users feel comfortable, happy and security
288
V. Y.-C. Chen et al.
[2–5]. Although many new buildings were called “smart” buildings in the past, their building intelligence levels vary significantly with the functions and operational efficiency of the installed smart components [6, 7]. Occasionally, people criticize that smart buildings cannot flexibly respond to the needs of end users. Failure to meet the expectations of customers or end users may exacerbate the disconnect between the expectations and realization of smart buildings [8]. With the emergence of countless smart building components or products on the market, the decision to choose between them has become crucial in the configuration of alternative building solutions. This leads decision makers into the “dilemma” of choice [9]. Therefore, despite the growing popularity of implementing smart technologies in new buildings, the real challenge is to design and configure optimal intelligent building systems so that they can intelligently respond to the changing needs of end users and achieve developer goals [8]. So intelligent building design is very important, which is also the final goal of this research. The most important consideration of smart architecture design is the technology of intelligent information system. Overall, the practical application and literature survey in intelligent information system can be divided into three commonly used systems and 12 system indicators: (1) fingerprint recognition, (2) face recognition, (3) speech recognition, (4) iris recognition, (5) finger vein identification, (6) Bluetooth, (7) wireless fidelity (Wi-Fi), (8) radio frequency identification (RFID), (9) ZigBee, (10) infrared, (11) six bit and (12) eight bit.
3 Constructing the New Hybrid Modified MADM Model for Smart Architecture Design Satisfactory Development The evaluation framework of this study constructed a questionnaire for qualitative and quantitative measurement of intelligent information technology systems to establish criteria/attribute planning indicators. It is evaluated by a hybrid multi-attribute decision-making (MADM) method. In the second stage, based on the basic concept of ANP [1], the influence relationship matrix of DEMATEL technology can also be used to determine the influence weight of DANP (DEMATEL based on ANP) through unweighted super-weighted matrix and weighted super matrix in matrix transformation. The analysis can be performed efficiently and satisfactorily. The performance gaps of criteria can be integrated from each criterion into dimensions and overall performance gap to make the best improvement strategies for creating the continuous sustainability development strategies toward reaching the aspiration level based on INRM by systematics in the best smart architecture design.
Assessment and Improvement of Intelligent Technology …
289
3.1 DEMATEL Technique for Building a Network Relation Map (INRM) The DEMATEL technique is used to investigate complicated problem groups. DEMATEL has been successfully applied to many situations, including the development of marketing strategies and the evaluation of e-learning methods, control systems and safety problems [10, 11]. The method can be summarized as follows: Step 1: Calculate the direct-influence matrix by scores. Based on experts’ opinions, evaluations are made of the relationships among elements (or variables/attributes) of mutual influence. Using a scale ranking from 0 to 4, with scores representing “no influence (0),” “low influence (1),” “medium influence (2),” “high influence (3)” and “very high influence (4),” respectively. Step 2: Normalizing the direct-influence matrix. Based on the direct-influence of matrix A, the normalized direct-relation matrix D is acquired by using formulas (1). D = kA
where k = min
⎧ ⎨ ⎩
1/ max i
n j=1
aij , 1/ max j
(1) n i=1
⎫ ⎬ aij , i, j ∈ {1, 2, . . . , n} ⎭
(2)
Step 3: Attaining the total-influence matrix T. Once the normalized directinfluence matrix D is obtained, the total-influence matrix T of NRM can be obtained through formula (3), in which I denotes the identity matrix. T = D + D2 + D3 + . . . + Dk = D(I + D + D2 + . . . + Dk−1 ) = D(I − Dk )(I − D)−1 Then, T = D(I − D)−1 , when k → ∞, Dk = [0]n×n
(3)
n n where D = [dij ]n×n , 0 ≤ dij < 1, 0 < j=1 dij , i=1 dij ≤ 1. If at least one row or column of summation, but not all, is equal to 1, then limk→∞ Dk = [0]n×n [0]n×n [0]n×n . Step 4: Analyzing the results: In the stage, the sum rows respectively and the sum columns are expressed as vector r and vector c by using formulas (4), (5), and (6). Then, the horizontal axis vector (r + c) is made by adding r to c, which exhibits the importance of the criterion. Similarly, the vertical axis (r–c) is made by deducting r from c, which may separate the criteria into a cause group and an affected group. In general, when (r–c) is positive, the criterion will be part of the cause group. On the
290
V. Y.-C. Chen et al.
contrary, if the (r–c) is negative, the criterion is part of the affected group. Therefore, the causal graph can be achieved by mapping the dataset of the (r + c, r–c), providing a valuable approaching for making decisions. T = [tij ]n×n , i, j = 1, 2, . . . , n ⎤ ⎡ n tij ⎦ r=⎣ j=1
c=
n i=1
(4)
= [ti· ]n×1
(5)
= [t· j ]n×1
(6)
n×1
t tij
1×n
where vector r and vector c express the sum of rows and the sum of columns from total-influence matrix T = [tij ]n×n of respectively, and superscript denotes transpose.
3.2 Finding the Influential Weights Using DANP The ANP can be used as an MADM method to systematically select the most suitable locations for each facility. The ANP is a relatively simple and systematic approach that can be used by decision makers. MADM techniques have also been widely used to select facility sites. The ANP is an extension of AHP by [1] that helps to overcome the problem of interdependence and feedback among criteria and alternatives. The first step of the ANP is to use the criteria of the entire system to form a supermatrix through pair-wise comparisons. The general form of the supermatrix is seen in Eq. (7). For example, if the system is structured as an unweighted supermatrix T, then the local priorities derived from the pair-wise comparisons throughout the network are contained as in Eq. (8).
Assessment and Improvement of Intelligent Technology …
291
C1 C2 · · · Cm e11 · · · e1n 1 e21 · · · e2n 2 · · · em1 · · · emnm e12 ⎡ .. W 11 · · · W 12 · · · W 1m . ⎢ e1n 1 ⎢ ⎢ ⎢ C1 e 21 ⎢ e22 ⎢ ⎢ ⎢ W 21 · · · W 22 · · · W 2m .. ⎢ . ⎢ W = C2 ⎢ e2n 2 ⎢ ⎢ .. .. ⎢ .. .. ... ⎢. . . . ⎢ ⎢ .. .. .. e m1 Cm ⎢. . . ⎢ em2 ⎣ .. . ··· W ··· W W m1
m2
⎤ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ ⎦
(7)
mm
emnm C1 C2 C3 ⎡ ⎤ C1 0 0 W 13 W= C2 ⎣ W 21 0 0 ⎦ C3 0 W 32 W 33
(8)
where W 21 is a matrix that represents the weights of cluster 2 in respect to cluster 1, matrix W 32 represents the weights of cluster 3 with respect to cluster 2 and matrix W 13 represents the weights of cluster 1 with respect to cluster 3. In addition, matrix W 33 denotes the inner dependence and feedback within cluster 3. The weighted supermatrix is derived by setting the “all columns sum” to unity by normalization. This step is similar to the use the Markov chain to ensure that the sum of the probabilities of all states equals 1. The weighted supermatrix can then be raised to limiting powers, as in Eq. (9), to calculate the overall priorities of weights. lim W k
k→∞
(9)
3.3 Modified MADM Method for Performance Gap Assessment The for reducing the performance gap gkj = method was developed modified MADM aspied aspied woest − xkj / x j − x j |k = 1, 2, . . . , K ; j = 1, 2, . . . , n} of x j each criterion j with alternative k in complex interrelationship systems. It introduces
292
V. Y.-C. Chen et al.
the multiple attribute ranking index based on the particular measure performance of i w Dj i gkjDi , and gap, how to improve the each criterion of gkj , the dimension gkDi = mj=1 m m i overall gk = i=1 j=1 w Dj i gkjDi = nj=1 w j gkj based on INRM for “closeness” to the zero (i.e., toward reaching the aspiration level).
4 Results The new era of Internet technology (5G) is to meet human needs for enjoying life with truly smart buildings. Therefore, although the implementation of intelligent technology in new buildings is becoming more and more popular, the real challenge is to design and configure the best smart building systems so that they can intelligently respond to the changing needs of end users and achieve development personnel goals. The most important aspect of smart architecture design is the evaluation and improvement of intelligent information system technology. In order to reflect the interrelationship of these factors in the real world, this article reviews the practical application and relevant literature to determine more criteria, which can constitute a quantitative measurement of the three-dimensional index evaluation model (see Table 1).
4.1 Building an Influence Relationship Matrix of an INRM for Regional Shopping Center Development Analysis The aim of this approach is not only to determine the most important policy criteria but also to measure relationships among criteria to build an INRM. A questionnaire was used to gather these assessments from group of academic scholars, government officers and industrial experts. They rated each criterion with respect to efficient development on a 5-point scale ranging from 0 (no effect) to 4 (extremely important influence). The experts were asked to determine the importance of the relationships among the dimensions. The average initial direct-influence 4 × 4 matrix A obtained by pair-wise comparisons of influences and directions between dimensions are shown in Table 2. Table 2 Initial influence matrix A D for dimensions Dimensions
D1
D2
D3
Biometric recognition system (D1 ) Wireless communication system (D2 )
–
2.80
1.60
3.00
–
Cryptography system (D3 )
1.40
2.60
2.40
–
Assessment and Improvement of Intelligent Technology …
293
Table 3 Sum of influences given and received for dimensions T D Dimensions
ri
ci
ri + ci
ri − ci
D1
6.733
7.953
14.686
−1.221 0.132
D2
7.709
7.577
15.286
D3
7.323
6.234
13.557
As matrix A shows the normalized direct influence D is calculated from Eqs. (1) and (2). Equation (3) is then used to derive the total influence T D as seen in Table 3. By using Eqs. (5) and (6), the sum of the total influence given and received by each dimension can be derived (see Table 3). Finally, the DEMATEL method can be used to draw the impact-relations-map (INRM).
4.2 Weightings of Criteria in Intelligent Informant System for Smart Architecture Design With an appropriate assessment system as the goal, pair-wise comparisons of the determinants were calculated based on the total relationship matrix as deduced by DEMATEL. Note the interrelationships between the goals and the directions of arrows, indicating the dimensions of various assessment system determinants (Fig. 2). The total relationship matrix serves as a set of inputs for ANP. Weights corresponding to each determinant (Table 4) are derived accordingly. The objective of the smart architecture design is to elevate living standards, security and satisfactorily sustainable development; this is the same goal of the impact direction map determined by DEMATEL (Fig. 2). The integration of intelligent informant system for smart architecture design performance index scores into the ANP shows that biometric recognition system receives the highest score of 5.95, and the gap score is 4.05 away from the goal or the aspire level (Table 5). The criteria of six bit receives the highest score of 6.25, and the gap score is 3.75 away from the goal or the aspire level, as shown in Table 5. In terms of technology, 8 digits (C 11 ), 6 digits Fig. 2 Impact direction map of regional shopping center development
8,500 Gap: 4.05, (D ) 6,733 , 1
Gap: 4.45, (D2) 7,709 , 7,577 Wireless communication
8,000 7,953 7,500 7,000 6,500 6,000
Gap: 4.25, (D3) 7,323 , 6,234 Cryptography system
5,500 5,000 6,500
6,700
6,900
7,100
7,300
7,500
7,700
7,900
294
V. Y.-C. Chen et al.
Table 4 Influential weighted in stable matrix of DANP when power limit limϕ→∞ (Wα )ϕ D1 C1
D2 C2
C3
C4
C5
C6
D3 C7
C8
C9
C 10
C 11
C 12
C1
0.079 0.079 0.079 0.079 0.079 0.079 0.079 0.079 0.079 0.079 0.079 0.079
C2
0.082 0.082 0.082 0.082 0.082 0.082 0.082 0.082 0.082 0.082 0.082 0.082
C3
0.073 0.073 0.073 0.073 0.073 0.073 0.073 0.073 0.073 0.073 0.073 0.073
C4
0.066 0.066 0.066 0.066 0.066 0.066 0.066 0.066 0.066 0.066 0.066 0.066
C5
0.063 0.063 0.063 0.063 0.063 0.063 0.063 0.063 0.063 0.063 0.063 0.063
C6
0.060 0.060 0.060 0.060 0.060 0.060 0.060 0.060 0.060 0.060 0.060 0.060
C7
0.081 0.081 0.081 0.081 0.081 0.081 0.081 0.081 0.081 0.081 0.081 0.081
C8
0.071 0.071 0.071 0.071 0.071 0.071 0.071 0.071 0.071 0.071 0.071 0.071
C9
0.069 0.069 0.069 0.069 0.069 0.069 0.069 0.069 0.069 0.069 0.069 0.069
C 10 0.067 0.067 0.067 0.067 0.067 0.067 0.067 0.067 0.067 0.067 0.067 0.067 C 11 0.157 0.157 0.157 0.157 0.157 0.157 0.157 0.157 0.157 0.157 0.157 0.157 C 12 0.131 0.131 0.131 0.131 0.131 0.131 0.131 0.131 0.131 0.131 0.131 0.131 Parentheses () denote the rankings in integrating important index based on ANP.
(C 12 ) and face recognition (C 2 ) are the core technologies of sustainable intelligent technology architecture design. The criteria performance of six bit.
5 Conclusions Although many new buildings have been described as “smart” buildings in the past, their level of building intelligence varies greatly depending on the functionality and operational efficiency of the smart components that have been installed. The disconnect between expectations and the implementation of smart buildings can sometimes be exacerbated by criticism that smart buildings do not respond flexibly to the needs and expectations of end users. Based on several aspects of smart architecture design for intelligent information system satisfactorily sustainable development, we have combined the DEMATEL and DANP method to form a hybrid MADM approach that considers the importance of a range of criteria and the interdependence among them. An empirical test of the approach using a case of Taiwanese study illustrates its usefulness and the meaningful implications for decision makers. Final analysis results in the hybrid model: first, D1 > D2 > D3 ; biometric recognition system ( D1 ) had the best satisfaction performances that cryptography system ( D3 ) exhibits the largest size gap. Secondly, in terms of technology, 6 digits (C 11 ), 8 digits (C 12 ) and face recognition (C 2 ) are the core technologies of sustainable intelligent technology architecture design.
D3
0.131
1.00
sum
0.067
C 10
C 12
0.069
C9
0.157
0.071
C 11
0.081
C8
0.063
C5
C7
0.066
C4
0.060
0.073
C3
C6
0.082
C2
D2
0.079
C1
D1
Global weights
Criteria
Dimensions
–
(2)
(1)
(9)
(8)
(7)
(4)
(12)
(11)
(10)
(6)
(3)
(5)
Rankings
–
5.25
6.25
4.50
3.50
5.50
8.00
6.25
7.25
4.00
4.50
6.50
7.50
Local weight
1.00
0.29
0.35
0.36
Dimensions weight
5.80
5.25
6.25
4.50
3.50
5.50
8.00
6.25
7.25
4.00
4.50
6.50
7.50
Criteria performance values (A)
Table 5 Smart architecture design for integrating plan index of criteria weightings by ANP
10.00
10.00
10.00
10.00
10.00
10.00
10.00
10.00
10.00
10.00
10.00
10.00
10.00
Aspired levels (B)
4.20
4.75
3.75
5.50
6.50
4.50
2.00
3.75
2.75
6.00
5.50
3.50
2.50
Gaps (B-A)
5.75
5.75
5.55
5.95
Dimensions performance values (C)
10.00
10.00
10.00
10.00
Aspired levels (D)
4.25
4.25
4.45
4.05
Gaps (D-C)
Assessment and Improvement of Intelligent Technology … 295
296
V. Y.-C. Chen et al.
References 1. Saaty, T.L.: Decision making with dependence and feedback: analytic network process. . RWS Publications: Pittsburgh, PA USA (1996) 2. Bart, M., Dirk, S., Hilde, B.: Analysing modelling challenges of smart controlled ventilation systems in educational buildings. J. J. Build. Perform. Simul. 14, 116–131 (2021) 3. Zhang, R., Jiang, T., Li, G.: Stochastic optimal energy management and pricing for load serving entity with aggregated TCLs of smart buildings: a stackelberg game approach. J. IEEE Trans. Ind. Inform. 17, 1821–1830 (2021) 4. Gupta, A., Badr, Y., Negahban, A., Qiu, R.G.: Energy-efficient heating control or smart buildings with deep reinforcement learning. J. Journal of Building Engineering 34, 101739 (2021) 5. Ge, H., Peng, X., Koshizuka, N.: Applying knowledge inference on event-conjunction for automatic control in smart building. J. Appl. Sci. Basel. 11, 935 (2021) 6. Smith, S.: ‘Intelligent buildings. In: Best, R., de Valence, G. (eds.), ‘Design and construction: building in value.’ J. Butterworth Heinemann, UK, pp. 36–58 (2002) 7. Wan, P., Woo, T.K.: ‘Designing for intelligence in Asia buildings,’ J. Proceedings 1st IEE International Conference on Building Electrical Technology (BETNET), IEEE, Hong Kong, pp. 1–5 (2004) 8. Wong, J., Li, H., Lai, J.: Evaluating the system intelligence of the intelligent building systems Part 1: development of key intelligent indicators and conceptual analytical framework. J. Autom. Constr. 17, 284–302 (2008) 9. Wong, J., Li, H.: ‘Development of a conceptual model for the selection of intelligent building systems.’ J. Build. Environ. 41, 1106–1123 (2006) 10. Chen, Y.C., Lien, H.P., Tzeng, G.H.: Measures and evaluation for environment watershed plans using a novel hybrid MCDM model. J. Expert Syst. Appl. 37(2), 926–938 (2009) 11. Tzeng, G.H., Chiang, C.H., Li, C.W.: Valuating intertwined effects in e-learning programs: a novel hybrid MCDM model based on factor analysis and DEMATEL. J. Expert Syst. Appl. 32(4), 1028–1044 (2007)
IT Support for the Optimization of the Epoxidation of Unsaturated Compounds on the Example of the TOPSIS Method Aleksandra Radomska-Zalas
and Anna Fajdek-Bieda
Abstract Ethylene oxide began to be produced on an industrial scale through the catalytic oxidation of ethylene with air in 1937, and Shell later launched an oxygen oxidation plant. Silver metal deposited on an inert support was used as a catalyst. However, this method fails with the oxidation of other alkenes and functional groups and with ethylene unsaturation. In the case of propylene epoxidation, about half of the world’s production of propylene oxide comes from the chlorine method and half from the use of hydroperoxides (t-butyl hydroperoxide and ethylbenzene). For substituted alkenes, liquid-phase peroxidation is still the most used method, although the reaction is slow and leads to a large number of carboxylic acids and esters as by-products. The article presents the results of research on the application of the TOPSIS method within the proprietary IT system in order to select the optimal process parameters. The aim of the research was to answer the question whether the operation of the proprietary IT system using the TOPSIS method would significantly simplify the testing process, shorten the time to obtain results and reduce the cost of testing. Keywords Optimization · Epoxidation · The TOPSIS method
1 Introduction The optimization of production processes has become an important part of the daily activities of enterprises. Its importance increased not only for technological progress, but also for the increasing demands of customers as to the quality of products generating higher production costs. The use of artificial intelligence in information systems has become one of the directions of activities supporting the development of optimization mechanisms for production processes [1]. Computerization changes the approach to production and innovation, but it must provide solutions that reduce costs, increase revenues, improve quality or impact resource efficiency. Systemic
A. Radomska-Zalas (B) · A. Fajdek-Bieda The Jacob of Paradies University, Teatralna 25, 66-400 Gorzów Wielkopolski, Poland e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 I. Czarnowski et al. (eds.), Intelligent Decision Technologies, Smart Innovation, Systems and Technologies 238, https://doi.org/10.1007/978-981-16-2765-1_25
297
298
A. Radomska-Zalas and A. Fajdek-Bieda
educational experiences, gaining knowledge from experts and, at the same time, applicants are conducive to making optimal decisions [2]. The current state of research shows that expert systems that combine artificial intelligence methods with quantitative and qualitative processing have advantages. Combining artificial intelligence methods, including identification, optimization and teaching with traditional expert systems, favours the creation of systems with greater potential and greater system adaptability [3]. The created IT system is equipped with artificial intelligence mechanisms and allows for multiple use when making a specific decision, taking into account different input states each time. The system has therefore become a universal simulation tool. It should be added that the use of an expert system, especially in the optimization of production processes, may have an impact on the shortening of the entire process cycle, which enables the analysis of a greater number of cases and, in combination with the possibility of using historical data, increases the efficiency and accuracy of decisions made. Epoxidation of unsaturated compounds belongs to the group of leading technological processes. Metallic silver deposited on an inert support was used as a catalyst. This method fails for the oxidation of other alkenes and ethylenically unsaturated functional compounds [4]. In the case of propylene epoxidation, about half of the world’s production of propylene oxide comes from the chlorine method, and half is produced with the use of hydroperoxides (t-butyl hydroperoxide and ethylbenzene). For substituted alkenes, liquid-phase per-acid epoxidation is still the most used method even though the reaction is slow and results in a large number of carboxylic acids and esters as by-products [5]. An alternative solution is to use organic hydroperoxides or hydrogen peroxide as epoxidizing agents. A significant advantage of hydrogen peroxide is that the only by-product associated with its use is water. The relatively low price makes hydrogen peroxide attractive. The processes running with its use generally require catalytic activation. The group of such activators includes compounds of transition metals, including titanium [6]. Transition metals form a number of complexes with it which are actual catalysts for epoxidation. Due to the complex of the oxidizing agent with the active centre of the catalyst, the formation of the epoxy compound is highly selective and efficient, with mild process conditions. Titanium silicalite as an epoxidation catalyst is characterized by high selectivity, stereoselectivity and electrophilic character, often masked by steric factors imposed by the channel system [8]. MCDA methods are used to assist in making decisions in cases where there is more than one conflicting criterion, finding the optimal choice among alternatives. These methods support the analysis of decisions based on a number of criteria, and the methods cannot make final decisions [8]. In fact, the main purpose of the MCDA is to help users synthesize information so that they feel comfortable and confident in making decisions. The MCDA integrates objective measurements with value assessments and makes clear subjectivity. Making a decision is not only a matter of choosing the best alternative. It is often necessary to prioritize all alternatives to resource allocation or to combine the strength of individual preferences to create a collective preference. 
The mathematics used in decision-making provides methods for quantifying or prioritizing personal or group judgments, which are usually immaterial and
IT Support for the Optimization of the Epoxidation of Unsaturated …
299
subjective [9]. Decision-making involves comparing different types of alternatives by decomposing preferences into multiple properties that have alternatives, determining their significance, comparing and obtaining the relative preferences of alternatives for each property and synthesizing the results to obtain an overall preference. Therefore, the strategy is to break down a complex problem into smaller pieces and set a meaning or priority to rank the alternatives in a comprehensive and general way to look at the problem mathematically [10, 11]. The aim of the work is to analyse the application of a selected method from the MCDA group, i.e. TOPSIS, in an IT system supporting the optimization of industrial processes. The article presents the result of a study conducted in the field of the application of the TOPSIS method for optimization in the selection of parameters from the input data set and the construction of a matrix of optimal parameters of the epoxidation process. The presented results concern the analysis of the effectiveness of the use of the TOPSIS method in tasks related to industrial processes in the IT system. Section 2 presents the characteristics of the applied optimization method. Section 3 describes the test procedure. Section 4 presents the results of research and discussion, and the article ends with the conclusions presented in Sect. 5.
2 The TOPSIS Method The Technique for Order of Preference by Similarity to Ideal Solution (The TOPSIS) method is called the technique of preference order by similarity to the ideal solution, which was proposed in 1981 by Hwang and Yoon [12]. The TOPSIS method assumes finding the best solution from a full set of solutions—alternatives [13, 14]. The TOPSIS method is considered to be a more effective approach compared to other heuristic methods, as it works even for a small number of parameters, while maintaining high consistency and lower computational complexity. According to the TOPSIS methodology, the best solution is the one that has the shortest distance to the best solution and at the same time the farthest from the worst solution [15]. It consists in comparing the analysed decision options with reference solutions: the ideal solution and the evaluation of the distance between the variants and these solutions. As the optimal variant, the selected variant is selected with the shortest distance from the “ideal” solution and the largest distance from the “anti-ideal” solution. A positive “perfect” solution is defined as the sum of all the best values that can be achieved for each attribute, while the “anti-ideal” solution consists of all the worst values that can be achieved for each attribute [16]. TOPSIS takes into account both the distance of the positive ideal solution and the distance to the negative ideal solution, taking into account the relative proximity to the positive ideal solution. By comparing the relative distance, an alternate priority order can be obtained. The TOPSIS method algorithm consists of the following steps [17]: 1.
Establishing a normalized decision matrix, according to the formula 1:
300
A. Radomska-Zalas and A. Fajdek-Bieda
xi j X i j = m i=1
2.
(1) xi2j
where X ij —normalized value of j-criterion in i-th option; x ij —real value of j-criterion in i-th option; m—the number of possibilities. Determination of the weighted normalized decision matrix according to the formula W Xi j = Xi j ∗ w j
3.
(2)
where WX ij —normalized weighted value of j-criterion in the i-th option; X ij —normalized value of j-criterion in i-th option; wj —weight assigned to the j-th criterion. Determining the evaluation of a weighted “ideal” solution and a valid “antiideal” solution. PIS means positive ideal solution, and NIS means negative ideal solution. PIS, as an “ideal” solution, includes the highest maximized values and the lowest minimized variables, NIS vice versa. The formulas (3) and (4) are used for the calculations: PIS+ = W X 1 + W X 2+, . . . , W X n+ = {max i W Xi j, min i W Xi j} NIS− = W X 1− , W X 2− , · · · , W X n− = {min W X i j , max W X i j } i
4.
i
(3) (4)
Determining the distance from the weighted “ideal” and “anti-Ideal” solution, according to the formulas 5 and 6. 2 n + W X i j − W X +j di =
(5)
j=1
where di+ —distance from the “ideal” solution. 2 n − di = W X i j − W X −j j=1
5.
where di− —distance from the “anti-ideal” solution. Determination of the relative distances from the weighted ideal solution:
(6)
IT Support for the Optimization of the Epoxidation of Unsaturated …
RDi = 6.
di+
di− + d− j
301
(7)
Choosing the best RDi variant. Ranking of decision variants in relation to the value of the coefficient of the relative closeness of decision variants to the ideal solution. The higher the value of this coefficient, the better the decision variant. The best variant is the one with the highest RDi value.
The use of the TOPSIS method enables the selection of optimal process parameters that significantly affect the test process and at the same time allows the user to ignore those factors that have a negligible impact on the results. It can greatly simplify the testing process, reducing the time to obtain reliable results and lowering testing costs as it reduces the number of required tests. However, keep in mind that one of the limitations of this method is the need to apply weight, which is usually subjective.
3 Results of Optimization of the Epoxidation of Allyl Alcohols on a Titanium Silicate Ti-MWW Catalyst by the TOPSIS Method The research was carried out with the use of the proprietary IT system, which is to support the calculation process and the decision support process [18, 19]. During the system performance study for the epoxidation of allylic alcohols on the Ti-MWW titanium silicalite catalyst, five dependent and six independent criteria were introduced into the system. Optimization of the epoxidation of crotyl alcohol 30% . With hydrogen peroxide, on a Ti-MWW catalyst at atmospheric pressure, was performed based on the mathematical method of experimental planning. Crotyl alcohol turned out to be the most reactive in the system: Ti-MWW catalyst—hydrogen peroxidemethanol. In the case of this alcohol, the epoxidation process proceeded with the highest selectivity of the transformation towards 2,3EB1O with the highest conversion of crotyl alcohol and hydrogen peroxide at the same time. The optimization allowed for a more accurate determination of the most advantageous technological parameters of the process. The following parameters were used in the optimization process: selectivity of the conversion to 2,3-epoxybutan-1-ol in relation to the reacted crotyl alcohol, conversion of crotyl alcohol, yield of 2,3-epoxybutan-1-ol and selectivity of the conversion to organic compounds in relation to the reacted crotyl alcohol hydrogen peroxide. In order to obtain the most appropriate mathematical description of the process, five factors (independent parameters) were taken into account [20]: • • • •
P1—temperature—(10 ÷ 60 °C); P2—molar ratio AK/H2 O2 —(0.2 ÷ 3,0:1); P3—methanol concentration—(5 ÷ 90% wag.); P4—catalyst concentration Ti-MWW—(1 ÷ 9.0% wag.);
302
A. Radomska-Zalas and A. Fajdek-Bieda
Table 1 Standard and actual values of the experiment plan Level
Standard
Input parameters The actual parameters Temperature (°C)
Molar ratio AK/H2O2
Concentration CH3OH (% weight)
Concentration Ti-MWW (% weight)
Time (min)
P
P1
P1
P2
P4
P5
Basic
0
35
1.6
47.5
5
165
Higher
1
48
2.3
68.7
7
232.5
Lower
−1
23
0.9
26.2
3
97.5
Stellar taller
2
60
3
90
9
300
Stellar lower
−2
10
0.2
5
1
30
• P5—response time—(30 ÷ 300 min). The actual and standard values of the parameters (input quantities) at the levels resulting from the experimental design are presented in Table 1. The response functions characterizing the epoxidation of crotyl alcohol were assumed: • z1—selectivity of the conversion to 2,3-epoxybutan-1-ol with respect to the reacted crotyl alcohol, S2,3EB1O/AK; • z2—selectivity of the conversion to organic compounds in relation to the reacted hydrogen peroxide, Szw.org./H2O2; • z3—conversion of crotyl alcohol, KAK; • z4—yield of 2,3-epoxybutan-1-ol, W2,3EB1O; • z5—selectivity of 2.3 DMB1O, S2,3DMB1O; • z6—selectivity of DIOL, SDIOL. Since the method is valid for the weights assigned to a specific criterion, a test was carried out for four variants, i.e. • variant 1—equal importance of all parameters. • variant 2—the most important S23EB1O/AK (0.5), other parameters 0.1 each; • variant 3—KAK is of the greatest importance (0.5), the remaining parameters are 0.1 each; • variant 4—the most important W2.3EB1O (0.5), other parameters 0.1 each. On the basis of the given input data, the system for each variant established a normalized decision matrix and then a weighted normalized decision matrix. In the next step, he calculated positive ideal solution (PIS) and negative ideal solution (NIS). PIS, or “ideal” solution, contains the highest maximized values and the lowest minimized variables, NIS, od “anti-ideal”, vice versa.
IT Support for the Optimization of the Epoxidation of Unsaturated …
303
In accordance with the assumptions of the TOPSIS method, the distance of the considered possibilities from the “ideal” and “anti-ideal” solutions was then calculated. In the last step, the ratio of the relative proximity of the decision variants to the ideal solution was determined. Based on the obtained values of the relative proximity coefficient of the decision variants, assuming that the higher the Rdi value, the better the decision variant, the system generated a ranking of variants, which is presented in Table 2. The main system performance results for the TOPSIS method are presented in Table 3.
4 Results and Discussion When analysing the three highest classified parameters, one option deserves special attention, i.e. study 8, which was indicated as optimal for all variants. Importantly for all variants, the system returned tests 11. 21. 27 in the next three places. Table 4 presents the parameters of the processes. The use of the IT system allowed for the optimization of the epoxidation process while maintaining the principle that the lower the process parameters, the better. The advantage of this solution is the possibility of using various optimization algorithms and comparing the results. For the purposes of this article, the results for the TOPSIS method are presented. The IT system reduced the time of performing syntheses to 660 min. Without the use of a synthesis system, they required 4.5225 min. The small dispersion of the remaining results confirms that the TOPSIS method works best for many criteria. The overall assessment of the results shows that the use of the TOPSIS method allows the selection of optimal process parameters that significantly affect the test process and at the same time allows the user to ignore those factors that have a negligible impact on the results. It can greatly simplify the testing process, reducing the time to obtain reliable results and lowering testing costs as it reduces the number of required tests. However, keep in mind that one of the limitations of this method is the need to apply weight, which is usually subjective.
5 Summary and Conclusion Although the TOPSIS method works best for large numbers of criteria, even with two gives satisfactory results. The general assessment of the results allows to conclude that the use of the TOPSIS method allows for the selection of optimal process parameters that significantly affect the test process and at the same time allows the user to omit those factors that have a negligible impact on the results. It can simplify the test process very much, reducing the time to obtain reliable results and lowering the cost of testing, because it reduces the number of tests required. However, it should
304
A. Radomska-Zalas and A. Fajdek-Bieda
Table 2 Relative distances from the weighted “ideal” solution and selection of the best variant Rdi variant 1
Rdi variant 2
Rdi variant 3
Rdi variant 4
1
0.1035
0.0734
0.4482
0.0707
2
0.1642
0.2779
0.4128
0.2431
3
0.1160
0.0915
0.4027
0.0869
4
0.1840
0.3557
0.3995
0.3447
5
0.1147
0.1470
0.3958
0.1381
6
0.1233
0.0740
0.3936
0.1598
7
0.1049
0.0945
0.3698
0.0895
8
0.7208
1.5821
0.3648
1.5296
9
0.1113
0.1283
0.3270
0.1226
10
0.1744
0.3094
0.2247
0.3013
11
0.6415
1.4049
0.2201
1.3619
12
0.5834
1.2749
0.2109
1.2388
13
0.2321
0.4615
0.2069
0.4566
14
0.6256
1.3541
0.2009
1.3431
15
0.1263
0.0760
0.1891
0.0758
16
0.5757
1.2631
0.1886
1.2170
17
0.2920
0.6112
0.1744
0.5882
18
0.2640
0.5434
0.1744
0.5326
19
0.2137
0.5293
0.1631
0.3104
20
0.3183
0.6832
0.1627
0.6597
21
0.6577
1.4347
0.1607
1.4027
22
0.5434
1.3793
0.1410
0.9266
23
0.1306
0.2065
0.1358
0.1550
24
0.6276
1.3830
0.1345
1.3236
25
0.2544
0.5900
0.1335
0.4549
26
0.2954
0.6871
0.1333
0.5488
27
0.6557
1.5536
0.1331
1.2661
28
0.1780
0.3696
0.1319
0.2923
29
0.1528
0.2993
0.1317
0.2205
30
0.3258
0.7614
0.1282
0.6131
31
0.3507
0.8273
0.1245
0.6591
32
0.2641
0.5991
0.1036
0.4913
33
0.2641
0.5991
0.0942
0.4913
IT Support for the Optimization of the Epoxidation of Unsaturated …
305
Table 3 Ten highest scores for each variant Variant 1
Variant 2
Lp
Rdi
8 21
Variant 3 Lp
Variant 4
Lp
Rdi
Rdi
Lp
Rdi
0.720847
8
1.58206
8
0.448239
8
1.52964
0.657654
27
1.55363
21
0.41283
21
1.402656
27
0.655687
21
1.43465
11
0.402744
11
1.361864
11
0.641467
11
1.40488
27
0.399499
14
1.343055
24
0.627578
24
1.38303
14
0.395834
24
1.323596
14
0.625634
22
1.37927
24
0.393621
27
1.266056
12
0.583441
14
1.35410
12
0.36981
12
1.238799
16
0.575745
12
1.27489
16
0.364755
16
1.21695
22
0.543426
16
1.26308
22
0.32698
22
0.926615
31
0.350714
31
0.82725
20
0.224655
20
0.659686
Table 4 Optimal process parameters Temperature (°C)
Molar ratio AK/H2 O2
Concentration CH3 OH (% weight)
Concentration Ti-MWW (%weight)
Time (min)
8
23
21
35
0.9
26.2
3
97.5
1.6
47.5
5
27
165
48
2.3
68.7
7
232.5
11
35
1.6
47.5
5
165
be remembered that one of the limitations of this method is the need to give weight, which is usually subjective. The TOPSIS method does not require IT support, but the use of an IT system as a tool supporting the calculation process is important. It enables the creation of a knowledge base, which in the context of repeatability of research will allow to increase the correctness of the final results, as well as to compare the results achieved for various multi-criteria analysis methods. It is also important to use artificial intelligence mechanisms in the system. An IT system equipped with the mechanisms of artificial intelligence gives the possibility of repeated use in making a concrete decision, considering different input states each time. Such a system thus becomes a universal tool with a simulation character. It should be added that the use of an expert system particularly in the optimization of production processes may have an impact on shortening the entire process cycle, which gives the opportunity to analyse a larger number of cases and in combination with the possibility of using historical data increases the efficiency and accuracy of the decisions taken. In the long term, an interesting direction of research, apart from the selection of the optimization method, is the selection of a classifier model supporting the production process depending on the needs, based on specific requirements and preferences. In addition to the results of classification and the size of a set of features, e.g. the
306
A. Radomska-Zalas and A. Fajdek-Bieda
ability to explain a classifier, needed to explain why a particular decision should be made, may be important when selecting a classifier. Often, the selection of a classifier model will consist in seeking a compromise between the indicated features of individual models. In this case, the assessment of different approaches is a multicriteria problem [21], and therefore, multi-criteria decision support methods can be used in further research [22].
References 1. 2. 3. 4.
5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15.
16.
17.
18. 19.
20.
Beyer, B.: Site Reliability Engineering. O’Reilly Inc., USA (2016) Fülöp, J.: Introduction to decision making methods. In: BDEI-Workshop, nr 37, s. 1–15 (2005) Sommervile, I.: Software Engineering. Pearson (2015) Sinadinovi´c-Fišer, S., Jankovi´c, M., Borota, O.: Epoxidation of castor oil with peracetic acid formed in situ in the presence of an ion exchange resin. Chem. Eng. Process. 62, 106–113 (2012). https://doi.org/10.1016/j.cep.2012.08.005 Mandelli, D., Van Vliet, M.C.A., Sheldon, R.A., Schuchardt, U.: Appl. Catal. A: General 219, 209 (2001) Li, G., Wang, X., Yan, H., Liu, Y., Liu, X.: Appl. Catal. A - Gen 236, 1 (2002) Clerici, M.G., Ingallina, P.: Epoxidation of lower Olefins with Hydrogen Peroxide and Titanium Silicalite. J. Catal. 140, 71–83 (1993) Tsang, A.H.C.: Condition-based maintenance: tools and decision making. J. Qual. Main. Eng. 1(3), 3–17 (1995) Pintelon, L.M., Gelders, L.F.: Maintenance management decision making. Eur. J. Oper. Res. 58(3), 301–317 (1992) Jackson, C., Pascual, R.: Optimal maintenance service contract negotiation with aging equipment. Eur. J. Oper. Res. 189(2), 387–398 (2008) Bhushan: Strategic decision making applying the analytic hierarchy process. London (2004) Hwang, C.L., Yoon, K.: Multiple Attribute Decision Making Methods and Applications. Springer-Verlag (1981). https://doi.org/10.1007/978-3-642-48318-9 Decui, L., Zeshui, X., Dun, L., Yao, W.: Method for three-way decisions using ideal TOPSIS solutions at Pythagorean fuzzy information. Inf. Sci. 435, 282–295 (2018) Sriragen, A.K., Sathiya, P.: Optimisation of process parameters for gas tungsten arc welding of incoloy 800HT using TOPSIS. Mater. Today: Proc. 4, 2031–2039 (2017) Yuvaraj, N., Pradeep, K.M.: Multiresponse optimization of abrasive water jet cutting process parameters using TOPSIS approach. Mater. Manuf. Process. 30(7), 882–889 (2014). https:// doi.org/10.1080/10426914.2014.994763 Sinrat, S.: Atthirawong A conceptual framework of an integrated fuzzy ANP and TOPSIS for supplier selection based on supply chain risk management. In: IEEE International Conference on Industrial Engineering and Engineering Management (IEEM) (2013). Kasprzak D. (2018). Fuzzy Topsis method for group decision making. In: Multiple Criteria Decision Making, vol. 13. Publishing House of the University of Economics in Katowice. ISSN: 2084-1531 Radomska-Zalas, A., Perec, A., Fajdek-Bieda, A., IT support for optimisation of abrasive water cutting process using the TOPSIS method. IOP Conf. Ser.: Mater. Sci. Eng. 710, (2019) Perec, A., Pude, F., Grigoriev, A., Kaufeld, M., Wegener K.: A study of wear on focusing tubes exposed to corundum-based abrasives in the waterjet cutting process. Int. J. Adv. Manuf. Technol. (2019). https://doi.org/10.1007/s00170-019-03971-0 Fajdek, A., Wróblewska, A., Milchert, E., Grzmil, B.: The Ti-MWW catalyst - its characteristic and catalytic properties in the epoxidation of allyl alcohol by hydrogen peroxide. Polish J. Chem. Technol. 12, 29–34 (2010). https://doi.org/10.2478/v10026-010-0006-1
IT Support for the Optimization of the Epoxidation of Unsaturated …
307
21. Ziemba, P.: Towards strong sustainability management—a generalized PROSA method. Sustainability 11(6), 1555 (2019). https://doi.org/10.3390/su11061555 22. W˛atróbski, J., Jankowski, J., Ziemba, P., Karczmarczyk, A., Zioło, M.: Generalised framework for multi-criteria method selection. Omega 86, 107–124 (2019). https://doi.org/10.1016/ j.omega.2018.07.004
Land Suitability Evaluation by Integrating Multi-criteria Decision-Making (MCDM), Geographic Information System (GIS) Method, and Augmented Reality-GIS Hanhan Maulana and Hideaki Kanai Abstract The purpose of this study is to evaluate land suitability by integrating the multi-criteria decision-making (MCDM) with geographic information system (GIS) method. Augmented Reality-GIS is used to present the resulting land suitability map. This evaluation process using the physical land characteristics, such as climate, soil, and topography factors. The analytical hierarchy process (AHP) method generates weights for the thematic Layer. The weight is needed to overlay the thematic layer using the weighted overlay method. Based on the overlay results, the area study has the best suitability for cultivating onions, cabbage, and potatoes. This result means that these three commodities are well suited for planting in most agriculture areas in the Bandung regency. This study also designed a GIS application prototype using augmented reality. The aim is to make it easier for farmers to read the land suitability map produced. Based on the visual aspect, the AR map can provide better visualization. With AR-GIS is hoped the farmers’ understanding of the land suitability map can be increase. So that can reduce the risk of crop failure experienced by farmers. Keywords MCDM · AHP method · Weighted overlay · GIS · AR-GIS
1 Introduction Agriculture has a crucial role in meeting food needs in Indonesia. Unfortunately, the productivity of the agricultural sector in Indonesia has decreased [1]. This decline occurred because of reduced agriculture area due to the construction of residential and industrial areas. Another problem that causes a decrease in agricultural productivity
H. Maulana (B) · H. Kanai School of Knowledge Science, Japan Advanced Institute of Science and Technology, 1-1, Asahidai, Nomi, Ishikawa, Japan e-mail: [email protected] H. Maulana Department of Informatics Engineering and Computer Science, Universitas Komputer Indonesia, Jl. Dipatiukur no 112-114, Bandung 40134, Indonesia © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 I. Czarnowski et al. (eds.), Intelligent Decision Technologies, Smart Innovation, Systems and Technologies 238, https://doi.org/10.1007/978-981-16-2765-1_26
309
310
H. Maulana and H. Kanai
is crop failure [2]. Usually, the cause of crop failure is improper planting patterns and mistakes of farmers in determining commodities. Climate, soil, and topography are crucial factors in the land evaluation process [3]. The land evaluation also requires a large amount of geospatial data [4]. In Indonesia, geospatial data is available on the open data site [2]. A large amount of geospatial data can be handled effectively by geographic information systems (GISs). In Indonesia, GIS already applies in the agriculture field. Unfortunately, the use of GIS is only limited to large-scale agricultural agencies/companies [2]. It is not used directly by farmers. It is because not every farmer has the ability to read maps. For ordinary people, it is hard to get information, especially information related to topography such as elevation and slope. The selection of suitable land is a very crucial factor in building sustainable agriculture. A multi-criteria decision-making technique (MCDM) is required to provide better land selection decisions [5]. Analytic hierarchy process (AHP) is a method of decision making by applying the MCDA approach. AHP method has been criticized for its possible problem like the rank reversal phenomenon [6]. But AHP method is still the most appropriate method for the land evaluation process [7, 8]. Besides, according to research conducted by Wang et al. (2009), the reversal method does not only occur in the AHP method but also other decision-making methods [9]. In that study, Wang revealed the possibility that the reversal phenomenon was a normal phenomenon. Herzberg et al. (2019). Combine GIS with local farmer knowledge as a basis for decision-making [4]. Research on land evaluation is often carried out in countries that have great potential in the agricultural sector [4, 10–12]. Tercan et al. evaluated the selection of suitable land for citrus plants [5]. In Indonesia, research on land suitability evaluation using AHP method is carried out specifically for one type of commodity [12, 13]. This study conducted a land suitability evaluation process using MCDM and GIS. Land characteristics are used as the basis for decision-making [14]. Climate data, soil conditions, and topography were used as evaluation criteria. By integrating MCDM and GIS, the land evaluation process will be more relevant [4]. The purpose of this study is to make land suitability maps for potential commodities in the study area. A land suitability map can assist farmers in the decision-making process regarding the selection of agricultural commodities. To make it easier for farmers to understand the land suitability map produced, a new way to visualize the land suitability map is needed. Based on the initial research, farmers in the study area are more familiar with mobile devices than computers. In this study, the land suitability map visualizes using a prototype augmented reality GIS. Augmented reality provides a new way to interact with landscape maps [15]. Carrera et al. and Hadley et al. explain that augmented reality can improve the understanding of the information on topography maps [16, 17]. It is expected AR-GIS can make it easier for farmers to understand land suitability maps. It because the AR-GIS visualizes the same shape as the actual surface of the earth.
Land Suitability Evaluation by Integrating Multi-criteria …
311
Fig. 1 The area of Bandung regency and the agriculture areas
2 Material and Method 2.1 Study Area Bandung regency is one of the districts in the West Java province, Indonesia. It has a total area of 176,238.67 ha. The agricultural area in Bandung Regency is approximately 110,000 ha. It is located between 107.14 N to 107.00 E latitude to 6.49 N to 7.18 E longitude. Bandung regency is a highland area with slopes between 0–8%, 8–45%, and above 45%, have a tropical climate with annual rainfall between 1500 and 4000 mm [18]. Figure 1 describes the area of Bandung Regency and the agriculture areas.
2.2 Material Accurate data are an essential part of GIS. In making a land suitability map, the difference in value and range of criteria can cause a difference in land suitability output. There are open data sites that provide various data. Not only tabular data but also geospatial data are also available. In this study, the selection of criteria and sub-criteria is according to the guidelines issued by the Food and Agriculture Organization of the United Nations (FAO) and the land suitability guidelines issued by the Indonesian government’s Ministry of Agriculture [12, 14]. Table 1 describes the dataset list, data sources, and data types used in making land suitability maps.
Table 1 Dataset list, data sources, and data types used in making land suitability maps

| Criteria | Sub criteria | Data source | Map method |
|---|---|---|---|
| Soil | Soil depth, soil texture, soil base saturation, soil type | Soil map, scale 1:50,000, Research and Development Center of Agricultural Land Resources, Ministry of Agriculture of the Republic of Indonesia, 2016 | Convert from MapInfo format (Tab) to Esri format (Shp) |
| Climate | Temperature | MODIS NASA LP DAAC, original data at the USGS EROS Center, 2018–2019 (satellite image) | – |
| Climate | Precipitation | Annual rainfall from 5 climate stations in West Java | Convert using the Isohyet method |
| Topography | Elevation, slope, agriculture land use type | Indonesia Open Geospatial Data, Research and Development Center of Agricultural Land Resources, Ministry of Agriculture | Original data |
2.3 Analytical Hierarchy Process (AHP)

The AHP is one of the methods with the MCDM approach. This method is often used for the weighting process in research related to GIS-based land evaluation. The purpose of using the AHP method is to obtain a priority scale [4, 7, 8]. The priority scale is obtained from a pairwise comparison matrix of the attributes based on the participants' assessment. The hierarchical structure consists of three levels. The first level contains the main objective: to determine the level of land suitability in the study area. The next level of the hierarchy is filled with the criteria and sub-criteria. The land characteristics used are climate, soil, and topography; these characteristics have a critical influence on the requirements for plant growth. The sub-criteria were chosen by referring to the land suitability guidelines issued by the Indonesian Ministry of Agriculture. Figure 2 describes the hierarchical structure of the land suitability evaluation process. The pairwise comparison matrix is applied to determine the priority order of the criteria. The comparison matrix comes from expert judgment. The relative importance of the criteria in the pairwise comparison matrix is expressed using Saaty's scale [19–22]. Table 2 describes the pairwise comparison matrix scales. When applying the AHP method, it is necessary to check the consistency of the pairwise comparison matrix. The consistency ratio (CR) is an indicator of the level of consistency; it can also be used to detect and avoid random scoring of the matrix. If the CR value is ≤ 0.1, the pairwise comparison matrix is acceptable [19].
Fig. 2 The hierarchical structure of the land suitability evaluation process (goal: land suitability; criteria: soil, climate, and topography; sub-criteria: soil depth, soil texture, soil base saturation, soil type, temperature, precipitation, elevation, slope, and agriculture land use)
Table 2 The pairwise comparison matrix scales

| Numerical scale | Description |
|---|---|
| 9 | Criterion x is extremely more important than criterion y |
| 7 | Criterion x is strongly more important than criterion y |
| 5 | Criterion x is more important than criterion y |
| 3 | Criterion x is moderately more important than criterion y |
| 1 | Criterion x is equally important as criterion y |
| 2, 4, 6, 8 | Intermediate values |
| Reciprocals | Inverse comparison values |
Equation (1) is used to calculate the consistency ratio (CR):

$$CR = \frac{CI}{RI} \quad (1)$$

where CR is the consistency ratio, CI is the consistency index, and RI is the random index; the RI value depends on the RI table defined by Saaty. The CI is calculated using Eq. (2):

$$CI = \frac{\lambda_{max} - n}{n - 1} \quad (2)$$

where λmax is the largest eigenvalue of the matrix and n is the number of compared criteria. λmax is obtained using Eq. (3):

$$\lambda_{max} = \frac{1}{n}\sum_{x=1}^{n}\frac{\sum_{y=1}^{n} C_{xy}\, w_{y}}{w_{x}} \quad (3)$$
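As an illustration of Eqs. (1)–(3), the following is a minimal NumPy sketch (not the authors' code) that derives a priority vector and checks consistency for a pairwise comparison matrix; the example matrix is an assumption, while the RI values are Saaty's tabulated constants.

```python
import numpy as np

# Illustrative 3x3 pairwise comparison matrix (e.g., soil, climate, topography).
C = np.array([[1.0, 1/2, 3.0],
              [2.0, 1.0, 4.0],
              [1/3, 1/4, 1.0]])

# Saaty's random index values for n = 1..10 (tabulated constants).
RI = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
      6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}

n = C.shape[0]
eigvals, eigvecs = np.linalg.eig(C)
k = np.argmax(eigvals.real)            # index of the principal eigenvalue
lambda_max = eigvals[k].real
w = np.abs(eigvecs[:, k].real)
w = w / w.sum()                        # priority vector (criterion weights)

CI = (lambda_max - n) / (n - 1)        # Eq. (2)
CR = CI / RI[n]                        # Eq. (1)

print("weights:", np.round(w, 3))
print("lambda_max = %.3f, CI = %.4f, CR = %.4f" % (lambda_max, CI, CR))
print("consistent" if CR <= 0.1 else "inconsistent")
```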
2.4 Weighted Overlay Method

The weighted overlay is the most commonly used method for integrating thematic layers according to the criteria weights obtained from the AHP method [10, 23–25]. Equation (4) is applied to combine the assessment criteria:

$$s = \frac{\sum_{i} w_i\, s_{ij}}{\sum_{i} w_i} \quad (4)$$

where $w_i$ is the weight factor of criterion i, $s_{ij}$ is the weight of the spatial map class, and s is the value of the spatial unit output. This overlay process produces a land suitability map.
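Equation (4) can be sketched on raster layers represented as NumPy arrays, as below; the layer values, weights, and scoring scale are illustrative assumptions, not data from the study.

```python
import numpy as np

def weighted_overlay(layers, weights):
    """Combine reclassified raster layers (same shape, one score per pixel)
    into a suitability surface using Eq. (4): s = sum(w_i * s_ij) / sum(w_i)."""
    layers = np.stack(layers)                              # (n_layers, rows, cols)
    weights = np.asarray(weights, dtype=float).reshape(-1, 1, 1)
    return (weights * layers).sum(axis=0) / weights.sum()

# Illustrative 3x3 rasters already reclassified to scores 1 (poor) .. 4 (best).
soil  = np.array([[4, 3, 2], [4, 4, 3], [2, 2, 1]])
rain  = np.array([[3, 3, 3], [4, 3, 2], [3, 2, 2]])
slope = np.array([[4, 2, 1], [3, 3, 2], [1, 1, 1]])

s = weighted_overlay([soil, rain, slope], weights=[0.5, 0.3, 0.2])
print(np.round(s, 2))
```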
3 Result and Discussion

There are four layers of soil characteristics (see Fig. 3): soil type, base saturation, soil texture, and soil depth. There are two layers of climate characteristics (see Fig. 4). Temperature data are obtained as annual averages based on satellite images, while precipitation data are obtained from the open data of the Meteorology and Geophysics Agency of the Republic of Indonesia.
Fig. 3 Soil characteristic layers; a soil type; b base saturation; c soil texture; d soil depth
Fig. 4 Layers of climate characteristics; a precipitation; b temperature
Figure 4 describes the climate characteristics in the study area. There are three topography characteristic layers (see Fig. 5). The elevation and slope data are extracted from Digital Elevation Model (DEM) data, while the vegetation (agriculture land use) data are obtained from the Research and Development Center of Agricultural Land Resources, Ministry of Agriculture. Based on agricultural statistics for the study area, six main crops have the highest productivity: rice, cassava, sweet potatoes, potatoes, onions, and cabbage.
Fig. 5 Layer of topography characteristics; a elevation; b slope; c agriculture land use
Table 3 Final weight of sub-criteria (final weight for each commodity)

| Criteria | Sub criteria | Rice | Onion | Cassava | Potatoes | Cabbage | Sweet potatoes |
|---|---|---|---|---|---|---|---|
| Soil | Soil depth | 0.08 | 0.09 | 0.17 | 0.10 | 0.10 | 0.15 |
| Soil | Soil texture | 0.10 | 0.11 | 0.16 | 0.12 | 0.10 | 0.15 |
| Soil | Soil base saturation | 0.09 | 0.09 | 0.11 | 0.09 | 0.13 | 0.11 |
| Soil | Soil type | 0.11 | 0.15 | 0.17 | 0.13 | 0.13 | 0.15 |
| Climate | Temperature | 0.12 | 0.11 | 0.07 | 0.15 | 0.12 | 0.09 |
| Climate | Precipitation | 0.15 | 0.15 | 0.06 | 0.13 | 0.13 | 0.07 |
| Topography | Elevation | 0.11 | 0.09 | 0.10 | 0.13 | 0.11 | 0.10 |
| Topography | Slope | 0.09 | 0.08 | 0.09 | 0.09 | 0.09 | 0.08 |
| Topography | Agriculture land use | 0.15 | 0.13 | 0.07 | 0.06 | 0.09 | 0.10 |
To determine the weight of each criterion, an online group discussion was carried out; online questionnaires were then distributed to agricultural forums. Table 3 describes the final criteria weights for each commodity. The CR values obtained were between 0.007 and 0.08; because CR ≤ 0.1, every calculation made on the criteria and sub-criteria is consistent [19]. The land suitability output results from the integration of the thematic layers using the weighted overlay method based on MCDM. The integration process uses nine thematic map layers as input, according to the selected criteria. This method uses raster data, whose smallest unit is the pixel, and each pixel can be assigned a score and a weight. Each map layer of criteria and sub-criteria has a different level of importance and range of values. With the weighted overlay method, each map layer has its own classification. By classifying each thematic map layer specifically, the calculation results of the weighted overlay method are more accurate than those of other methods [24–26]. The process of determining the level of importance was carried out by holding an online discussion with agricultural experts. The output of the overlay process is a land suitability map. There are four land suitability classes, namely: highly suitable (S1), suitable (S2), marginally suitable (S3), and not suitable (N) [5]. Figure 6 presents the suitability map of each commodity. The study area generally has a high suitability value for cultivating onions, cabbage, and potatoes. Based on Fig. 7, 71% of the total agricultural area has a suitable condition (S2) for cultivating onions, 70% is suitable (S2) for cabbage, and 64% of the study area has suitable conditions (S2) for cultivating potatoes. This means that these three commodities are well suited for planting in most agricultural areas in the Bandung regency. This result is also in line with statistical data from the Bandung Regency Agriculture Office, which show that the productivity of these three commodities is better than that of other agricultural commodities [26].
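The reclassification of overlay scores into the four suitability classes can be sketched as follows; the numeric break points are assumptions for illustration only, since the study does not report them.

```python
import numpy as np

def classify_suitability(score, breaks=(1.5, 2.5, 3.5)):
    """Map a continuous overlay score to the four FAO-style classes used in the text.
    The break points are illustrative assumptions, not values from the study."""
    labels = np.array(["N", "S3", "S2", "S1"])   # not, marginally, suitable, highly
    idx = np.digitize(score, breaks)             # 0..3
    return labels[idx]

scores = np.array([[1.2, 2.1, 2.7], [3.6, 2.9, 1.8]])
print(classify_suitability(scores))
```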
Fig. 6 Land suitability map for each commodity; a suitability map of onion; b suitability map of cabbage; c suitability map of potatoes; d suitability map of rice; e suitability map of sweet potatoes; f suitability map of cassava
Fig. 7 The overall percentage of land suitability based on the overlay results
Figure 7 presents the overall percentage of land suitability based on the overlay results. This study also designed a GIS application prototype using augmented reality. The aim is to make it easier for farmers to read the land suitability map produced. A markerless AR-GIS is used to present the land suitability map; the markerless AR uses a plane detection algorithm, so farmers can open the map anywhere as long as there is a flat area. This paper has not presented the details of the AR-GIS development process. In further research, it is necessary to improve the prototype by designing better interactions, and farmers must be involved in the interaction design process.
Fig. 8 Overview of the AR-GIS suitability map
The addition of multimedia content to the map is also necessary; by adding more valuable agricultural content, the AR map will become more informative. Figure 8 provides an overview of the AR-GIS land suitability map. From the visual aspect, the AR map can provide better visualization [18]. AR-GIS can visualize the elevation and slope, so the land suitability map is easier to understand. These results are in line with the research conducted by Noguera et al. (2012), which concluded that a 3D map is easier for map readers to understand [27]. The land suitability map can also be used by farmers when selecting agricultural commodities. With AR-GIS, it is hoped that farmers' understanding of the land suitability map can be increased, which can reduce the risk of crop failure experienced by farmers.
4 Conclusion

This study describes the land evaluation process by integrating the MCDM method with GIS. The MCDM method used is the AHP method. This study uses three main criteria and nine sub-criteria. The AHP method generates the weights for the thematic layers, which are needed to overlay the layers using the weighted overlay method. Based on the overlay results, the study area is best suited for cultivating onions, cabbage, and potatoes. This study also designed a GIS application prototype using augmented reality, with the aim of making it easier for farmers to read the land suitability map produced; the AR map provides better visualization. By integrating the MCDM and weighted overlay methods and adding the better visual appearance of the AR-GIS, it is hoped that the land suitability map will be more useful for farmers. It can be one of the critical factors in decisions regarding the selection of agricultural commodities. With AR-GIS, it is hoped that farmers' understanding of the land suitability map can be increased, which can reduce the risk of crop failure.
References 1. Maulana, H., Kanai, H.: Development of precision agriculture models for medium and smallscale agriculture in Indonesia. IOP Conf. Ser. Mater. Sci. Eng. 879 (2020) 2. Maulana, H., Kanai, H.: Multi-criteria decision analysis for determining potential agriculture commodities in Indonesia. J. Eng. Sci. Technol. 15, 33–40 (2020) 3. Grimblatt, V., Ferré, G., Rivet, F., Jego, C., Vergara, N.: Precision agriculture for small to medium size farmers—an IoT approach. Proceedings—IEEE International Symposium Circuits System 2019-May (2019) 4. Herzberg, R., Pham, T.G., Kappas, M., Wyss, D., Tran, C.T.M.: Multi-criteria decision analysis for the land evaluation of potential agricultural land use types in a hilly area of Central Vietnam. Land 8 (2019). 5. Tercan, E., Dereli, M.A.: Development of a land suitability model for citrus cultivation using GIS and multi-criteria assessment techniques in Antalya province of Turkey. Ecol. Indic. 117, 106549 (2020) 6. Ishizaka, A., Siraj, S.: Interactive consistency correction in the analytic hierarchy process to preserve ranks. Decis. Econ. Financ. 43, 443–464 (2020) 7. Amini, S., Rohani, A., Aghkhani, M.H., Abbaspour-Fard, M.H., Asgharipour, M.R.: Assessment of land suitability and agricultural production sustainability using a combined approach (Fuzzy-AHP-GIS): a case study of Mazandaran province. Iran. Inf. Process. Agric. 7, 384–402 (2020) 8. Salas López, R., et al.: Land suitability for Coffee (Coffea arabica) growing in Amazonas, Peru: integrated Use of AHP, GIS and RS. ISPRS Int. J. Geo-Information 9, 673 (2020) 9. Wang, Y.M., Luo, Y.: On rank reversal in decision analysis. Math. Comput. Model. 49, 1221– 1229 (2009) 10. Vázquez-Quintero, G. et al.: GIS-based multicriteria evaluation of land suitability for grasslands conservation in Chihuahua, Mexico. Sustain 12, (2020). 11. Maddahi, Z., Jalalian, A., Masoud, M., Zarkesh, K., Honarjo, N.: Land suitability analysis for rice cultivation using multi criteria evaluation approach and GIS. Eur. J. Exp. Biol. 4, 639–648 (2014) 12. Santoso, P.B.K., Widiatmaka, Sabiham, S., Machfud., Rusastra, I.W.: Land priority determination for paddy field extensification in Subang Regency—Indonesia. IOP Conf. Ser. Earth Environ. Sci. 335 (2019). 13. Ostovari, Y., et al.: GIS and multi-criteria decision-making analysis assessment of land suitability for rapeseed farming in calcareous soils of semi-arid regions. Ecol. Indic. 103, 479–487 (2019) 14. FAO: An overview of land evaluation and land use planning. Food Agric. Organ. 1–19 (1976). 15. Nigam, A., Kabra, P., Doke, P.: Augmented reality in agriculture. Int. Conf. Wirel. Mob. Comput. Netw. Commun., pp. 445–448 (2011). https://doi.org/10.1109/WiMOB.2011.608 5361 16. Carrera, C.C., Asensio, L.A.B.: Landscape interpretation with augmented reality and maps to improve spatial orientation skill. J. Geogr. High. Educ. 41, 119–133 (2017) 17. Hedley, N.R., Billinghurst, M., Postner, L., May, R., Kato, H.: Explorations in the use of augmented reality for geographic visualization. Presence Teleoperators Virtual Environ. 11, 119–133 (2002) 18. Hapsari, H., Rasmikayati, E., Saefudin, B.R.: Karakteristik Petani Dan Profil Usahatani Ubi Jalar Di Kec. Arjasari, Kab. Bandung. Sosiohumaniora 21, 247–255 (2019) 19. Saaty, T.L.: Decision making—the analytic hierarchy and network processes (AHP/ANP). J. Syst. Sci. Syst. Eng. 13, 1–35 (2004) 20. Saaty, R.W.: The analytic hierarchy process-what it is and how it is used. Math. Model. 9, 161–176 (1987) 21. 
Saaty, T.L.: Decision-making with the AHP: why is the principal eigenvector necessary. Eur. J. Oper. Res. 145, 85–91 (2003)
22. Saaty, T.L.: how to make a decision: the analytic hierarchy process. Internat. Ser. Oper. Res. Management Sci. 147, 577–591 (2011) 23. Parimala, M., Lopez, D.: Decision making in agriculture based on land suitability—spatial data analysis approach. J. Theor. Appl. Inf. Technol. 46, 17–23 (2012) 24. Yalew, S.G., van Griensven, A., Mul, M.L., van der Zaag, P.: Land suitability analysis for agriculture in the Abbay basin using remote sensing, GIS and AHP techniques. Model. Earth Syst. Environ. 2, 1–14 (2016) 25. Kazemi, H., Akinci, H.: A land use suitability model for rainfed farming by multi-criteria decision-making analysis (MCDA) and geographic information system (GIS). Ecol. Eng. 116, 1–6 (2018) 26. Badan Pusat Statistika: Statistical Yearbook of Indonesia 2020. Stat. Year. Indones. 192 (2020). 27. Noguera, J.M., Barranco, M.J., Segura, R.J., Martínez, L.: A mobile 3D-GIS hybrid recommender system for tourism. Inf. Sci. (Ny) 215, 37–52 (2012)
Toward Reliability in the MCDA Rankings: Comparison of Distance-Based Methods
Andrii Shekhovtsov, Jakub Więckowski, and Jarosław Wątróbski
Abstract The decision-making process is a difficult problem, requiring the decision maker to consider the pros and cons of the given options. Systems based on multi-criteria methods are increasingly used to support this. However, their number makes it difficult to determine which of them is the right choice for a given problem and whether they guarantee similar results. In this paper, three selected multi-criteria decision analysis (MCDA) methods were used to provide preference values for alternatives. Decision matrices were created with different numbers of alternatives and criteria to check the impact of these changes on the final results. The three rankings were then compared using two correlation coefficients. The results show that the obtained rankings are highly similar, and that a greater number of alternatives and criteria has a positive impact on the observed similarity.

Keywords MCDA · TOPSIS · SPOTIS · COMET
1 Introduction There are many factors involved in making decisions, and this topic accompanies everyday activities [1]. Therefore, an important element in decision making is to have guidelines according to which the decisions will maximize the number of optimal decisions made relative to those who chose inferior options [16, 19]. Choosing optimal solutions from the available alternatives can reduce the number of losses incurred by the decision maker. A. Shekhovtsov · J. Wie˛ckowski Research Team on Intelligent Decision Support Systems, Department of Artificial Intelligence and Applied Mathematics, Faculty of Computer Science and Information Technology, West ˙ Pomeranian University of Technology in Szczecin,ul. Zołnierska 49, Szczecin 71-210, Poland J. Wa˛tróbski (B) ˙ Jana Pawła II 22A, Szczecin Institute of Management, University of Szczecin, Aleja PapieZa 71-210, Poland e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 I. Czarnowski et al. (eds.), Intelligent Decision Technologies, Smart Innovation, Systems and Technologies 238, https://doi.org/10.1007/978-981-16-2765-1_27
For this purpose, an increasing number of models are being developed to support the expert in making decisions [15, 33]. These models are built using a variety of techniques, with some producing more satisfactory results than others [6]. One approach to pre-designing such systems is to use methods from the multi-criteria decision analysis (MCDA) methods group [20, 29, 34]. Experts in various fields are increasingly using this approach. Multi-criteria decision analysis is widely used in problem solving. This approach is effectively used in solving problems of selecting suppliers of materials [8, 12], choosing the optimal means of transport [27] or as decision support systems [9, 35]. However, the fact that many methods are working on a similar principle can cause difficulties in choosing the right method to solve a given problem [3, 5, 26]. The developed methods are based on different approaches to the performed transformations during the algorithm’s operation, and one of them is the distancebased approach. Methods such as technique for order preference by similarity to an ideal solution (TOPSIS), characteristic objects method (COMET) [13, 14], or stable preference ordering toward ideal solution (SPOTIS) are based on a distance-based approach [4, 10, 11]. It is noted that the results obtained by these methods are similar, but it would be necessary to establish to what extent these results are correlated with each other. In this paper, the three selected MCDA methods were chosen to examine the correlation between obtained rankings. The research was conducted for decision matrices filled with random values, where the size of the matrix was defined for different numbers of criteria and different numbers of alternatives, respectively. For matrices of each size, 1000 runs were then conducted examining the correlation between the resulting rankings. The similarity was measured using the weighted Spearman coefficient and the WS similarity coefficient. The aim was to examine the strength of the correlation of the rankings, taking into account the similarity in performance of the TOPSIS, COMET, and SPOTIS methods. The rest of the paper is organized as follows. In Sect. 2, the MCDA approach’s main assumptions are presented with the preliminaries of three selected methods, namely TOPSIS, COMET, and SPOTIS. Section 3 includes the study case, in which the comparison of correlation between obtained rankings from MCDA methods is made. Finally, in Sect. 4, the results summary and conclusions from the research are presented.
2 MCDA Methods

The multi-criteria decision analysis (MCDA) methods are widely used in solving multi-criteria problems [30, 31]. The analysis of compared alternatives carried out by these methods helps choose optimal decisions based on expert knowledge [28, 32]. This group of methods includes different approaches to solving problems [25, 26]. Below are preliminaries of the three selected methods, which are based on the distance-based approach.
2.1 TOPSIS Method

The technique for order preference by similarity to an ideal solution (TOPSIS) method was developed in 1992 by Chen and Hwang [18]. The authors proposed an approach to assessing alternatives based on calculating the distance to the ideal solution. The positive ideal solution (PIS) and the negative ideal solution (NIS) are used to evaluate the preferences of the set of alternatives [2, 17]. Applying the method requires the decision matrix to be normalized at the first stage in order to obtain correct final results. Based on the weight vector defined by the expert, a weighted normalized decision matrix is calculated using Formula (1):

$$v_{ij} = w_i \cdot r_{ij}, \quad j = 1, \ldots, J; \; i = 1, \ldots, n \quad (1)$$

The positive and negative ideal solutions for the decision-making problem are identified as (2):

$$A^{*} = \{v_1^{*}, \ldots, v_n^{*}\} = \{(\max_j v_{ij} \mid i \in I^{P}), (\min_j v_{ij} \mid i \in I^{C})\}$$
$$A^{-} = \{v_1^{-}, \ldots, v_n^{-}\} = \{(\min_j v_{ij} \mid i \in I^{P}), (\max_j v_{ij} \mid i \in I^{C})\} \quad (2)$$

where $I^{C}$ stands for cost-type criteria and $I^{P}$ for profit-type criteria. The negative and positive distances from the ideal solution are calculated using the n-dimensional Euclidean distance (3):

$$D_j^{*} = \sqrt{\sum_{i=1}^{n} (v_{ij} - v_i^{*})^2}, \quad j = 1, \ldots, J$$
$$D_j^{-} = \sqrt{\sum_{i=1}^{n} (v_{ij} - v_i^{-})^2}, \quad j = 1, \ldots, J \quad (3)$$

The last step is to calculate the relative closeness to the ideal solution (4):

$$C_j^{*} = \frac{D_j^{-}}{D_j^{*} + D_j^{-}}, \quad j = 1, \ldots, J \quad (4)$$
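As an illustration of Eqs. (1)–(4), the following is a minimal NumPy sketch of the TOPSIS procedure (not the implementation used in this study); the example decision matrix, weights, and the choice of vector normalization for r_ij are illustrative assumptions.

```python
import numpy as np

def topsis(matrix, weights, types):
    """TOPSIS preferences following Eqs. (1)-(4); `types` holds +1 for profit
    and -1 for cost criteria. Vector normalization is assumed for r_ij."""
    X = np.asarray(matrix, dtype=float)
    r = X / np.sqrt((X ** 2).sum(axis=0))                  # normalized matrix
    v = r * np.asarray(weights, dtype=float)               # Eq. (1)
    ideal = np.where(np.asarray(types) == 1, v.max(axis=0), v.min(axis=0))   # A*
    anti  = np.where(np.asarray(types) == 1, v.min(axis=0), v.max(axis=0))   # A-
    d_pos = np.sqrt(((v - ideal) ** 2).sum(axis=1))        # Eq. (3)
    d_neg = np.sqrt(((v - anti) ** 2).sum(axis=1))
    return d_neg / (d_pos + d_neg)                         # Eq. (4)

M = [[7, 300, 4], [6, 250, 5], [8, 400, 3]]                # alternatives x criteria
pref = topsis(M, weights=[0.5, 0.3, 0.2], types=[1, -1, 1])
print(np.round(pref, 3), "order (best first):", (-pref).argsort() + 1)
```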
2.2 COMET Method

The main assumption of the characteristic objects method (COMET) is to define a rule base, on the basis of which the final preferences for the alternatives are calculated [21, 22]. To obtain such a rule base, the values of the characteristic objects must first be defined. Then, the expert performs pairwise comparisons from which, after some calculations, the rule base is defined [23, 24]. To understand how the preferences are calculated, the formal notation of the COMET method is presented below.
Step 1 Define the Space of the Problem—The expert determines the dimensionality of the problem by selecting the number r of criteria, $C_1, C_2, \ldots, C_r$. Then, the set of fuzzy numbers for each criterion $C_i$ is selected (5):

$$C_r = \{\tilde{C}_{r1}, \tilde{C}_{r2}, \ldots, \tilde{C}_{rc_r}\} \quad (5)$$

where $c_1, c_2, \ldots, c_r$ are the numbers of fuzzy numbers for all criteria.

Step 2 Generate Characteristic Objects—The characteristic objects (CO) are obtained by using the Cartesian product of the fuzzy number cores for each criterion as follows (6):

$$CO = C(C_1) \times C(C_2) \times \ldots \times C(C_r) \quad (6)$$

Step 3 Rank the Characteristic Objects—The expert determines the matrix of expert judgment (MEJ). It is the result of pairwise comparison of the COs by the problem expert. The MEJ matrix contains the results of comparing characteristic objects, where $\alpha_{ij}$ is the result of comparing $CO_i$ and $CO_j$ by the expert. The function $f_{exp}$ denotes the mental function of the expert; it depends solely on the expert's knowledge and can be presented as (7). Afterward, the vertical vector of the summed judgments (SJ) is obtained as follows (8):

$$\alpha_{ij} = \begin{cases} 0.0, & f_{exp}(CO_i) < f_{exp}(CO_j) \\ 0.5, & f_{exp}(CO_i) = f_{exp}(CO_j) \\ 1.0, & f_{exp}(CO_i) > f_{exp}(CO_j) \end{cases} \quad (7)$$

$$SJ_i = \sum_{j=1}^{t} \alpha_{ij} \quad (8)$$

Finally, values of preference are approximated for each characteristic object. As a result, the vertical vector P is obtained, where the i-th row contains the approximate value of preference for $CO_i$.

Step 4 The Rule Base—Each characteristic object and its value of preference is converted into a fuzzy rule as follows (9):

$$\text{IF } C(\tilde{C}_{1i}) \text{ AND } C(\tilde{C}_{2i}) \text{ AND } \ldots \text{ THEN } P_i \quad (9)$$

In this way, the complete fuzzy rule base is obtained.

Step 5 Inference and Final Ranking—Each alternative is presented as a set of crisp numbers (e.g., $A_i = \{a_{1i}, a_{2i}, \ldots, a_{ri}\}$). This set corresponds to the criteria $C_1, C_2, \ldots, C_r$. Mamdani's fuzzy inference method is used to compute the preference of the i-th alternative. The rule base guarantees that the obtained results are unequivocal. This bijection makes COMET completely free of rank reversal.
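A compact sketch of Steps 2–3 is given below (not the authors' implementation): characteristic objects are generated as a Cartesian product, the MEJ matrix is filled from a stand-in expert function, and the SJ vector is rescaled into preference values. The characteristic values, the expert function, and this rescaling simplification are assumptions, and Step 5 (Mamdani inference) is omitted.

```python
import numpy as np
from itertools import product

# Characteristic values (cores of the fuzzy numbers) for two criteria --
# the numbers are illustrative assumptions, not taken from the paper.
C1 = [0.0, 0.5, 1.0]
C2 = [10, 50, 100]

def expert(co):
    """Stand-in for the expert's mental function f_exp (here: a simple score)."""
    return 0.7 * co[0] + 0.3 * (co[1] / 100.0)

CO = list(product(C1, C2))                     # Step 2: characteristic objects
t = len(CO)

# Step 3: matrix of expert judgment (MEJ) built from pairwise comparisons, Eq. (7).
MEJ = np.zeros((t, t))
for i in range(t):
    for j in range(t):
        fi, fj = expert(CO[i]), expert(CO[j])
        MEJ[i, j] = 0.5 if fi == fj else float(fi > fj)

SJ = MEJ.sum(axis=1)                           # Eq. (8)
P = (SJ - SJ.min()) / (SJ.max() - SJ.min())    # simplified preference approximation

for co, p in zip(CO, P):
    print(co, round(float(p), 3))
```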
2.3 SPOTIS Method

The stable preference ordering toward ideal solution (SPOTIS) method is a new approach to solving multi-criteria decision problems [7]. The authors' aim was to create a method free of the rank reversal phenomenon. Like the COMET method, SPOTIS is fully resistant to it, which means that a different number of alternatives will not affect the final preferences. The main assumption of this method is to define the data boundaries, which determine the ideal solution point (ISP); further calculations then provide the final preferences for the alternatives. Defining the data boundaries requires selecting the minimum $S_j^{min}$ and maximum $S_j^{max}$ bounds for each criterion $C_j$. The ideal solution point is defined as $S_j^{*} = S_j^{max}$ for profit criteria and as $S_j^{*} = S_j^{min}$ for cost criteria. The remaining transformations during the method's application are presented below.

Step 1 Calculation of the normalized distances to the ideal solution point (10):

$$d_{ij}(A_i, S_j^{*}) = \frac{|S_{ij} - S_j^{*}|}{|S_j^{max} - S_j^{min}|} \quad (10)$$

Step 2 Calculation of the weighted normalized distances $d(A_i, S^{*}) \in [0, 1]$, according to (11):

$$d(A_i, S^{*}) = \sum_{j=1}^{N} w_j\, d_{ij}(A_i, S_j^{*}) \quad (11)$$

Step 3 The final ranking is determined based on the $d(A_i, S^{*})$ values. Smaller values of $d(A_i, S^{*})$, which are the preferences of the alternatives, result in a better position in the overall ranking.
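The two SPOTIS steps can be sketched in a few lines of NumPy, as below; the example matrix, bounds, and weights are illustrative assumptions, not data from the paper.

```python
import numpy as np

def spotis(matrix, bounds, weights, types):
    """SPOTIS preferences following Eqs. (10)-(11); smaller values are better.
    `bounds` holds (min, max) per criterion, `types` +1 for profit, -1 for cost."""
    X = np.asarray(matrix, dtype=float)
    lo, hi = np.asarray(bounds, dtype=float).T
    isp = np.where(np.asarray(types) == 1, hi, lo)         # ideal solution point S*
    d = np.abs(X - isp) / np.abs(hi - lo)                   # Eq. (10)
    return d @ np.asarray(weights, dtype=float)             # Eq. (11)

M = [[7, 300, 4], [6, 250, 5], [8, 400, 3]]
bounds = [(5, 10), (200, 500), (1, 5)]
pref = spotis(M, bounds, weights=[0.5, 0.3, 0.2], types=[1, -1, 1])
print(np.round(pref, 3), "best alternative:", int(pref.argmin()) + 1)
```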
3 Study Case

In this paper, the three chosen MCDA methods were used to examine the correlation between the obtained rankings. The TOPSIS, COMET, and SPOTIS methods were used to provide the preference assessment for the set of alternatives. The research includes four different numbers of criteria (2, 3, 4, and 5) and four different numbers of alternatives (5, 10, 15, and 30). All possibilities were combined, resulting in 16 different sizes of the decision matrix. For each of them, 1000 runs of the methods were made. The obtained results were then compared with two correlation coefficients, namely the weighted Spearman correlation coefficient and the WS similarity coefficient. The results obtained for the decision matrix considering five alternatives with different numbers of criteria are presented in Fig. 1. The values obtained from the correlation coefficients for the received rankings are presented in graphs comparing the preference values for the TOPSIS/SPOTIS, TOPSIS/COMET, and SPOTIS/COMET method pairs.
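The experimental loop can be sketched as follows; the formulas used here for the weighted Spearman coefficient and the WS coefficient are the ones commonly given in the related literature and should be treated as assumptions, and random preference vectors stand in for the outputs of the three MCDA methods.

```python
import numpy as np

def rankdata(pref):
    """Ranks from preference values (1 = best, i.e., highest preference)."""
    order = np.argsort(-np.asarray(pref, dtype=float))
    ranks = np.empty(len(pref))
    ranks[order] = np.arange(1, len(pref) + 1)
    return ranks

def weighted_spearman(x, y):
    """Weighted Spearman r_w; formula as commonly given in the MCDA literature."""
    N = len(x)
    num = 6 * np.sum((x - y) ** 2 * ((N - x + 1) + (N - y + 1)))
    return 1 - num / (N ** 4 + N ** 3 - N ** 2 - N)

def ws_coefficient(x, y):
    """WS rank similarity coefficient; formula as commonly given in the literature."""
    N = len(x)
    return 1 - np.sum(2.0 ** (-x) * np.abs(x - y) /
                      np.maximum(np.abs(x - 1), np.abs(x - N)))

rng = np.random.default_rng(42)
for run in range(3):                   # the study uses 1000 runs per matrix size
    # In the study the two rankings come from two MCDA methods applied to the
    # same random decision matrix; random preference vectors stand in here.
    r1 = rankdata(rng.random(10))
    r2 = rankdata(rng.random(10))
    print(run, round(weighted_spearman(r1, r2), 3), round(ws_coefficient(r1, r2), 3))
```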
Fig. 1 Five alternatives
Fig. 2 Ten alternatives
In turn, Fig. 2 shows the correlation results between the rankings containing ten alternatives. Each graph compares the rankings' correlation separately for the weighted Spearman coefficient and the WS similarity coefficient. In addition, the charts are sorted in ascending order of the number of criteria in the decision matrix. The subsequent test cases are shown in Figs. 3 and 4, where the results for 15 and 30 alternatives in the test set are included, respectively. It is worth noting that, in the case of a smaller number of alternatives in the set, the smallest fluctuation of the obtained correlations was observed when comparing the TOPSIS/SPOTIS rankings, where the values oscillated mostly between 0.8 and 1.0, which indicates a strong correlation of the obtained results. The greatest fluctuations in the similarities were obtained for the SPOTIS/COMET methods; a large part of the obtained rankings showed similarity in the range of 0.6–0.8, especially when analyzing matrices containing only five alternatives. When analyzing correlations for a greater number of alternatives, it can be seen that the vast majority of the compared rankings have a significant similarity, which lies between 0.9 and 1.0 in most cases. The more alternatives and criteria were taken into consideration, the higher the correlation that was obtained.
Fig. 3 15 alternatives
Fig. 4 30 alternatives
Moreover, all of the compared methods gave similar results to a great extent.
4 Conclusions

Decision making is an important aspect of life, requiring the decision-maker to choose options based on their beliefs. However, these decisions do not always bring benefits, despite the desire to maximize gains and minimize losses. To assist the decision-making process, various methods are used to support the expert. However, it may be unclear which method is the right one to use for a given problem, as their number is constantly growing. To assess the quality of the methods' performance, it was decided to use three selected MCDA methods, namely TOPSIS, COMET, and SPOTIS, and to analyze the similarity of the obtained rankings. The study introduced different sizes of decision matrices, taking into account a changing number of alternatives and criteria. The rankings obtained were compared using two similarity coefficients,
namely the weighted Spearman correlation coefficient and the WS similarity coefficient. The research has shown that a higher number of alternatives and criteria has a positive effect on the correlation between rankings, at the same time giving high values for these correlations. For future work, it is worth providing such a comparison for other methods belonging to the MCDA family. Moreover, the number of criteria and alternatives can be extended to larger values, which will provide a more comprehensive view of the results and help assess the quality of the methods' performance.

Acknowledgements The work was supported by the project financed within the framework of the program of the Minister of Science and Higher Education under the name "Regional Excellence Initiative" in the years 2019–2022, Project Number 001/RID/2018/19; the amount of financing: PLN 10.684.000,00 (J.W.).
References 1. Bazerman, M.H., Moore, D.A.: Judgment in Managerial Decision Making. Wiley & Sons, New York (2012) 2. Behzadian, M., Otaghsara, S.K., Yazdani, M., Ignatius, J.: A state-of the-art survey of TOPSIS applications. Expert Syst. Appl. 39(17), 13051–13069 (2012) 3. Brasil Filho, A.T., Pinheiro, P.R., Coelho, A.L., Costa, N.C.: Comparison of two MCDA classification methods over the diagnosis of Alzheimer’ disease. In: International Conference on Rough Sets and Knowledge Technology, pp. 334–341. Springer, Berlin, Heidelberg (2009) 4. Chen, T.Y.: A signed-distance-based approach to importance assessment and multi-criteria group decision analysis based on interval type-2 fuzzy set. Knowl. Inf. Syst. 35(1), 193–231 (2013) 5. Dehe, B., Bamford, D.: Development, test and comparison of two Multiple Criteria Decision Analysis (MCDA) models: A case of healthcare infrastructure location. Expert Syst. Appl. 42(19), 6717–6727 (2015) 6. De Montis, A., De Toro, P., Droste-Franke, B., Omann, I., Stagl, S.: Assessing the quality of different MCDA methods. Altern. Environ. Valuation 4, 99–133 (2004) 7. Dezert, J., Tchamova, A., Han, D., Tacnet, J.M.: The SPOTIS rank reversal free method for multicriteria decision-making support. In: 2020 IEEE 23rd International Conference on Information Fusion (FUSION), pp. 1–8. IEEE (2020) 8. Dulmin, R., Mininno, V.: Supplier selection using a multi-criteria decision aid method. J. Purch. Supply Manag. 9(4), 177–187 (2003) 9. Giove, S., Brancia, A., Satterstrom, F.K., Linkov, I.: Decision support systems and environment: role of MCDA. In: Decision Support Systems for Risk-Based Management of Contaminated Sites, pp. 1–21. Springer, Boston (2009) 10. Hyde, K.M., Maier, H.R., Colby, C.B.: A distance-based uncertainty analysis approach to multi-criteria decision analysis for water resource decision making. J. Environ. Manage. 77(4), 278–290 (2005) 11. Hyde, K.M., Maier, H.R.: Distance-based and stochastic uncertainty analysis for multi-criteria decision analysis in excel using visual basic for applications. Environ. Modell. Softw. 21(12), 1695–1710 (2006) 12. Karczmarczyk, A., Wa˛tróbski, J., Ladorucki, G., Jankowski, J.: MCDA-based approach to sustainable supplier selection. In: 2018 Federated Conference on Computer Science and Information Systems (FedCSIS), pp. 769–778. IEEE. (2018)
13. Kizielewicz, B., Sałabun, W.: A new approach to identifying a multi-criteria decision model based on stochastic optimization techniques. Symmetry 12(9), 1551 (2020) 14. Kizielewicz, B., Kołodziejczyk, J.: Effects of the selection of characteristic values on the accuracy of results in the COMET method. Procedia Comput. Sci. 176, 3581–3590 (2020) 15. Kizielewicz, B., Wa˛tróbski, J., Sałabun, W.: Identification of relevant criteria set in the MCDA process-wind farm location case study. Energies 13(24), 6548 (2020) 16. Klein, G.: Naturalistic decision making. Hum. Factors 50(3), 456–460 (2008) 17. Mairiza, D., Zowghi, D., Gervasi, V.: Utilizing TOPSIS: a multi criteria decision analysis technique for non-functional requirements conflicts. In: Requirements Engineering, pp. 31–44. Springer, Berlin, Heidelberg (2014) 18. Opricovic, S., Tzeng, G.H.: Compromise solution by MCDM methods: a comparative analysis of VIKOR and TOPSIS. Eur. J. Oper. Res. 156(2), 445–455 (2004) 19. Saaty, T.L.: Decision making with the analytic hierarchy process. Int. J. Serv. Sci. 1(1), 83–98 (2008) 20. Sałabun, W.: How the normalization of the decision matrix influences the results in the VIKOR method? Procedia Comput. Sci. 176, 2222–2231 (2020) 21. Sałabun, W.: The characteristic objects method: a new distance-based approach to multicriteria decision-making problems. J. Multi-Criteria Decis. Anal. 22(1–2), 37–50 (2015) 22. Sałabun, W., Palczewski, K., Wa˛tróbski, J.: Multicriteria approach to sustainable transport evaluation under incomplete knowledge: electric bikes case study. Sustainability 11(12), 3314 (2019) 23. Sałabun, W., Piegat, A.: Comparative analysis of MCDM methods for the assessment of mortality in patients with acute coronary syndrome. Artif. Intell. Rev. 48(4), 557–571 (2017) 24. Sałabun, W., Ziemba, P., Wa˛tróbski, J.: The rank reversals paradox in management decisions: The comparison of the ahp and comet methods. In: International Conference on Intelligent Decision Technologies, pp. 181–191. Springer, Cham (2016) 25. Sałabun, W., Wa˛tróbski, J., Shekhovtsov, A.: Are MCDA methods benchmarkable? A comparative study of TOPSIS, VIKOR, COPRAS, and PROMETHEE II methods. Symmetry 12(9), 1549 (2020) 26. Shekhovtsov, A., Sałabun, W.: A comparative case study of the VIKOR and TOPSIS rankings similarity. Procedia Comput. Sci. 176, 3730–3740 (2020) 27. Shekhovtsov, A., Kozlov, V., Nosov, V., Sałabun, W.: Efficiency of methods for determining the relevance of criteria in sustainable transport problems: a comparative case study. Sustainability 12(19), 7915 (2020) 28. Stewart, T.J.: Dealing with uncertainties in MCDA. In: Multiple Criteria Decision Analysis: State of the Art Surveys, pp. 445–466. Springer, New York (2005) 29. Wa˛tróbski, J., Sałabun, W., Ladorucki, G.: The temporal supplier evaluation model based on multicriteria decision analysis methods. In: Asian Conference on Intelligent Information and Database Systems, pp. 432–442. Springer, Cham (2017) 30. Wa˛tróbski, J., Jankowski, J.: Knowledge management in MCDA domain. In: 2015 Federated Conference on Computer Science and Information Systems (FedCSIS), pp. 1445–1450. IEEE (2015) 31. Wa˛tróbski, J., Jankowski, J., Ziemba, P.: Multistage performance modelling in digital marketing management. Econ. Sociol. 9(2), 101 (2016) 32. Wa˛tróbski, J., Sałabun, W.: Green supplier selection framework based on multi-criteria decision-analysis approach. In: International Conference on Sustainable Design and Manufacturing, pp. 361–371. Springer, Cham (2016) 33. 
Wa˛tróbski, J., Jankowski, J.: Guideline for MCDA method selection in production management area. In: New Frontiers in Information and Production Systems Modelling and Analysis, pp. 119–138. Springer, Cham (2016) 34. Wa˛tróbski, J., Jankowski, J., Ziemba, P., Karczmarczyk, A., Zioło, M.: Generalised framework for multi-criteria method selection. Omega 86, 107–124 (2019) 35. Wie˛ckowski, J., Kołodziejczyk, J.: Swimming progression evaluation by assessment model based on the COMET method. Procedia Comput. Sci. 176, 3514–3523 (2020)
Knowledge Engineering in Large-Scale Systems
Affection of Java Design Patterns to Cohesion Metrics Sergey Zykov, Dmitry Alexandrov, Maqsudjon Ismoilov, Anton Savachenko, and Artem Kozlov
Abstract The idea of code quality assessment has been well known for a long time; class connectivity metrics were proposed by the community several years ago but have not become a generally applied practice in industrial programming. The objective of the study, part of which we present in this paper, is to critically analyze the metrics available today: are they completely unusable, or can they, considering their specifics, be useful helpers for software specialists? Specifically, we try to answer the following questions: is there any connection between design patterns and cohesion metrics, and how do these patterns affect the metrics if there is?

Keywords Computing methodologies · Class cohesion · Coupling · Code design patterns · Cohesion metrics
1 Introduction

Class cohesion is the degree of connection between class methods and is often critical to the quality assurance (QA) of object-oriented software. Typically, a low-cohesion class contains disparate and/or isolated members; therefore, the cohesion level is useful for detecting poorly designed classes and ensuring faster and better system reconfiguration. There are over thirty different metrics to measure cohesion that are based on class member analysis in terms of the number and structure of attributes and methods. Utilizing class cohesion metrics can improve the quality of Java code static analysis, improve object-oriented programming practices, and suggest advanced and efficient ones. jPeek (Huawei Technologies), a software tool with a library for inspecting objects in Java, was built to reach these goals.
S. Zykov (B) · D. Alexandrov · M. Ismoilov · A. Savachenko · A. Kozlov National Research University Higher School of Economics, Moscow, Russian Federation D. Alexandrov Federal State Budget Educational Institution of Higher Education, «MIREA—Russian Technological University», Moscow, Russian Federation © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 I. Czarnowski et al. (eds.), Intelligent Decision Technologies, Smart Innovation, Systems and Technologies 238, https://doi.org/10.1007/978-981-16-2765-1_28
2 Theoretical Overview

2.1 jPeek Features Summary

There exist over 14 different cohesion metrics; however, only a few of them have an available calculator. Therefore, the purpose of jPeek is to analyze code quality using multiple metrics. There are 14 cohesion metrics currently implemented in jPeek: LCOM, LCOM2, LCOM3, LCOM4, LCOM5, LCC, CCM, CAMC, MMAC, NHD, OCC, PCC, SCOM, and TCC. They are described further in this paper.
2.2 Cohesion Metrics

In this section, the cohesion metrics implemented in jPeek are described, as the results of the further analysis are based on them.

Lack of Cohesion in Methods (LCOM) LCOM, proposed by Chidamber and Kemerer, provides a relative measure of disparate methods in a class [1]. LCOM uses a similarity degree for two methods in a class, which is determined by the intersection of the sets of instance variables used by each method. The LCOM value is the difference between the number of method pairs whose similarity is zero and the number of method pairs whose similarity is nonzero.

LCOM2 LCOM2, proposed by Chidamber and Kemerer, equals the percentage of methods that do not access a specific attribute, averaged over all attributes in the class [2]. If the number of methods or attributes is zero, LCOM2 is undefined and displayed as zero.

LCOM3 LCOM3, proposed by Li and Henry, varies between 0 and 2 [3]. In a normal class, whose methods access its own variables, LCOM3 varies between 0 (high cohesion) and 1 (no cohesion). When LCOM3 = 0, each method accesses all variables, which indicates the highest possible cohesion. LCOM3 = 1 indicates an extreme lack of cohesion. When there exist variables that are not accessed by any of the class methods, 1 < LCOM3 ≤ 2.
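As a minimal illustration of the LCOM idea (not jPeek's implementation), the sketch below counts method pairs that share no instance variables against pairs that share at least one, for a hypothetical mapping of methods to the attributes they use.

```python
from itertools import combinations

def lcom(method_attrs):
    """LCOM (Chidamber & Kemerer): number of method pairs sharing no attributes
    minus number of pairs sharing at least one, floored at zero."""
    disjoint = shared = 0
    for m1, m2 in combinations(method_attrs.values(), 2):
        if set(m1) & set(m2):
            shared += 1
        else:
            disjoint += 1
    return max(disjoint - shared, 0)

# Hypothetical classes: methods mapped to the instance attributes they use.
cohesive = {"open": {"path", "mode"}, "read": {"path", "buffer"}, "close": {"path"}}
scattered = {"a": {"x"}, "b": {"y"}, "c": {"z"}}

print(lcom(cohesive), lcom(scattered))   # low value = cohesive, high = disparate
```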
… <Ri−1, Ri, ai>,   (2)

where
X is a set of initial data;
R1, …, Ri are sets of intermediate solutions (generalized indicators of the control object);
a1, …, ai are elementary information processing (generalization) operations;
Ri is the final solution;
R = {R1, …, Ri}; A = {a1, …, ai};
an is the elementary n-th operation of information generalization (n = 1, …, i);
i is the number of elementary operations of generalizing information needed to develop a solution (the depth of the search for a solution in the traditional sense).
Note that each set of intermediate solutions can be considered as a set of initial data for the next stage of information generalization, i.e., from the substantive point of view, R is the patient’s state, at a fixed level of generalization of information, and elementary operations determine the rules for the transition from the lower to the upper level of generalization of information about the state of health. Expression (2) can be represented as a set of information processing functions, each of which maps the corresponding subset of the set of initial data X into one solution of the set R. Each such function can be represented as: Rm = f m (x1 , . . . , x N ),
(3)
where ym is an element of the set Y, m = 1, …, I, I N fm
is the number of elementary information processing operations; is the amount of initial data; is a function corresponding to the elementary information processing operation am . In this notation, the decision-making process can be represented as a formula: M = f i ( f i−1 (. . . f 1 (x1 , . . . x N ) . . .)),
(4)
Thus, the model of the process of developing a medical decision (1) can be represented as a graph of information processing functions, where the vertices of the graph are names (descriptions of functions), and arcs are the directions of information transitions between functions. Such a model can be called an applicative model of the domain (decision-making process).
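A minimal sketch of formula (4) as nested function composition is shown below; the generalization functions and thresholds are invented placeholders, not medical rules from the paper.

```python
def f1(*x):
    """First-level generalization: raw indicators -> intermediate estimates."""
    return [sum(x) / len(x), max(x)]

def f2(mean_value, max_value):
    """Second-level generalization: intermediate estimates -> a single score."""
    return 0.7 * mean_value + 0.3 * max_value

def f3(score):
    """Final decision rule mapping the generalized score to a conclusion."""
    return "high risk" if score > 0.6 else "low risk"

# M = f3(f2(f1(x1, ..., xN))), mirroring formula (4).
x = (0.2, 0.9, 0.4, 0.7)
M = f3(f2(*f1(*x)))
print(M)
```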
3 Construction of an Applicative-Frame Model of Knowledge Representation From a meaningful point of view, the applicative function of information processing can be interpreted as one of the tasks (subtasks) into which the original problem (the problem of developing a solution) can be decomposed. In this interpretation, we can assume that the problem of developing a medical solution can be represented as a division of the original problem into subtasks. The task is described in the form of a graph, called the task reduction graph. In this case, the task will correspond to the vertices, and the task reduction operators will correspond to the arcs. Moreover, the root of the tree corresponds to the original problem, to the vertices of the first level—the tasks directly generated by the original problem, to the vertices of the second level—the tasks generated by the tasks of the first level, etc. In addition, the solution to the problem corresponds to the value of the information processing function, and the initial data of the problem correspond to the parameters of the information processing function. If for the solution of a higher-level problem, it is necessary to solve all subtasks arising from it, then the nodes of the graph corresponding to these subtasks are called connected; otherwise, the nodes are said to be disconnected. The vertices of the graph corresponding to the problems that can be solved are called solvable, not having a solution—unsolvable. The vertices corresponding to elementary problems at the lowest level of the partition are called final. Therefore, a vertex that is not final will be resolvable if all its child connected vertices or at least one of its unconnected child vertices are resolvable. The search for a solution when reducing a problem to subtasks is based on the task reduction graph, which is an AND-OR graph. To describe formal procedures for finding a solution, it is necessary to define the formalism of the implicit description of the graph. The traditional approach to implicit graph representation is by describing operators for generating subtasks [5]. The operator converts the original task description into a set of child subtasks. Many transformation operators can exist for a particular task. The application of each operator generates an alternative set of subproblems, which determines the existence of an OR relation on the reduction graph. The paper proposes an approach to representing a subtask tree in the form of a data flow graph (DFG). DFG [5–11] is understood as a formalized representation of the information processing process in order to solve a problem in the form of a graph, the vertices of which are information processing functions (generalization rules), and the arcs are information flows, passing through these functions. Let’s note the main features characterizing the DFG: • DFG visually and adequately reflects the information processing process when deriving a solution; • DFG reflects the parallelism and sequence of execution of individual information processing functions;
• Each function is triggered when data are present at its input; therefore, the data processing is continuous and the DFG reflects real-time processing of the data.
Consider the formal apparatus that can be proposed to describe the DFG, which reflects the decision-making process. Obviously, such an apparatus should be an algorithmic theory, such as normal Markov algorithms, the Post machine, the Turing machine, the λ-calculus, or the theory of combinators. The equivalence of these systems has been proved. It seems most successful to use the apparatus of the theory of combinators [12, 13]. Since the theory of combinators is a derivative of the theory of λ-calculus, it has the advantages inherent in the λ-calculus, as well as specific unique properties. The advantages of the theory of combinators have been convincingly proved, and practical results have been described. Let us show the main ones:
• the theory of combinators is an algorithmic system and allows one to formally describe all computable functions;
• the theory of combinators is a typeless theory; all variables, constants, and functions are treated as symbols;
• all combinators are expressible in a minimal basic set of combinators.
Let an alphabet be given, consisting of brackets (, ), a special symbol, and the letters of the Latin alphabet: A, B, …, a, b, …. Then we define objects by induction: if a is a letter, then a is an elementary object; every elementary object is an object; if a and b are objects, then (ab) is an object. Let us accept the convention that the objects ab, a(bc), and abcd are shorthand for the objects (ab), (a(bc)), (((ab)c)d). In this notation, by a combinator we mean an object that is associated with a characteristic function of changing the combinations of the variables to which this combinator is applied. Note that a combinator is an object that does not include any variable. That is, if the combinator X is successively applied to a number of objects a1, …, ai, then the emerging new object X a1 … ai is converted (transformed) into an object b, which is a new combination of the objects a1, …, ai. This is written as: X a1 … ai → b. A basic set of combinators is a set of combinators through which all other combinators are derivable, but which are not derivable from each other. Such a set is {K, S}, where the combinators have the following definitions:
K ab → a (cancellator),
S abc → ac(bc) (connector).
In addition, the following combinators are useful:
I a → a (identity),
B abc → a(bc) (composer),
C abc → acb (permutator),
W ab → abb (duplicator).
After the introduced definitions, we will demonstrate the representation of the DFG in the multidimensional combinators B_k^n, I_i^n, K^n [4], which have the following combinatorial definitions:
(C1) B_k^n f g_1 … g_k x_1 … x_n = f(g_1(x_1, …, x_n), …, g_k(x_1, …, x_n)),
(C2) I_i^n x_1 … x_i … x_n = x_i,
(C3) K^n x y_1 … y_n = x.
In the notation of these combinators, each information processing function (3) can be written as:

y = B_k^n f I_1 … I_k x_1 … x_n,   (5)

and the whole model (4) in the form:

y = B_k^n f Y_1 … Y_k x_1 … x_n   (6)

where
f is the mapping function of the root vertex;
Y_1, …, Y_k are combinatorial expressions of the subordinate tree nodes;
x_1, …, x_n are the lowest-level raw data;
k is the number of downstream nodes;
n is the amount of initial data.
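The multidimensional combinators (C1)–(C3) can be sketched as higher-order functions, as below; the curried Python encoding is an illustrative choice, not the notation of the paper.

```python
def B(k, n):
    """B_k^n f g1..gk x1..xn = f(g1(x1..xn), ..., gk(x1..xn))  -- rule (C1).
    The arity n is kept only for documentation purposes."""
    def apply(f, *gs):
        assert len(gs) == k
        return lambda *xs: f(*(g(*xs) for g in gs))
    return apply

def I(i, n):
    """I_i^n x1..xn = xi  -- rule (C2)."""
    return lambda *xs: xs[i - 1]

def K(n):
    """K^n x y1..yn = x  -- rule (C3)."""
    return lambda x, *ys: x

# y = B_k^n f I_1 ... I_k x_1 ... x_n, as in formula (5):
f = lambda a, b: a + b                     # mapping function of the root vertex
node = B(2, 3)(f, I(1, 3), I(3, 3))        # combine the 1st and 3rd raw inputs
print(node(10, 20, 30))                    # -> 40
print(K(2)("keep", "drop1", "drop2"))      # -> 'keep'
```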
Thus, the subtask tree is written in the form of a single combinatory expression, whose representation in an information system does not cause any particular difficulties. The reduction of a combinatorial expression is carried out according to the rules (C1)–(C3), which requires only three inference procedures for deriving the solution, as well as procedures that implement the mapping functions. The advantage of this approach is the compactness of the stored definitions and the possibility of parallel reduction. Interesting ways to implement data flow machines and other promising architectures are outlined in [5, 14]. Let us show the connection between the multidimensional combinators and the previously introduced basic set of combinators {K, S} [12]: I = I_1^1; K = K^1; B = B_1^1;
K n = B K K n−1 ; 1 Inm = K m−1 In−m+1 = K m−1 K n−m ;
Bnm = Fm(n) = B Fm Fm(n−1) . where In1 = K n−1 ; Inn = K n−1 I ; K0 = I ; Fm = B(B Sm−1 )B; Sm = B(B Sm−1 )S. To represent knowledge about medical decision making in the knowledge base of IMDSS, it is necessary to develop such a knowledge representation model (KRM), which allows to formally describe the task reduction graph (subtask tree) written in combinatorial notation. Moreover, it is necessary to provide for two main conditions: • KRM should support the construction of network structures to represent graph models of knowledge. • KRM knowledge should provide the output of the value when reducing each of the vertices of the graph. Obviously, the required model must combine the properties of frame models or semantic networks to represent tree structures and production or logical models to provide the output of the values of the tree vertices. The paper proposes an approach that combines the properties of frame and production models, focused on the representation of knowledge about the processes of making medical decisions [14–18]. To represent constructions of the form (3) in a form convenient for implementation in the knowledge base, it is proposed to use a formal construction—an “applicative frame” (AF), which can be set in accordance with the traditional notation in the form: N := (G, A1 , . . . , A M , F, E),
(7)
where N G A1 , …, AL F
E L
is the name of the AF, target AF attribute, AF arguments, AF mapping function (mapping the values of the frame arguments to the value of the target attribute), i.e., AF provides application of the display function to the frame arguments, function of explaining the conclusion of the decision, is the number of AF arguments.
350
G. S. Lebedev et al.
The connection between constructions (3) and (7) is obvious. The target AF attribute corresponds to the value of the information processing function, the AF arguments correspond to its arguments, and the AF display function corresponds to its description. The AF tree corresponds to the applicative domain model, the vertices of which are the names of the AF, and the branches are the frame arguments and (or) target attributes. The target attribute and arguments are defined within their range of valid values. The display function is specified by a list of display operators, each of which consists of two main parts: a list of conditions and an action. The condition is described in the form of predicates on subsets of the AF arguments, and the action is described in the form of various feasible operations on arguments, operations of accessing subroutines and databases. The described MPD should be expanded by means of dialogue, meta-tools for reduction control and means of explaining decisions and implemented in a set of tools for building intelligent decision support systems.
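Construction (7) can be sketched as a simple data structure, for example as below; the field names and the toy diagnostic frame are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class ApplicativeFrame:
    """N := (G, A1..AM, F, E) -- see construction (7)."""
    name: str                           # N, the frame name
    goal: str                           # G, the target attribute
    args: List[str]                     # A1..AM, the argument names
    mapping: Callable                   # F, maps argument values to the goal value
    explain: Callable = lambda *a: ""   # E, explanation of the conclusion

# Hypothetical fragment of a diagnostic knowledge base:
fever = ApplicativeFrame(
    name="AF_fever", goal="fever", args=["temperature"],
    mapping=lambda t: t >= 38.0,
    explain=lambda t: f"temperature {t} interpreted against the 38.0 threshold")

print(fever.mapping(38.6), fever.explain(38.6))
```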
4 Knowledge Representation Model Implementation

The conclusion of the solution for the applicative-frame KRM should be carried out at two levels:
• reduction of the entire AF tree;
• reduction of a single AF.
The reduction of the AF tree is carried out by successive reduction of its vertices. The reduction is carried out by the recursive procedure RED, given by the following formal rules:
(R1) RED[N_i] → RED[(G_Ni, A_1i, …, A_Mi, F_i)].
(R2) RED[(G_i, A_1i, …, A_Mi, F_i)] → EVAL[F_i, RED(A_1i), …, RED(A_Mi)].
(R3) RED[A_i] → RED[N_i], A_i = G_Ni,
(R4) RED[A_i] → #NR, A_i ∉ (G_1i, …, G_Ni),
where
i-AF name; i-AF target attribute name; name of m-argument i-AF; i-AF display function name; terminal top special meaning;
i = 1, …, I; mi = 1, …, M i ;
Applicative-Frame Model of Medical Knowledge Representation
I Mi
the number of AFs allocated in the subject area; number of arguments for i-AF.
I Mi
the number of AFs allocated in the subject area; number of arguments for i-AF.
351
Rules (R1–R4) do the following: (R1)—expansion rule; executed if the procedure argument is an AF name and searches for full AF notation; (R2)—the rule of reduction of the AF tree vertex is executed if the procedure argument is the construction of the full AF notation and performs recursive application of the procedure to the AF arguments, followed by the EVAL function that calculates the value of the procedure; (R3)—rule of transition to the next node of the AF tree is executed if the function argument is the name of the AF argument, provided that an AF for which this argument is the target attribute is found and recursively applies the procedure to the AF name containing this target attribute; (R4)—the rule for determining the terminal vertex, it is executed if the argument of the procedure is the name of the AF argument, and this argument is not a target attribute of any other AF, and assigns the special value #NR (no references) to the procedure; in this case, the reduction process ends. The RED procedure value is either the EVAL procedure value or the #NR special value. The algorithm for calculating the value of the EVAL reduction procedure is formally specified by the following rules: (E1) EVAL[Fi , VG1i , . . . , VG Mi ] −− > APPLAY[Fi , VG1i , . . . , VG Mi ]. (E2) EVAL Fi , VGi1 , . . . , VGim−1 , #NR, VGim+2 , . . . , VGi M −− > APPLAY Fi , VGi1 , . . . , VGim−1 , Am , VGim+2 , . . . , VGi M , where VGi #NR
is the value of the target attribute i-AF, which is the m-argument of i-AF; Am is the name of the m-argument of the i-AF; is the special value of the terminal top.
Rule (E1) determines the composition of the arguments and initializes the APPLAY procedure if no terminal symbols #NR are found among the procedure arguments. Rule (E2) is executed otherwise, and instead of the terminal symbol, it passes the value of the corresponding AF argument to the APPLAY procedure. The APPLAY procedure is a traditional procedure for interpreters of functional programming languages and applies the function that is its first argument to its subsequent arguments. The implementation of the APPLAY function, which should directly calculate the frame value, is the most difficult. This procedure actually performs inference on the target value of the attribute. The complexity of this procedure is determined by the complexity of defining the display function.
Reduction of the target AF is carried out by the formal recursive procedure RED, specified by the rules R1–R4, the value of which is the value of the target attribute of this AF. The beginning of the root AF reduction is specified by the R1 rule. The work of the reduction procedure consists in determining the values of the AF arguments and applying its first suitable defining expression from the description of the mapping function of this AF to the values of the arguments. The recursiveness of the reduction procedure allows us to speak about the reduction of the entire AF tree. Therefore, the process of deriving a solution can be identified with the process of reduction of the AF tree. In order to impart the properties of an AND-OR reduction graph to the KRM, it is necessary to introduce meta-rules into the AF description, which are initialized at the beginning of the AF reduction. Each such meta rule must perform an alternate pass through the AF tree. This can be achieved by dynamically modifying the AF argument list: either adding new members to it, or excluding existing ones.
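To make the RED/EVAL reduction more concrete, here is a minimal Python sketch (not from the paper; the frame names, attributes, facts and rules are illustrative assumptions) in which each AF carries a target attribute, a list of arguments, and a display function given as condition/action display operators, and the reduction walks the AF tree recursively.

```python
# Minimal sketch of the RED/EVAL reduction over an applicative-frame (AF) tree.
# The frames, attribute names, facts and rules below are illustrative assumptions.

NR = "#NR"  # special value of a terminal vertex (no references)

# Each AF: a target attribute, argument names, and a display function
# given as a list of (condition, action) display operators.
AF_TREE = {
    "RiskFrame": {
        "target": "risk",
        "args": ["age", "pressure"],
        "display": [
            (lambda a: a["age"] != NR and a["age"] > 60 and a["pressure"] == "high",
             lambda a: "high risk"),
            (lambda a: True, lambda a: "normal risk"),          # default operator
        ],
    },
    "PressureFrame": {
        "target": "pressure",
        "args": ["systolic"],
        "display": [
            (lambda a: a["systolic"] != NR and a["systolic"] >= 140, lambda a: "high"),
            (lambda a: True, lambda a: "normal"),
        ],
    },
}

# Values of arguments that are not target attributes of any AF (terminal vertices).
FACTS = {"age": 67, "systolic": 150}


def red(name):
    """Rules R1-R4: reduce an AF name or an argument name to a value."""
    if name in AF_TREE:                                  # R1/R2: expand and reduce a vertex
        frame = AF_TREE[name]
        values = {arg: red(arg) for arg in frame["args"]}
        return evaluate(frame["display"], values)        # EVAL
    for af_name, frame in AF_TREE.items():               # R3: argument is another AF's target
        if frame["target"] == name:
            return red(af_name)
    return FACTS.get(name, NR)                           # R4: terminal vertex


def evaluate(display, values):
    """EVAL/APPLAY: apply the first display operator whose condition holds."""
    for condition, action in display:
        if condition(values):
            return action(values)
    return NR


print(red("RiskFrame"))   # -> "high risk" for the illustrative facts above
```

The sketch only illustrates the control flow of the reduction; a full implementation would also need the dialogue means, reduction-control meta-tools and explanation facilities mentioned earlier.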
5 Conclusion
The proposed KRM can form the basis for building a knowledge system for inferring a medical or management decision. The implementation of such a KRM can be performed in various programming systems, and the storage of the formalized structures can be carried out in various database and knowledge management systems. In the proposed knowledge model, not only the methodology of artificial neural networks can be implemented, but also other approaches associated with parallel inference of solutions.
References
1. Xu, J., Xue, K., Zhang, K.: Current status and future trends of clinical diagnoses via image-based deep learning. Theranostics 9(25) (2019)
2. Shahid, N., Rappon, T., Berta, W.: Applications of artificial neural networks in health care organizational decision-making: a scoping review. PLoS ONE 14(2), e0212356 (2019). https://doi.org/10.1371/journal.pone.0212356
3. Jovanović, M., Milenković, D., Perković, M., Milenković, T., Niković, V.: The use of artificial neural networks in clinical medicine. In: Sinteza 2016—International Scientific Conference on ICT and E-Business Related Research, pp. 112–117. Belgrade, Singidunum University, Serbia (2016). https://doi.org/10.15308/Sinteza-2016-112-117
4. Published Standards by ISO/TC215 "Health Informatics". https://www.iso.org/committee/54960/x/catalogue/p/0/u/1/w/0/d/0
5. Abdali, S.K.: An abstraction algorithm for combinatory logic. J. Symbol. Logic 41, 222–224 (1976)
6. Carkci, M.: Dataflow and Reactive Programming Systems: A Practical Guide, 570 p. CreateSpace Independent Publishing Platform (2014). ISBN 9781497422445
7. Kent, A.: Dataflow languages. In: Encyclopedia of Library and Information Science: vol. 66—Supplement 29—Automated System for the Generation of Document Indexes to Volume Visualization, pp. 101–500. Taylor & Francis (2000). ISBN 9780824720667 8. Maurer, P.M., Oldehoeft, A.E.: The use of combinators in translating a purely functional language to low-level data flow graphs. Comp. Lang. 8(1), 27–45, 4.56 (1983) 9. Sharp, J.A.: Data Flow Computing: Theory and Practice, 566 p. Intellect, Limited (1992). ISBN 9780893919214 10. Van-Roy, P., Haridi, S.: Concepts, Techniques, and Models of Computer Programming, 900 p. Prentice-Hall (2004). ISBN 9780262220699 11. Johnston, W.M., Paul Hanna, J.R., Millar, R.J.: Advances in dataflow programming languages. ACM Comput. Surv. 36(1), 1–34 (2004) 12. Church, R.: The Calculi of Lambda-Conversion. Princeton (1941) 13. Curry, H.B., Feys, R., Craig, W.: Combinatory Logic, vol. 1. North-Holland, Amsterdam (1958) 14. Shortliffe, E.H.: Computer-Based Medical Consultations: MYCIN. Elsevier/North Holland, New York NY (1976) 15. Robinson, D.F., Foulds, L.R.: Comparison of phylogenetic trees. Math. Biosci. 53(1–2), 131– 147 (1981). https://doi.org/10.1016/0025-5564(81)90043-2 16. Barnett, G.O., Cimino, J.J., Hupp, J.A., Hoffer, E.P.: DXplain—An evolving diagnostic decision-support system. JAMA 258, 67–74 (1987) 17. Berner, E.S. (ed.): D. Clinical Decision Support Systems: State of the Art. AHRQ Publication No. 09-0069-EF, June 2009. https://healthit.ahrq.gov/sites/default/files/docs/page/090069-EF_1.pdf 18. Osheroff, J.A., Teich, J.M., Middleton, B.F., et al.: A roadmap for national action on clinical decision support. J. Am. Med. Inform. Assoc. 14(2), 141–145 (2007). https://doi.org/10.1197/ jamia.M2334. Available at http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2213467/
Eolang: Toward a New Java-Based Object-Oriented Programming Language Hadi Saleh , Sergey Zykov , and Alexander Legalov
Abstract Object-oriented programming (OOP) is one of the most common programming paradigms used for building software systems. However, despite its industrial and academic value, OOP is criticized for its high complexity, low maintainability and lack of rigorous principles. Eolang (a.k.a. EO) was created to solve the above problems by restricting its features and suggesting a formal object calculus for this programming language. This paper seeks to analyze the Eolang language and compare it to other OOP languages in order to develop the core features of this new language. Keywords Eolang · Elegant objects · Object-oriented programming · Functional programming · Java virtual machine
1 Introduction
Object-oriented programming (OOP) has been the dominant paradigm in the software development industry over the past decades. OOP languages, including well-known languages such as Java, C++, C# and Python, are widely used by major technology companies, software developers and leading providers of digital products and solutions for various projects [1]. It should be noted that virtually all key programming languages are focused on supporting a multi-paradigm style, which allows for different styles of coding in a single software project. The absence of restrictions on programming style often leads to the use of less reliable coding techniques, which greatly affects the reliability of programs in several areas. Existing attempts to limit the programming style by directives do not always lead to the desired result. In addition, supporting different programming paradigms complicates languages and tools, reducing their reliability. Moreover, the versatility of these tools is not always required: many programs can be developed using only the OOP paradigm [2].
H. Saleh (B) · S. Zykov · A. Legalov National Research University Higher School of Economics, Moscow, Russian Federation © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 I. Czarnowski et al. (eds.), Intelligent Decision Technologies, Smart Innovation, Systems and Technologies 238, https://doi.org/10.1007/978-981-16-2765-1_30
Furthermore, among language designs considered OOP, there are those that reduce the reliability of the code being developed. Therefore, the actual problem is the development of OOP languages that provide higher reliability of programs. This is especially true for a number of critical areas of their application. Many teams and companies that use these languages suffer from a lack of quality in their projects despite the tremendous effort and resources that have been invested in their development. Many discussions concerning code quality issues have appeared in the field, mainly focused on eliminating code smells and introducing best practices and design patterns into the process of software development. As many industry experts point out, project quality and maintainability issues might be explained by inherent flaws in the design of the programming language and the OOP paradigm itself, and not solely by the incompetence or lack of proper care and attention of the developers involved in the coding [3]. Thus, new programming languages and approaches for implementing solutions in the OOP paradigm need to be developed. Some programming languages emerged on top of the Java Virtual Machine to address this claim and solve the design weaknesses of Java for the sake of better quality of the solutions produced with them; these are Groovy, Scala and Kotlin, to name a few [4]. While many ideas these languages proposed were widely adopted by the community of developers, which led to their incorporation into the mainstream languages, some were considered rather impractical and idealistic. Nevertheless, such enthusiastic initiatives drive the whole OOP community toward better and simpler coding. EO (which stands for Elegant Objects, and is also the ISO 639-1 code of Esperanto) is an object-oriented programming language. It is still a prototype and is the future of OOP [5]. EO is one of the promising technologies that arise to drive the proper practical application of the OOP paradigm. The EO philosophy advocates the concept of so-called Elegant Objects, which is a vision of pure OOP that is free from the incorrectly taken design decisions common to the mainstream technologies. Specifically, these are: static methods and attributes; classes; implementation inheritance; mutable objects; NULL references; global variables and methods; reflection and annotations; typecasting; scalar data types; and flow control operators (for loop, while loop, etc.).
2 Research Problem and Objective The fundamental problem in OOP is the lack of a rigorous formal model, the high complexity, the too many ad hoc designs and the fact that programmers are not satisfied. Many OOP languages, including Java, have critical design flaws. Due to these fundamental issues, many Java-based software products are low quality and hard to maintain. The above drawbacks often result in system failures, customer complaints and lost profits. The problem has been recognized; however, it has not been addressed yet.
In addition, OOP styles were considered to ensure the formation of effective compositions of software products. Among the formal approaches are theoretical models describing OOP. These include Abadi's work on the ς-calculus (sigma-calculus), which can be used to reduce the semantics of any object-oriented programming language to four elements: objects, methods, fields and types. Further development of these works led to the creation of a ρ-calculus used in the description of elementary design patterns [2]. This work aims to provide an overview of Eolang, to check the capability and functionality of its main aspects, to assess and understand its features through the prism of comparing Eolang with other OOP languages (such as Java, Groovy and Kotlin) using examples for simple use-cases, and to publish the R&D results. The comparative analysis between these languages will focus on the OOP principles, data types, operators and expressions, as well as declarative and executable statements. We want to stay as close to Java and the JVM as possible, mostly in order to reuse the ecosystem and libraries already available.
3 Analysis of the EO Concept and the Eolang Language
Eolang is an object-oriented programming language aimed at realizing the pure concept of object-oriented programming, in which all components of a program are objects. Eolang's main goal is to prove that fully object-oriented programming is possible not only in books and abstract examples but also in real program code aimed at solving practical problems. The EO concept departs from many of the constructs typical of classical object-oriented languages such as Java [1]:
1. Static classes and methods are a popular approach to implementing utility classes in languages such as Java, C# and Ruby. In OOP methodology, this approach is considered bad practice, since such classes do not allow creating objects; therefore, they are not part of the OOP paradigm. Such classes are a legacy of the procedural programming paradigm. Following the principles of OOP, developers should provide the ability for objects to manipulate data when necessary, and the implementation of the logic for working with data should be hidden from external influences.
2. Classes are templates for the behavior of objects. The Elegant Object concept refuses to use classes in favor of types that define the behavior of objects. Each object inherits from its type only its methods, while objects of the same type can have different internal structures [6].
3. Implementation inheritance. EO does not allow inheriting the characteristics of objects, explaining that this approach turns objects into containers with data and procedures. The Eolang language developers consider inheritance to be a bad practice, as it comes from a procedural programming methodology for code reuse. Instead of inheriting implementation, the EO concept suggests creating subtypes that extend the capabilities of objects.
4. Variability. In OOP, an object is immutable if its state cannot be modified after it has been created. An example of such an object in Java would be String. We can request the creation of new strings, but we cannot change the state of existing ones. Immutable objects have several advantages:
• Immutable objects are easier to create, test and use.
• Immutable objects can be used in several threads at the same time without the risk that some thread can change the object, which can break the logic of other threads.
• The usage of immutable objects avoids side effects.
• Immutable objects avoid the problem of changing identity.
• Immutable objects prevent NULL references.
• Immutable objects are easier to cache [7].
5. NULL. Using a NULL reference is against the concept of OOP since NULL is not an object. NULL is a null reference; a NULL pointer is 0x00000000 on the x86 architecture. NULL references complicate the program code, as they create the need to constantly check the input data for NULL. If the developer forgets to do this, there is always the risk of the application crashing with a NullPointerException. In EO, there are two approaches to creating an alternative to NULL: a Null Object, an object without any properties and with neutral behavior, and throwing an exception if the object cannot be returned (a sketch of the Null Object alternative is given after this list).
6. Global variables and functions. In OOP methodology, objects must manipulate data, and their implementation must be hidden from outside influence. In Eolang, objects created in the global scope are assigned to the attributes of the system object, the highest level of abstraction.
7. Reflection: the ability of the running application to manipulate the internal properties of the program itself.
8. Typecasting allows you to work with the provided objects in different ways based on the class they belong to.
9. Primitive data types. The EO concept does not imply primitive data types, since they are not objects, which is contrary to the OOP concept.
10. Annotations. The main problem with annotations is that they force developers to implement the functionality of the object outside the object, which is contrary to the principle of encapsulation in OOP [8].
11. Unchecked exceptions. Unchecked exceptions hide the fact that the method might fail. The EO concept assumes that this fact must be clear and visible. When a method performs too many different functions, there are many points at which an error can occur. A method should not have so many situations in which it can throw exceptions; such methods should be decomposed into many simpler methods, each of which can throw only one type of exception [8].
12. Operators. There are no operators like +, −, * and / in EO. Numeric objects have built-in functions that represent mathematical operations. The creator of EO considers operators to be "syntactic sugar."
13. Flow control operators (for, while, if, etc.) are also regarded as "syntactic sugar."
14. The EO concept assumes the use of strict and precise syntax. Syntactic sugar can reduce the readability of your code and make it harder to understand.
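As a language-neutral illustration of the Null Object alternative to NULL described in item 5, here is a minimal Python sketch; it is not EO code, and the class and method names are assumptions made only for the example.

```python
# Illustrative sketch of the "Null Object" alternative to NULL references (item 5).
# Not EO code; the class and method names are assumptions made for the example.

class Employee:
    def __init__(self, name, salary):
        self._name = name
        self._salary = salary

    def name(self):
        return self._name

    def salary(self):
        return self._salary


class NoEmployee:
    """Null object: an object with neutral behavior instead of a NULL reference."""

    def name(self):
        return "nobody"

    def salary(self):
        return 0


def find_employee(employees, name):
    for e in employees:
        if e.name() == name:
            return e
    return NoEmployee()   # never return None/NULL
    # The second approach mentioned in the paper would raise an exception here instead.


staff = [Employee("Alice", 1000)]
print(find_employee(staff, "Bob").salary())   # 0 -- no NULL check needed at the call site
```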
4 Eolang Object-Oriented Programming Principles
Abstraction. In Eolang, objects are elements into which the subject area is decomposed. According to West [9, p. 24], objects, as abstractions of entities in the real world, represent a particularly dense and cohesive clustering of information.
Inheritance. Implementation inheritance does not exist in Eolang, as such inheritance constrains the structure and behavior of superclasses and can lead to subsequent development difficulties. Also, changes in a superclass can lead to unexpected results in the derived classes. In Eolang, inheritance is implemented through an object hierarchy and can be created using decorators. Objects in Eolang inherit only behavior; they cannot override it, but only add new functionality (see the sketch at the end of this section).
Polymorphism. There are no explicitly defined types in Eolang, and the correspondence between objects is made and checked at compile time. Eolang always knows when objects are created or copied, as well as their structure. By having this information at compile time, it is possible to guarantee a high level of compatibility among the objects and their users.
Encapsulation. The inability to make the encapsulation barrier explicit is the main reason why Eolang does not hide information about the object structure. All attributes of an object are visible to any other object. In Eolang, the main goal of encapsulation, reducing the interconnectedness between objects, is achieved in a different way: the density of the relationships between objects is controlled at the assembly stage. At compile time, the compiler collects information about the relationships between objects and calculates the depth of each relationship.
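A minimal sketch of the decoration style of inheritance described above, written as a Python analogy rather than EO syntax (the class and method names are assumptions): the decorator reuses the wrapped object's behavior and only adds new functionality.

```python
# Sketch of extending behavior by decoration instead of implementation inheritance.
# A Python analogy with assumed names, not EO syntax.

class Text:
    def __init__(self, value):
        self._value = value

    def content(self):
        return self._value


class TrimmedText:
    """Decorator: wraps a Text-like object and adds behavior without overriding it."""

    def __init__(self, origin):
        self._origin = origin          # the decorated object

    def content(self):
        return self._origin.content()  # reuse the inherited behavior, do not reimplement it

    def trimmed(self):                 # the only new functionality
        return self.content().strip()


text = TrimmedText(Text("  hello  "))
print(text.trimmed())   # "hello"
```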
5 Comparative Analysis Between Eolang, Java, Groovy and Kotlin 5.1 Comparison of OOP Principles See Table 1.
Table 1 Comparison of OOP principles

Abstraction
- EO: exists as the operation of declaring a new object by making a copy of another object
- Java: exists as a class declared with the "abstract" keyword
- Groovy: exists as a class declared with the "abstract" keyword
- Kotlin: like Java, the abstract keyword is used to declare abstract classes

Encapsulation
- EO: does not exist and will not be introduced; all attributes are publicly available to every object
- Java: data/variables are hidden and can be accessed through getter and setter methods
- Groovy: everything is public; there is no idea of private fields or methods, unlike Java
- Kotlin: encapsulation exists, just like in Java; the private, public and protected keywords are used to set visibility

Inheritance
- EO: does not exist and will not be introduced; the usual inheritance is represented by decorators (@)
- Java: uses the extends keyword to inherit properties of a parent class (superclass)
- Groovy: uses the extends keyword to inherit properties of a parent class (superclass)
- Kotlin: by default, Kotlin classes are final (they cannot be inherited); to make a class inheritable, mark it with the open keyword

Polymorphism
- EO: does not exist; will be implemented (ad hoc polymorphism)
- Java: provides compile-time and runtime polymorphism
- Groovy: the type of the object is considered at runtime, not the type of the reference, so the method is found at runtime
- Kotlin: supports the two forms of polymorphism as in Java

Data types
- EO: presented as atom data type objects; Atom is an acronym of "access to memory"
- Java: provides both primitive and non-primitive data types
- Groovy: supports the same number of primitive types as Java
- Kotlin: Kotlin's basic data types include Java primitive data types
5.2 Comparison of Operators and Expressions The actions performed by operands in calculating expressions are done with the help of operators and some syntax expressions. Table 2 shows a comparison of operators and expression under the four languages.
Table 2 Comparison of operators and expressions (Java, Groovy, Kotlin and Eolang)

- Parenthesis: ( ) in Java, Groovy, Kotlin and Eolang
- Post-increment: ++ in Java, Groovy and Kotlin
- Post-decrement: −− in Java, Groovy and Kotlin
- Unary minus: − in Java, Groovy and Kotlin; .neg in Eolang
- Creating an object: new in Java and Groovy
- Multiplicative: *, /, % in Java, Groovy and Kotlin; .mul, .div, .mod in Eolang
- Additive: +, − in Java, Groovy and Kotlin; .add, .sub in Eolang
- Equality: ==, != in Java; ==, !=, ===, !== in Groovy and Kotlin; .eq, .not in Eolang
- Logical NOT: ! in Java, Groovy and Kotlin; .not in Eolang
- Relational: <, <=, >, >= in Java, Groovy and Kotlin; .less in Eolang
- Logical AND: && in Java, Groovy and Kotlin; .and in Eolang
- Logical OR: || in Java, Groovy and Kotlin; .or in Eolang
- Assignment: = in Java, Groovy and Kotlin; > in Eolang
- Access via object: . (with the null-safe ?. in Groovy and Kotlin)
Table 3 Comparison of declarative statements

Declaring a variable with initialization / creating an attribute
- Java: String greeting = "Hello world"
- Groovy: String greeting = "Hello world"
- Kotlin: var name: String = "Hello world"
- Eolang: "Hello world" > greet

Declaring an object with initialization (creating an object)
- Java: SomeObject name = new SomeObject()
- Groovy: SomeObject name = new SomeObject()
- Kotlin: var name = SomeObject()
- Eolang: SomeObject > name
5.3 Comparison of Declarative Statements
Java, Groovy and Kotlin support both declarative and executable statements. The declarative statements are used to explicitly declare the data before it is used. Sample declarations can be found in Table 3. As can be seen in the examples above, when creating an attribute in Eolang, the type is not stated; you simply make a copy of an object or an attribute.
5.4 Comparison of Executable Statements See Table 4.
Table 4 Comparison of executable statements

Branching statements
- if: Java and Groovy: if (condition) { statements; } | Kotlin: if (condition) { statements } | Eolang: (condition).If
- if-else: Java and Groovy: if (condition) { statements; } else { statements; } | Kotlin: if (condition) { statements } else { statements } | Eolang: (condition).If

Repetition statements
- while: Java, Groovy and Kotlin: while (condition) { statements } | Eolang: (condition).while [i]
Table 5 Comparison of data types (support for primitives)

Type | Java | Groovy | Kotlin | Eolang
Int | Yes | Yes | | No (exists as Atom)
String | Yes | Yes | | No (exists as Atom)
Float | Yes | Yes | | No (exists as Atom)
Char | Yes | Yes | | No (exists as Atom)
Boolean | Yes | Yes | | No (exists as Atom)
5.5 Comparison of Data Types See Table 5.
6 Conclusion
While universal languages that support a multi-paradigm style are often used in the development of complex software systems, there are extensive classes of problems that are well suited to object-oriented (OO) programming languages with increased reliability. As part of this work, the ways of creating such a language are considered. It has been shown that several unreliable designs are often unnecessary, and some of the presented designs can be replaced with more reliable ones. A theoretically promising direction for the development of languages with objects is a functional approach, in which the function acts as an object and its arguments act as attributes and/or methods [10]. In this case, it is possible to provide some neutrality both in terms of typing and in terms of function parameters representing attributes
and methods. Additional benefits are associated with parametric polymorphism, type inference and computational strategies: call-by-name, call-by-value, call-by-need, etc. To a certain extent, these mechanisms are implemented in the languages LISP, (S)ML, Miranda and Haskell. The multi-paradigm approach is realized in the F# programming language. It is possible to "immerse" fragments of source code written in a wide range of languages into an abstract/virtual machine (such as the CLR or the JVM) that translates components into intermediate code (MSIL or bytecode, respectively). As we have described above, existing OOP languages exhibit high complexity and many design flaws [11–14]. This paper has described the syntax and the semantics of the EO language, as well as its other distinctive features. It has assessed some qualitative characteristics and compared them to Java, Groovy and Kotlin. The ideas discussed above will help us prove that true OOP is practically possible, not just in books and abstract examples, but in real code that works. We hope to make the Eolang language platform independent with more reliability, maintainability and quality. Positive results relate to the compactness and clear style of the language syntax and the LOC. This will be considered while Eolang is still in the process of improving its compiler and establishing its concepts and principles (terminology and theory).
Acknowledgements This work is a part of the R&D project #TC202012080007 (Java-based Object-Oriented Programming Language) supported by Huawei Technologies Co., Ltd.
References 1. Arnold, K., Gosling, J.: The Java Programming Language. Addison-Wesley, Reading, Mass (1996) 2. Abadi, M., Cardelli, L.: A Theory of Objects. Springer, New York (1996) 3. Scott, M.L.: Programming language pragmatics. Elsevier (2006) 4. Akhin, M., Belyaev, M.: Kotlin language specification. https://kotlinlang.org/spec/pdf/kotlinspec.pdf. Accessed 25 Jan 2021 5. [EO] cqfn/eo. CQFN (2021) 6. Bugayenko, Y.: ‘OOP Alternative to Utility Classes’. Yegor Bugayenko. https://www.yegor256. com/2014/05/05/oop-alternative-to-utility-classes.html. Accessed 25 Jan 2021 7. Bugayenko, Y.: ‘Objects Should Be Immutable’. Yegor Bugayenko. https://www.yegor256. com/2014/06/09/objects-should-be-immutable.html. Accessed 25 Jan 2021 8. Bugayenko, Y.: Elegant objects. Technical Report, vol. 1 (2017) 9. West, D.: Object Thinking, 1st ed. Pearson Education. (2004). https://www.ebooks.com/en-ao/ book/1676023/object-thinking/david-west/. Accessed 04 Feb 2021 10. Smaragdakis, Y., McNamara. B.: FC++: functional tools for object-oriented tasks. Softw. Pract. Exp. 32(10), 1015–1033 11. Chisnall, D.: Influential programming languages, Part 4. Lisp. (2011) 12. Jones, P., Simon, H.: 98 Language and Libraries: The Revised Report. Cambridge University Press (2003) 13. Auguston, M., Reinfields, J.: A visual miranda machine. In: Software Education Conference, 1994. Proceedings, pp. 198–203. IEEE (1994). https://doi.org/10.1109/SEDC.1994.475337 14. Syme, D.: F Sharp at Microsoft Research. https://www.microsoft.com/en-us/research/project/ f-at-microsoft-research
Mission-Critical Goals Impact onto Process Efficiency: Case of Aeroflot Group Alexander Gromoff, Sergey Zykov, and Yaroslav Gorchakov
Abstract There is a certain problem of goal-articulation correctness and strategic management of big and mission-critical socio-technical systems (STS). Therefore, the goal of this work is to prove the hypothesis that a correct semantic definition of strategic goals is the most influential success factor for a progressing STS. The concept of a "goal-setting vertical" is defined in the work. A semantic analysis of the goal-setting vertical in the case of the "Aeroflot Group", together with a statistical study of the practical dependence of regulatory mechanisms, is provided. On that basis, a number of management conclusions and recommendations for mission-critical companies are formulated.
Keywords Strategic management · Socio-technical system (STS) · System analysis · Strategic goals · Mission · Business processes (BP) · Enterprise management · Planning · Balanced scorecard · SMART · Company goals
1 Problem Statement and Relevance of Study
The modern, rapidly changing information space increases the risks to robust workflow execution in any socio-technical system (STS) with its environment of cross-cutting processes. That determines the relevance of enhancing the quality of goal-setting and STS strategic management [1]. An organization's strategic interests include the preservation of stable development, adaptation to the influence of the external environment, hedging risks, as well as forecasting their development in the life cycle. Strategic management becomes a key factor of organization control, and its effectiveness depends on the adequacy of the perception of the social value of the STS activity. According to classical cybernetics, any STS is a complex open system and, as a part of a bigger external system, progresses by inheriting its mission and goal preordination from that external system. The combination of mission and goals defines the system's
A. Gromoff (B) · S. Zykov · Y. Gorchakov Graduate School of Business, National Research University “Higher School of Economics”, Moscow, Russia © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 I. Czarnowski et al. (eds.), Intelligent Decision Technologies, Smart Innovation, Systems and Technologies 238, https://doi.org/10.1007/978-981-16-2765-1_31
strategic priorities, which designate the implementation of the mission-critical business process architecture (BPA) and infrastructure of the STS. Likewise, the "track" of an STS that reflects the progress of the organization in its life cycle depends on complex factors at the external level (politics, economics, society, ecology, technology) and the internal level (stakeholders, suppliers, legislative and state bodies, consumers, competitors). The regular STS progress path is determined by the set of operational targets and is valid in a certain period. If, at a particular point of time, the regular STS path corresponds to the track within a certain permissible deviation, then the achievement of its goals is stable and robust, and therefore the criteria of management quality and trustworthiness are satisfied. Otherwise, unstable and uneven BP stages occur, which leads to a decrease in the effectiveness of operating risk management and a disbalance of key performance indicators (KPI). The foregoing is a precursor condition for a violation of the quality and reliability of the STS, which degrades the system's ability to respond adequately to internal and external changes. The system loses the ability to respond adequately and therefore to manage itself effectively and, accordingly, misses the progress path with an unpredictable result. Therefore, the remark of P. F. Drucker (the BPM guru of the twentieth century) that wrong articulation of the mission and goals results in wasting time and energy on carefully implemented actions that are not needed at all should be borne in mind, especially nowadays amid a chaotic increase of entropy. In principle, the strategic goals have to reflect the following nodes [2]:
1. The system of social value requirements and mission-critical environmental factors;
2. The STS mission, consistent with external values and mission-critical environmental factors;
3. The strategic vision in the perspective of the STS mission;
4. The strategic goals tree, balanced in a BSC with KPIs;
5. Cross-cutting process models consistent with the strategic goals of the STS.
Therefore, any STS, from the view of the "goal-setting vertical", can be defined as [3]: a goal-oriented communicating system accomplished by harmonizing the stable states of the nodes of the goal-setting vertical, each of which unambiguously determines the next. Mutual deviations arising in the nodes of the goal-setting vertical, whether in the direction of a negative or a positive result, can be explained by the tendency to lower the internal structure quality in the STS architecture, such as numerous positive feedbacks from the operational to the strategic levels of the architecture that contribute to a further deviation of the interoperability level and therefore to STS destruction. In this work, a semantic and multifactorial analysis of the strategic goals in a large STS was conducted, and the dependence of the STS's BP architecture on the correctness of strategic goal articulation was considered in the case of the aviation holding "Aeroflot Group".
2 Semantic Analysis of Strategic Goals
The provided study is based on the univocal-level analysis of semantic interoperability under the scope of the articulation correctness of the strategic goal nodes and includes the following steps [4]:
• Determining the system of mission-critical factors (MCF) for the Aeroflot Group
• Semantic analysis of the Aeroflot Group mission
• Semantic analysis of the Aeroflot Group strategic goals
• Semantic analysis of the Aeroflot Group BP of the value-added chain (VAD)
Semantic analysis of the Aeroflot Group strategic goals in the frame of the goal-setting vertical makes it possible to conduct a high-quality diagnostic of the Aeroflot Group management system and to formulate recommendations regarding the improvement of the BPA of the STS and of particular functional subsystems.
2.1 Determining the System of Mission-Critical Factors
From the system analysis view, the Aeroflot Group is a subsystem in numerous sets of large STSs that form the MCFs of the big system. Exactly this fact explains the inability to determine a priori the system of MCFs that directly or indirectly influence the group. However, considering the Aeroflot Group as an element of the civil aviation industry, with a similar character of the big STS's influence on all airlines in the industry, and noting that as an airline it inherits an acceptable mission from the social value requirement, the set of civil aviators' missions aggregates a projection of the MCFs for the whole industry; this gives the opportunity to formulate "a posteriori" the set of mission-critical factors for the industry. A statistical semantic study was carried out on a sample of the missions of aviation companies included in the "Skytrax" rating [5–7]. The missions were decomposed into sets of lexemes and cleared from the semantic noise; for each lexeme, a frequency rank was determined. Following Zipf's law, a graph was plotted. The graph in Fig. 1 depicts the dependence of lexeme use frequency on its rank, where
1. the area of unique lexemes determines the set of semantically most significant lexemes for the civil aviation industry;
2. the area of key lexemes determines the set of semantically relatively significant lexemes for the civil aviation industry;
3. the area of neutral lexemes determines the set of semantically neutral lexemes for the civil aviation industry.
Thus, the constructed semantic core includes the set of lexemes that consists of unique and key lexemes for the civil aviation industry. Further analysis included the classification of the semantic core lexemes by the semantic proximity degree.
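A minimal Python sketch of the rank-frequency step described above; the mission snippets, the stop-word list and the rank thresholds are toy assumptions and not the study's data.

```python
# Sketch of the Zipf-style rank-frequency analysis of mission lexemes.
# The mission snippets, stop-word list and rank thresholds are toy assumptions.

from collections import Counter

missions = [
    "safety and comfort for every customer on every flight",
    "connect the world with safe reliable and friendly service",
    "deliver safe affordable flights and excellent customer service",
]

stopwords = {"and", "for", "the", "on", "every", "with"}   # crude semantic-noise filter

# Decompose the missions into lexemes and clear the semantic noise.
lexemes = [w for m in missions for w in m.split() if w not in stopwords]
counts = Counter(lexemes)

# Rank lexemes by frequency (rank 1 = most frequent), as in a Zipf plot.
ranked = counts.most_common()
for rank, (lexeme, freq) in enumerate(ranked, start=1):
    # Assumed rank-based split into the three areas described in the text.
    area = "unique" if rank <= 3 else "key" if rank <= 8 else "neutral"
    print(f"{rank:>2}  {lexeme:<10} freq={freq}  area={area}")
```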
Fig. 1 Lexeme frequency use dependence ranked in Zipf’s law
Fig. 2 Semantic model of system MCFs in civil aviation industry
Then the distribution of relative frequencies over the semantic classes was obtained and the size of each semantic class was estimated. The resulting semantic model of the system of MCFs for the civil aviation industry is shown in Fig. 2.
2.2 Semantic Analysis of the Mission in the Aeroflot Group Case
The mission of the Aeroflot Group is determined in the following way: "We work to ensure that our customers can quickly and comfortably travel great distances and thus be mobile, meet more often, work successfully, and see the world in all its diversity.
We give our customers a choice through an extensive route network and different carriers operating within our group, from low-cost to premium class airlines."
Fig. 3 Semantic model of the Aeroflot Group mission (bar chart: relative weight by mission-critical factor)
Semantic analysis of the Aeroflot Group mission yields the semantic model in Fig. 3, which gives meaningful relative weights with respect to the system of MCFs for the industry.
2.3 Semantic Analysis of the Strategic Vision in the Aeroflot Group Case
The strategic goal of the Aeroflot Group is declared in the following way: to strengthen leadership in the global airline industry by seizing opportunities in the Russian and international air transportation markets.
The semantic analysis of the Aeroflot Group strategic vision was carried out in the same way as for the mission and is presented in Fig. 4. Comparing the plots in Figs. 2, 3, and 4, it is evident that the mismatch of the Aeroflot Group mission and goals with the corresponding industrial parameters is essential, so a semantic analysis with the SMART approach was carried out next.
Fig. 4 Semantic model of the Aeroflot Group strategic vision (bar chart: relative weight by mission-critical factor)
2.4 Semantic Analysis of Aeroflot Group Strategic Goals
Key directions of Aeroflot Group strategy identified until 2023 as follows [8]:
1. To carry 90–100 million passengers in 2023;
2. International transit traffic of 10–15 million passengers in 2023;
3. To launch an international hub in Krasnoyarsk and three new regional bases in Sochi, Yekaterinburg, and Novosibirsk;
4. To bring 200 Russian-built Superjet 100 and MC 21 aircraft online by 2026;
5. To deliver a leading-edge level of digitalization across Aeroflot Group.
The degree of semantic correctness of the Aeroflot Group's strategic goals was studied in the frame of the SMART, PURE and CLEAR approaches, or "clear goals".
The semantic formulations of strategic goals 1, 2, and 3 have a high level of correctness under the SMART goal-setting model. Strategic goal 4 has a middle level of correctness, as the deadline for reaching the goal is outside of the accepted strategy timeline. Goal 5 has a low level of correctness since:
• Specific—it is not clear what a "leading digitalization level" is and who determines that level;
• Measurable—a way to measure the level of digitalization is lacking;
• Achievable—the possibility to determine feasibility is lacking;
• Relevant—the possibility to estimate the relevance of the goal is lacking;
• Time-bound—in the goal-setting, there is no specified period.
The semantic formulations of strategic goals 1, 2, 3, and 4 have a high level of correctness under the PURE goal-setting model. Goal 5 has a low level of correctness as:
• Understandable and Relevant—similar to the SMART case.
The semantic formulations of strategic goals 1, 2, and 3 have a high level of correctness under the CLEAR goal-setting model. Goal 4 has a low level of correctness since:
• Environmental—in the document "Condition of the safety flight 2018" it is stated that the level of safety of all flights of the Aeroflot Group is "high", except for the level of safety of the Superjet-100, which is "middle" and not "high", meaning that the goal-setting contains a statement of potential danger for customers.
Goal 5 has a low level of correctness since:
• Appropriate—this goal is not appropriate because it has no relevance to the other strategic goals.
Table 1 Degree of correctness of the semantic formulation of the Aeroflot Group strategic goals

Strategic goal | By SMART | By PURE | By CLEAR
1 | High | High | High
2 | High | High | High
3 | High | High | High
4 | Average | High | Low
5 | Low | Low | Low
(Bar chart: relative weight by mission-critical factor; series: all strategic goals vs. semantically correct strategic goals (1, 2, 3).)
Fig. 5 Semantic model of the Aeroflot Group strategic goals
The main result of the study of the correctness of the semantic formulations of the Aeroflot Group strategic goals is that strategic goals 4 and 5 are not semantically correct, meaning that they should be excluded from further analysis (Table 1). The semantic model of the strategic goals relative to the MCFs of the civil aviation industry for the Aeroflot Group (Fig. 5) was built from the results of the SMART approach.
2.5 Semantic Analysis of Aeroflot Group Business Processes
The main business processes in the VAD of the Aeroflot Group are defined as in Fig. 6 [9]. The corresponding semantic model from the VAD (Fig. 7), relative to the MCFs of the civil aviation industry, was built with the same algorithm as in the previous cases.
Fig. 6 Core business processes of the value chain of the Aeroflot Group (the diagram groups the processes into management business processes, support business processes, development business processes, and core business processes such as airworthiness, aircraft maintenance, flight operations, aviation training, freight transportation, sales, and onboard service)
Fig. 7 Semantic model of the Aeroflot Group MCFs relative to the industry factors
2.6 Short Conclusion
The consolidated semantic model of the strategic goals of the Aeroflot Group (Fig. 8) was constructed and analyzed. To conclude, the level of mutual match of the factor nodes is critically low, which increases the probability of low efficiency of the group's performance, operational risk growth, and air accidents. These air accidents are connected to the functioning of the inner business processes. The functioning of such processes is at risk due to the lack of conceptual and technological connection between strategic goal-setting and organizational decisions.
Fig. 8 Consolidated semantic model of the Aeroflot Group strategic goals (relative weight by mission-critical factor; series: mission-critical factors, mission, strategic vision, strategic goals, semantically correct strategic goals, core business processes)
3 Statistical Part of the Research
The "Swiss cheese" model developed by James Reason shows that plane crashes involve the joint influence of consistent distortions in the work of a complex STS. These distortions are caused by a set of factors. "As the Reason model is based on the idea that such a complex STS as an airline has multi-level protection, inner single divergences rarely lead to serious consequences. Violations in the system of safety defense are the slow consequences of the decisions that are made on a higher level of the system. The destructive potential of these decisions becomes evident only when specific events take place. Under such concrete circumstances, people's mistakes or active refusal at the exploitation level trigger the hidden conditions that facilitate the violation of the means of safety provision of the flights" [4]. The foregoing serves as direct evidence of a low quality level in goal-setting. One of the tragic consequences of such low-quality goal-setting for the Aeroflot Group was the crash of a Sukhoi Superjet 100 in Sheremetyevo on May 5, 2019. This tragedy resulted in the death of 41 of the 78 people who were on board. Possible reasons for this tragedy, according to the international aviation committee, are technical malfunction, low qualification of the crew, and bad weather conditions. All these problems are essentially connected to the functioning of the business process
group that provides the safety of the Aeroflot Group (processes of checking the technical conditions, processes of training and improving the skills of the crew, and so on). Is it possible to prevent such tragedies by engaging, at the strategic management level, in high-quality reengineering of the strategic goals of the group and in building the architecture designed for providing safety, which is a basic human need? To answer this question, a benchmarking of the Aeroflot Group architecture against different airlines in terms of the "black box" concept is considered. The "black box" input is the company mission, and the output is the KPIs of BP functioning; this shows the dependence between the mission semantics, as the key node in the strategic goals, and the KPIs of BP functioning, which depict the results of the group's activities. The particular hypothesis follows: declaring the term "safety" in the airline mission as the key MCF forms and controls the business processes for safety provision and, as a result, increases the actual level of passengers' safety. This happens due to the harmonization of the steady condition of the strategic goal nodes. To support or reject this hypothesis, research on the rating of the safest airlines in the world was conducted. This rating was proposed by the German company Jacdec in 2019. The rating consists of information on 100 airlines sorted by a safety index that is based on information about accidents that have already taken place and current operational information on the company. The rating is computed based on 33 safety criteria. The main criteria are:
• the number of air accidents during the year;
• environment factors: the airport landscape and its infrastructure peculiarities, weather conditions of the main destinations, and others;
• airline performance indicators such as the age of the aircraft, the destinations map, the IATA Operational Safety Audit (IOSA), and the International Civil Aviation Organization (ICAO).
From that rating, 63 Russian and international companies were chosen to statistically estimate the level of influence of the semantic content of the airline missions on the level of actual passengers' safety. For a higher-quality analysis, the data was classified into five classes based on geographical location: North America, Latin America, Europe, East Asia, and West Asia. For the airlines under consideration, three measures were determined: the average Jacdec safety index of airlines in a region, the average index of airlines whose mission mentions the MCF "safety", and the average index of airlines whose mission does not mention the MCF "safety" (see Fig. 9). Evidently, the particular hypothesis is statistically proven, as in all regions under consideration the average level of safety for the Jacdec airlines that mention "safety" as the MCF is higher compared to the Jacdec airlines that do not mention "safety" as the MCF. Generalizing the obtained results, proper semantic fulfillment of the mission and accounting for all MCFs in the civil aviation industry allow building a sustainable, high-quality business process management system for the main and supporting activities.
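The per-region comparison behind Fig. 9 can be sketched in a few lines of Python; the records below are invented placeholders and not the Jacdec 2019 data.

```python
# Sketch of the per-region comparison of average safety indices.
# The records below are invented placeholders, not the Jacdec 2019 data.

from statistics import mean

airlines = [
    # (region, safety_index, mission mentions the MCF "safety")
    ("Europe", 0.91, True),
    ("Europe", 0.86, False),
    ("East Asia", 0.90, True),
    ("East Asia", 0.84, False),
    ("North America", 0.92, True),
    ("North America", 0.88, False),
]

for region in sorted({r for r, _, _ in airlines}):
    all_idx = [s for r, s, _ in airlines if r == region]
    with_mcf = [s for r, s, m in airlines if r == region and m]
    without_mcf = [s for r, s, m in airlines if r == region and not m]
    print(region,
          "avg(all) =", round(mean(all_idx), 3),
          "avg(safety in mission) =", round(mean(with_mcf), 3),
          "avg(no safety in mission) =", round(mean(without_mcf), 3))
```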
Fig. 9 Dependence of the average Jacdec airline safety index on the MCF "safety" in airline missions (Jacdec safety index by region: North America, Latin America, Europe, East Asia, West Asia; series: average index of all airlines in the region, average index of airlines whose mission includes the critical safety factor, average index of airlines whose mission does not, and the index of Aeroflot)
4 Conclusion
The results of the semantic-statistical analysis of the dependence of process management effectiveness on the correct articulation of strategic goals lead to the following conclusions:
• The complex of MCFs derived from the external environment determines the compulsory system of strategic priorities directed at the realization of vital business architectures, structures, and infrastructures;
• The strategic goals vertical is the key element that allows incorporating a high-quality decomposition of MCFs into the results of the main and supporting activities of the business;
• The nodes of the strategic goals vertical must be fully mutually compliant;
• Strategic goals must be oriented toward the inner decomposition of the key values of the business and not toward marketing or brand determinants;
• For each MCF, supporting business architectures, structures and infrastructures should be developed, as well as proactive behavioral models and regulations of actions in case of emergency;
• A deep understanding of the extreme importance and essence of strategic management is a key factor for building a high-quality business process system for the main and supporting activities of any STS.
Concerning the accuracy of the proposed approach, it is necessary to comment that, as with research on virus efficiency, which is now topical, accuracy emerges from statistical experience; here we propose an approach to be considered for further application by numerous followers. The authors of this article regularly analyze such solutions, linking process efficiency with their semantic value, and this is one of our key methods. Moreover, we believe that this method is critical for a deeper and qualitative analysis of processes;
otherwise, it becomes incomplete and not system-based. This conclusion is confirmed through the years of experience in implementing projects in various fields of activity and conceptual modeling of big systems.
References
1. Borgardt, E.A.: Strategic management of sustainable development of an enterprise, No. 1. Actual Probl. Econ. Law 55–61 (2013)
2. Gromoff, A.I.: Business Process Management: Modern Methods, 368 p. Yurayt Publishing, Moscow (2017)
3. Gromoff, A.I.: Management in view of digital transformation. In: Bergener, K., Räckers, M., Stein, A. (eds.) The Art of Structuring, pp. 385–396. Springer, Cham (2019)
4. Zykov, S., Gromoff, A., Kazantsev, N.: Software Engineering for Enterprise System Agility: Emerging Research and Opportunities. IGI Global, Hershey (2019)
5. Corporate Philosophy. AEROFLOT. Russian Airlines. https://www.aeroflot.ru/ru-ru/about/aeroflot_today/ourbrand. Accessed 23 Mar 2020
6. Flight Safety Management Manual (SMM), 3rd edn., 300 p. ICAO, Montreal (2013). ISBN 978-92-9249-334-9
7. International Civil Aviation Organization Safety Management Manual (SMM), 3rd edn. (2013)
8. Strategy. AEROFLOT. Russian Airlines. https://ir.aeroflot.ru/ru/company-overview/strategy/. Accessed 23 Mar 2020
9. The Organizational Structure of PJSC "Aeroflot". AEROFLOT. Russian Airlines. https://www.aeroflot.ru/ru-ru/about/aeroflot_today/structure_of_company/structure. Accessed 23 Mar 2020
High-Dimensional Data Analysis, Knowledge Processing and Applications
A Classification Method Based on Ensemble Learning of Deep Learning and Multidimensional Scaling Kazuya Miyazawa
and Mika Sato-Ilic
Abstract In this paper, a classification method based on an ensemble learning of deep learning and multidimensional scaling is proposed for a problem of discrimination of large and complex data. The advantage of the proposed method is improving the accuracy of results of the discrimination by removing the latent structure of data which have low explanatory power as noise, and this is done by transforming original data into a space spanned by dimensions which explain the latent structure of the data. Using numerical examples, we demonstrate the effectiveness of the proposed method. Keywords Multidimensional scaling · Deep learning · Ensemble learning
1 Introduction
Recently, deep learning has been attracting attention due to its high accuracy in classification problems. The recent development of information technology has made it possible for us to obtain large and complex data. Among them, data such as images and sensor data are widely used in the field. One of the processes using image and sensor data is classification. As a method for the classification of image and sensor data using deep learning, the convolutional neural network (CNN) has been attracting attention because of its high accuracy [1]. However, there are data for which it is difficult to improve the accuracy of the results by using deep learning methods such as CNN. For example, in the product classification of fashion images, some labels may not be accurate, even with a tuned CNN model [2]. In such cases, it is worth considering not only deep learning, but
K. Miyazawa Graduate School of Systems and Information Engineering, University of Tsukuba, Tennodai 1-1-1, Tsukuba 305-8573, Ibaraki, Japan M. Sato-Ilic (B) Faculty of Engineering, Information and Systems, University of Tsukuba, Tennodai 1-1-1, Tsukuba 305-8573, Ibaraki, Japan e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 I. Czarnowski et al. (eds.), Intelligent Decision Technologies, Smart Innovation, Systems and Technologies 238, https://doi.org/10.1007/978-981-16-2765-1_32
also combining other methods with deep learning to improve the accuracy of the result. In this paper, as one such combined method, we propose a classification method based on an ensemble learning of deep learning and multidimensional scaling [3]. Multidimensional scaling is a method for obtaining a representation of the data in a space of lower dimension than the one where the data exists, based on the similarity (or dissimilarity) among objects. Since the obtained dimensions spanning the low-dimensional space have high explanatory power for the original data structure of the high-dimensional space, the data structure explained by the remaining dimensions is considered to be noise that does not explain the original data structure. Therefore, we propose a classification method that obtains more accurate results than using only deep learning methods, by removing the data treated as noise using multidimensional scaling and then applying a classifier learning method, including a deep learning method. In addition, when dissimilarity (or similarity) data is not given directly, but data consisting of objects and variables is given, the dissimilarity (or similarity) between objects is defined mathematically so as to adequately explain the dissimilarity (or similarity) relationship among objects, in order to apply a multidimensional scaling method to this type of data. In this paper, by using the idea of dissimilarity considering the grade of attraction [4], we use a dissimilarity that considers not only the dissimilarity between objects but also the weight of each object, in order to obtain more accurate results. The structure of this paper is as follows. First, in Sect. 2, we describe neural networks and convolutional neural networks (CNN). In Sect. 3, we describe multidimensional scaling. In Sect. 4, we propose a classification method based on an ensemble learning of deep learning and multidimensional scaling. In Sect. 5, we show the effectiveness of the proposed method using image and sensor data. Finally, in Sect. 6, we mention some conclusions.
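Before turning to the individual components, here is a minimal sketch of the overall idea, assuming scikit-learn is available; the dataset, the dissimilarity, the number of retained dimensions and the classifier are placeholder choices and not the specific method of this paper.

```python
# Sketch of the MDS-then-classify idea, assuming scikit-learn is available.
# Dataset, dissimilarity, dimensionality and classifier are placeholder choices.

from sklearn.datasets import load_digits
from sklearn.manifold import MDS
from sklearn.metrics import accuracy_score, pairwise_distances
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_digits(return_X_y=True)
X, y = X[:300], y[:300]                       # keep the example small

# Dissimilarity between objects (plain Euclidean here; the paper uses a
# dissimilarity that also accounts for the weight of each object).
D = pairwise_distances(X)

# Embed objects into a low-dimensional space that preserves the dissimilarities;
# the structure carried by the discarded dimensions is treated as noise.
embedding = MDS(n_components=10, dissimilarity="precomputed",
                random_state=0).fit_transform(D)

# Train any classifier (a deep learning model in the paper) on the embedded data.
X_tr, X_te, y_tr, y_te = train_test_split(embedding, y, random_state=0)
clf = KNeighborsClassifier().fit(X_tr, y_tr)
print("accuracy on embedded data:", accuracy_score(y_te, clf.predict(X_te)))
```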
2 Neural Network
The model of a neural network [5, 6] is composed of an input layer, a hidden layer, and an output layer, each of which consists of nodes, and the nodes in adjacent layers are connected by edges, as shown in Fig. 1. It is also assumed that each edge has a weight. Let the value of the ith node in the input layer be $x_i$, the value of a certain node of the next hidden layer be $a$, the weight corresponding to the ith node in the input layer be $w_i$, and the number of nodes in the input layer be $I$. We can express this as
$$a = f\left(\sum_{i=1}^{I} w_i x_i + b\right),$$
Fig. 1 Structure of neural network
where $b$ is a bias and $f(x)$ is an activation function. The activation function determines how the sum of the inputs is activated. Examples of activation functions are the sigmoidal function
$$f(x) = \frac{1}{1 + \exp(-x)}$$
and the ReLU function $f(x) = \max(0, x)$. The representation of the neural network varies with the weights $w$. Determining these weights based on the training data is called learning. A gradient descent method is a typical example of a learning method: it is an optimization method that uses the derivative of the loss function and obtains an optimal solution by repeatedly shifting the weights in the gradient direction. A loss function expresses how large a gap exists between the current neural network and the correct data. The form of the loss function varies depending on the problem, but the cross-entropy error
$$L = -\sum_{k} t_k \log y_k$$
is often used in classification problems, where $k$ denotes a class, $t_k$ indicates the status of the correct label ($t_k = 1$ for the correct label and $t_k = 0$ for an incorrect label), and $y_k$ represents the kth value of the output layer $[y_1, \ldots, y_n]$ of the neural network for n-class classification. Note that $y_k$ can be written as
$$y_k = f\left(\sum_{j=1}^{J} w_{jk} a_j + b_k\right),$$
382
K. Miyazawa and M. Sato-Ilic
where a value of the jth node of the hidden layer is a j , the weight corresponding to the jth node of the hidden layer is w jk , bias is bk , and the number of nodes of the hidden layer is J . The gradient descent method updates some weight w in the neural network with the following equation w =w−η
∂L , ∂w
where η is a parameter that determines how much weight is to update, called a learning rate. One of the problems with normal neural networks is that the form of data must match to the adaptable form for the normal neural networks. For example, in the case of image data, the data is originally formed as two-dimensional data, or if color is included, then it is three-dimensional data. However, for applying such data to the normal neural networks, this form is needed to convert to one-dimensional data. Figure 2 shows the situation of this problem. In this figure, 0 and 2 are adjacent to each other in the image, but if the data is converted into one dimensional, the spatial information of the adjacent data is lost. Therefore, convolutional neural networks (CNN) have been proposed to maintain such spatial information. There are two distinctive layers in CNN: a convolutional layer and a pooling layer. Other than that, it is structurally similar to normal neural networks.
Fig. 2 Transformation of image data
Fig. 3 Example of convolutional layer
A Classification Method Based on Ensemble Learning …
383
Fig. 4 Example of pooling layer
In the convolutional layer, a process called convolutional operation is performed. An example of the convolution layer is shown in Fig. 3. In the convolution layer, a filter is applied to each channel of an input image by sliding it at regular intervals. The channel represents the color information of each pixel in the image data. The number of channels is 1 for a grayscale image and 3 for a color image consisting of RGB. At each location, the corresponding element of the input data is multiplied by the corresponding element of the filter, and the sum of the two is stored in the corresponding element of the output. This process is performed in all places to obtain the output of the convolutional operation. Figure 3 shows that Filter is applied to the bold frame of Input to perform the convolutional operation 1 × 1 + 0 × 0 + 0 × 1 + 1 × 0 + 1 × 1 + 0 × 0 + 1 × 1 + 1 × 0 + 1 × 1 = 4, and the result is stored in the corresponding place in the output. In CNN, we consider each filter value as a weight and update it. Such a convolutional operation may allow feature extraction while maintaining the form of the image. The pooling is an operation to reduce the space in the vertical and horizontal directions and is used to aggregate the values in a specified region into a single value. There are several methods of processing, such as taking the maximum value (max pooling) and taking the average value (average pooling). An example is shown in Fig. 4. In this example, a 2 × 2 max pooling is used with a stride of 2. Here, 2 × 2 denotes the region to be pooled, and stride denotes the region’s movement interval. In Fig. 4, the max pooling operation is performed with 3 as the output, which is the maximum value among the values of 0, 3, 1, and 0 in the bold frame of the input, and the result is stored in the corresponding place of the output. The feature of the pooling layer is that it is robust to small differences of the input data. Therefore, it is effective in alleviating the effects of the noise in the image. Moreover, it has the effect of compressing the image and preventing the increase of the number of parameters.
3 Multidimensional Scaling (MDS) Assume that n objects are in L-dimensional Euclidean space. Their coordinates are represented by an n × L matrix X. If column vectors of X are independent with each other, then an n × n matrix D(2) whose elements are squares of the Euclidean
384
K. Miyazawa and M. Sato-Ilic
distances among n individuals can be represented as follows:
D(2) = 1n 1n diag X X − 2X X + diag X X 1n 1n ,
(1)
where 1n is an n-dimensional vector whose elements are all 1, and diag X X is a diagonal matrix whose diagonal elements are the diagonal elements of X X . We apply the Young-Householder transformation to D(2) in (1) as follows: P = − 21 J n D(2) J n ,
(2)
where J n is
J n = I n − 1n 1n /n. Let X ∗ ≡ J n X, (2) is
P = − 21 J n 1n 1n diag X X − 2X X + diag X X 1n 1n J n = X ∗ X ∗ .
(3)
X ∗ represents a matrix of coordinates whose origin is the centroid of the values of the coordinates. Equation (3) shows that the matrix of squared Euclidean distances, D(2) , is transformed to a product of an n × L coordinate matrix X ∗ whose origin is the centroid and its transpose matrix by the Young-Householder transformation. By using this matter, we apply the Young-Householder transformation to the observed dissimilarity between objects i and j, oi j and obtain 1 Pˆ = − J n O (2) J n . 2 The estimate of X which minimizes 2
2 φ 2 (X) = tr Pˆ − P = tr Pˆ − X X , ˆ and this satisfies the following condition: is shown as X,
Pˆ − Xˆ Xˆ Xˆ = 0.
Let eigenvalue and eigenvector decomposition of Pˆ be as follows:
ˆ Sˆ , Pˆ = Sˆ and let Sˆ A be a matrix of eigenvectors corresponding to A eigenvalues from the larger one, then Xˆ can be obtained as follows [3]:
A Classification Method Based on Ensemble Learning …
385
1
ˆ A2 . Xˆ = Sˆ A
(4)
Notice that at least A eigenvalues of Pˆ must be positive.
4 A Classification Method Based on Ensemble Learning of Deep Learning and Multidimensional Scaling By using the MDS for removing the latent structure of data which have low explanatory power as noise for CNN, we propose a classification method. In this method, we first use CNN to obtain a classification result. Then, based on the classification result, it obtains values of the coordinate in a low-dimensional space. And, finally, by using the values of the coordinate, re-predicts the final classification. Firstly, we classify a dataset into train C N N to train a deep learning model, train M DS for training re-prediction using the obtained result of the multidimensional scaling, and test for evaluation. The algorithm of the proposed method is as follows: Step 1: Step 2: Step 3: Step 4: Step 5: Step 6: Step 7:
Construct a CNN model model C N N for train C N N . Step 2: Obtain the prediction labels for train M DS , test from model C N N . Repeat the following Step 4 to Step 7 for each label i. Create a dataset Di with only data whose prediction label is i at Step 2. Apply the multidimensional scaling to Di in order to obtain values of the coordinate in a low-dimensional space. Construct a classification model model M DS by using only data belonging to train M DS in Di . Obtain prediction labels of test from model M DS as the final prediction result.
Since a given data consists of objects and variables, to apply the multidimensional scaling at Step 5 of the above algorithm, the dissimilarity between objects has to be measured by using an appropriate dissimilarity. In this study, based on an idea of dissimilarity considering grade of attraction [4], we define new dissimilarities considering weights of each object. The dissimilarity between objects i and j, d˜i j , is defined as follows: di j , i, j = 1, · · · , n, d˜i j ≡ qi q j
(5)
where qi shows a weight of an object i and di j is the Euclidean distance between objects i and j. From (5), it is shown that when the Euclidean distance between objects i and j is smaller and both qi and q j are larger, then d˜i j is smaller. Next, the dissimilarity between objects i and j, dˆi j , is defined as follows:
386
K. Miyazawa and M. Sato-Ilic
di j , i, j = 1, · · · , n. dˆi j ≡ qi
(6)
From (6), it is shown that when the Euclidean distance between objects i and j is smaller and qi is larger, then dˆi j is smaller. Since this dissimilarity is generally dˆi j = dˆ ji , it is not applicable to the multidimensional scaling whose solution is estimated by (4). Therefore, we use a symmetry part D ∗ as follows: D∗ =
D + Dt = di∗j , i, j = 1, · · · , n, 2
(7)
where D is a matrix whose an (i, j) element is dˆi j shown in (6).
5 Numerical Example The data is measured by sensors for movements of 12 physical activities by 10 subjects. [7–9] The jogging and running measurements of Subject 1 and Subject 2 are extracted from this data and combined. The number of objects of this data is 12,288 in which the number of objects included in label 0 is 6144 and the number of objects included in label 1 is 6144. The objects indicate sensor observations. The labels are 0 indicates jogging and 1 indicates running. The number of variables is 23, and the variables are acceleration from the chest sensor (X-axis), acceleration from the chest sensor (Y-axis), acceleration from the chest sensor (Z-axis), electrocardiogram signal (lead 1), electrocardiogram signal (lead 2), acceleration from the left-ankle sensor (X-axis), acceleration from the left-ankle sensor (Y-axis), acceleration from the left-ankle sensor (Z-axis), gyro from the left-ankle sensor (X-axis), gyro from the left-ankle sensor (Y-axis), gyro from the left-ankle sensor (Z-axis), magnetometer from the left-ankle sensor (X-axis), magnetometer from the left-ankle sensor (Yaxis), magnetometer from the left-ankle sensor (Z-axis), acceleration from the rightlower-arm sensor (X-axis), acceleration from the right-lower-arm sensor (Y-axis), acceleration from the right-lower-arm sensor (Z-axis), gyro from the right-lowerarm sensor (X-axis), gyro from the right-lower-arm sensor (Y-axis), gyro from the right-lower-arm sensor (Z-axis), magnetometer from the right-lower-arm sensor (Xaxis), magnetometer from the right-lower-arm sensor (Y-axis), magnetometer from the right-lower-arm sensor (Z-axis). We consider a classification problem in which the classification label was determined based on whether the subject did jogging or running. Label 0 shows jogging, and label 1 is running. The dataset was divided into 6144, 3072, and 3072, for train C N N for deep learning training, train M DS for learning re-prediction using a result of multidimensional scaling, and test for evaluation, respectively. For this data, we first created a CNN model by using the train C N N as a training data. The CNN model was implemented by using TensorFlow [10] which is a deep learning library. The structure of this model is shown in Fig. 5 [11].
A Classification Method Based on Ensemble Learning …
387
Fig. 5 Structure of CNN model
Table 1 CNN result for each label for sensor data
Label
Accuracy
0
0.913
1
0.846
Table 1 shows a result of the CNN classification for each label. In this table, accuracy shows a ratio between the number of correctly classified objects for a label and the total number of the objects in the label. Next, multidimensional scaling was applied to the data classified into each label in the train M DS and test to obtain values of a coordinate in a low-dimensional space. The dimension of the low-dimensional space was set to 15 dimensions. This means that a data structure described by 8 ( = 23−15) dimensions was removed as noise in label 0, as well as a data structure described by 8 ( = 23−15) dimensions was removed as noise in label 1. The dissimilarity between objects was defined as the Euclidean distance and dissimilarities shown in (5) and (7), respectively. In (5) and (7), two types of weights, qi1 and qi2 , were given for each object. qi1 is a mean of values of the x-directional acceleration of a left-ankle sensor for each subject during walking, and qi2 is an output value of the model C N N . The dissimilarities shown in (5) and (7) by using qi1 are represented as d˜i1j , di∗1j , respectively. And the dissimilarities shown in (5) and (7) by using qi2 are represented as d˜i2j , di∗2j , respectively. Each dissimilarity was used to apply the multidimensional scaling in Step 5 of the algorithm described
388
K. Miyazawa and M. Sato-Ilic
Table 2 Results of the proposed classification method for each label by using sensor data Label
Dissimilarity
Accuracy
0
0.919
0
d ij d˜ 1
ij
0.958
0
di∗1 j
0.928
0
d˜i2j
0.963
0
di∗2 j
0.924
1
0.846
1
d ij d˜ 1
ij
0.843
1
di∗1 j
0.844
1
d˜i2j
0.889
1
di∗2 j
0.889
in Sect. 4. Table 2 shows the percentages of correct solutions of the results of the proposed classification method described in Sect. 4 by using each dissimilarity. From the results shown in Tables 1 and 2, it can be shown that the results of the proposed method were generally more accurate than the cases using only the CNN. In addition, from Table 2, it can be shown that the accuracy of the results using the proposed dissimilarities tend to be better when compared with the cases using the Euclidean distance. Next, we used data from Fashion-MNIST [12]. This data is labeled image data with ten categories of clothing. The data type is 28 × 28, and the color is grayscale. We used 5000 data for train C N N , 10,000 data for train M DS , and 10,000 data for test which are descriptions of datasets in the proposed algorithm described in Sect. 4. For this data, we first created a CNN model by using the train C N N as a training data. The CNN model was implemented by using TensorFlow [10], which is a deep learning library. The structure of this model is shown in Fig. 5 [11]. Next, multidimensional scaling was applied to the data classified into each label in train M DS and test to obtain values of a coordinate in a low-dimensional space. The dimension of the low-dimensional space was 100. The dissimilarity between objects was defined as the Euclidean distance and dissimilarities shown in (5) and (7), respectively. In (5) and (7), two types of weights, qi1 and qi2 , were given for each object. qi1 is a mean of values of each pixel in each image, and qi2 is an output value of the model CNN . The dissimilarities shown in (5) and (7) by using qi1 are represented as d˜i1j , di∗1j , respectively. And the dissimilarities shown in (5) and (7) by using qi2 are represented as d˜i2j , di∗2j , respectively. Each dissimilarity was used to apply the multidimensional scaling in Step 5 of the algorithm described in Sect. 4. Table 3 shows the percentage of correct solutions of data of label 6 when only CNN is used, and the percentages of correct solutions of results of the proposed classification method described in Sect. 4 by using each dissimilarity. In Table 3,
A Classification Method Based on Ensemble Learning …
389
Table 3 Comparison of results between CNN and the proposed classification method in an image data for label 6 d˜ 1 d˜ 2 CNN d ij d ∗1 d ∗2 ij
0.742
0.743
0.742
ij
0.741
ij
0.746
ij
0.749
“CNN” indicates the percentage of correct solutions when only CNN is used. The accuracy of the data of label 6 is worse than the data of other labels. In the cases of results of data for other labels, all the scores of accuracies are more than 0.82 and all of the cases for CNN, and other dissimilarities used, are almost the same for each label. Therefore, we only show the case of data of label 6. From this table, we can see that the proposed method obtained a better result for the data of label 6 when we apply the weight qi2 .
6 Conclusions In this paper, an algorithm for an ensemble learning of deep learning and multidimensional scaling is proposed for a classification problem of large and complex data such as images and sensor data. In addition, to improve the accuracy of the multidimensional scaling results, we use a new dissimilarity considering weights of objects. The proposed method aims to improve the accuracy of data classification by removing a latent structure of data with low explanatory power as noise obtained by transforming the data into a space spanned by dimensions that have explainable power for the latent structure of the data by using the multidimensional scaling. Numerical examples using images and a sensor data show the effectiveness of the proposed method.
References 1. Alex, K., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: The 25th International Conference on Neural Information Processing Systems, Vol. 1, pp. 1097–1105 (2012) 2. Schindler, A., Lidy, T., Karner, S., Hecker, M.: Fashion and Apparel Classification using Convolutional Neural Networks, arXiv:1811.04374 (2018) 3. Kruskal, J.B., Wish, M.: Multidimensional Scaling. Sage Publications (1978) 4. Ito, K., Sato-Ilic, M.: Asymmetric dissimilarity considering the grade of attraction for intervalvalued data. In: International Workshop of Fuzzy Systems & Innovational Computing, pp. 362– 367 (2004) 5. Kunihiko, F.: Neocognitron: a self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biol. Cybern. 36(4), 193–202 (1980) 6. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. In: Proceedings of the IEEE, pp. 2278–2324 (1998)
390
K. Miyazawa and M. Sato-Ilic
7. UCI Machine Learning Repository: MHEALTH Dataset Data Set. https://archive.ics.uci.edu/ ml/datasets/MHEALTH+Dataset. Last accessed 26 Aug 2020 8. Banos, O., Garcia, R., Holgado, J.A., Damas, M., Pomares, H., Rojas, I., Saez, A., Villalonga, C.: mHealthDroid: a novel framework for agile development of mobile health applications. In: Proceedings of the 6th International Work-conference on Ambient Assisted Living an Active Ageing (IWAAL 2014). Belfast, Northern Ireland, 2–5 Dec (2014) 9. Nguyen, L.T., Zeng, M., Tague, P., Zhang, J.: Recognizing new activities with limited training data. In: IEEE International Symposium on Wearable Computers (ISWC) (2015) 10. TensorFlow. https://www.tensorflow.org/. Last accessed 26 Aug 2020 11. Convolutional Neural Networks-TensorFlow. https://www.tensorflow.org/tutorials/ima ges/cnn. Last accessed 26 Aug 2020 12. Xiao, H., Rasul, K., Vollgraf, R.: Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. arXiv:1708.07747, (2017)
A Consistent Likelihood-Based Variable Selection Method in Normal Multivariate Linear Regression Ryoya Oda and Hirokazu Yanagihara
Abstract We propose a likelihood-based variable selection method for selecting explanatory variables in normality-assumed multivariate linear regression contexts. The proposed method is reasonably fast in terms of run-time, and it has a selection consistency when the sample size always tends to infinity, but the number of response and explanatory variables does not necessarily have to tend to infinity. It can be expected that the probability of selecting the true subset by the proposed method is high under a moderate sample size. Keywords High-dimension · Multivariate linear regression · Selection consistency
1 Introduction Multivariate linear regression with an n × p response matrix Y and an n × k explanatory matrix X is one of the fundamental methods of inferential statistical analysis, and it is introduced in many statistical textbooks (e.g., [10, 12]), where n is the sample size, and p and k are the numbers of response and explanatory variables, respectively. Let N = n − p − k + 1. We assume that N − 2 > 0 and rank(X) = k < n to ensure the existence of our proposed method in this paper. Let ω = {1, . . . , k} be the full set consisting of all the column indexes of X, and let X j be the n × kj matrix consisting of columns of X indexed by the elements of j ⊂ ω, where kj is the number of elements in j, i.e., kj = #(j). For example, if j = {1, 3}, then X j consists of the first and third column vectors of X. From the above notation, it holds that X = X ω . For R. Oda (B) School of Informatics and Data Science, Hiroshima University, 1-3-1 Kagamiyama, Higashi-Hiroshima, Hiroshima, Japan e-mail: [email protected] H. Yanagihara Graduate School of Advanced Science and Engineering, Hiroshima University, 1-3-1 Kagamiyama, Higashi-Hiroshima, Hiroshima, Japan © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 I. Czarnowski et al. (eds.), Intelligent Decision Technologies, Smart Innovation, Systems and Technologies 238, https://doi.org/10.1007/978-981-16-2765-1_33
391
392
R. Oda and H. Yanagihara
a subset j ⊂ ω, the multivariate linear regression model with Y and X j is expressed as follows: Y ∼ Nn×p (X j j , j ⊗ I n ),
(1)
where j is the kj × p matrix of regression coefficients and j is the p × p covariance matrix. In actual empirical contexts, it is important to examine which of the k explanatory variables affect the response variables, and this is regarded as the problem of selecting a subset of ω. To achieve this, it is common to search over all candidate subsets, and a variable selection criterion (SC) is often used to choose an optimal subset following this search. Akaike’s information criterion (AIC) [1, 2] ˆ j be the maximum is the most widely applied tool in this respect. For j ⊂ ω, let likelihood estimator of j in (1), which is defined by ˆ j = n−1 Y (I n − P j )Y, where P j is the projection matrix to the subspace spanned by the columns of X j , i.e., P j = X j (X j X j )−1 X j . A generalized information criterion (GIC) [7] for j ⊂ ω is defined by p(p + 1) ˆ GIC(j) = n log | j | + np(log 2π + 1) + α pkj + , 2 where α is a positive constant expressing a penalty for the complexity of (1). Specifying α, the GIC includes several criteria as special cases, e.g., the AIC (α = 2), the Bayesian information criterion (BIC) [11] (α = log n) and the Hannan–Quinn information criterion (HQC) [5] (α = 2 log log n). Recently, there has been significant attention in the literature to statistical methods for high-dimensional data. In high-dimensional data contexts in which the number of explanatory variables is substantial, it may take an inordinate amount of time to identify an optimal subset by calculating variable selection criteria for all the candidate subsets. For such practical reasons, we focus on a selection method based on an SC. Let ω be the complement set of {} for ω, i.e., ω = ω\{}. Then, the following optimal subset using an SC is presented: { ∈ ω | SC(ω ) > SC(ω)},
(2)
where SC(j) is the value of an SC for j ⊂ ω. The method (2) is introduced by Zhao et al. [13] and is referred to as the kick-one-out method by [3]. In particular, the optimal subset by the method (2) based on the GIC is expressed as ˆ ω ˆ −1 ˆj = { ∈ ω | GIC(ω ) > GIC(ω)} = { ∈ ω | n log | ω | > pα}.
(3)
A Consistent Likelihood-Based Variable Selection Method . . .
393
The method (3) can be regarded as a likelihood-based selection method. In multivariate linear regression contexts, the method (3) is used by [3, 9]. Moreover, the method (2) based on the generalized Cp (GCp ) criterion [6] is used by [3, 8]. The GCp criterion consists of the sum of a penalty term and a weighted residual sum of squares. It is well known that both a likelihood and a weighted residual sum of squares can measure the goodness of fit of a model. Therefore, the GIC or the GCp criterion is often used for selecting explanatory variables in multivariate linear regression contexts. In this paper, we assume that Y is generated from the following true model for a true subset j∗ : Y ∼ Nn×p (X j∗ ∗ , ∗ ⊗ I n ),
(4)
where ∗ is the kj∗ × p matrix of true regression coefficients wherein the row vectors are not zeros and ∗ is the p × p true covariance matrix which is positive definite. It is often hoped that the true subset j∗ should be specified. The selection consistency P(ˆj = j∗ ) → 1 is known as an asymptotic property reflecting the hope, and the selection consistency is one of important properties of variable selection methods. Therefore, in this paper, we obtain conditions of α for the selection consistency of the likelihood-based selection method (3) under the following high-dimensional (HD) asymptotic framework: HD : n → ∞,
p+k → c ∈ [0, 1). n
(5)
Moreover, by deciding a value of α satisfying the obtained conditions, we propose a variable selection method using (3), which will be reasonably fast in terms of run-time and has the selection consistency P(ˆj = j∗ ) → 1. In general, the probability of selecting the true subset j∗ by a selection method which has the selection consistency tends to be high for suitable data for the used asymptotic framework. Note that the HD asymptotic framework can be also expressed as follows: n → ∞, p/n → d1 ∈ [0, 1), k/n → d2 ∈ [0, 1 − d1 ), kj∗ /n → d3 ∈ [0, d2 ]. The HD asymptotic framework means that n always tends to infinity, but p, k and kj∗ do not necessarily have to tend to infinity. This is a suitable asymptotic framework for data satisfying the following five situations under n − p − k > 1: only n is large; n and p are large; n and k are large; n, p and k are large; n, p, k and kj∗ are large. Hence, it is expected that the probability of selecting the true subset j∗ by the proposed method is high under moderate sample sizes even when p, k and kj∗ are large. As related researches in high-dimensional contexts, [8] examined the selection consistency of (2) based on the GCp criterion under the HD asymptotic framework. However, [8] has not dealt with the likelihood-based selection method (3) (i.e., (2) based on the GIC). Both the GIC and the GCp criterion are well known in many fields, and either the GIC or the GCp criterion is used for selecting variables, depending on researchers’ preference. Therefore, it is important to provide not only (2) based on the GCp criterion but also the likelihood-based selection method which have the selection
394
R. Oda and H. Yanagihara
consistency under the HD asymptotic framework. When Y is generated from a nonnormal distribution, [3] also examined strong selection consistencies of (3) and (2) based on the GCp criterion under the HD asymptotic framework with the exception that d1 = 0, d2 = 0 and kj∗ → ∞. Note that the strong selection consistency means that P(ˆj → j∗ ) = 1 holds and is stronger than the selection consistency P(ˆj = j∗ ) → 1. However, we cannot know that the number of true variables kj∗ is large or small from actual data. Therefore, it is also important to use the HD asymptotic framework which includes both cases that kj∗ tends to infinity and kj∗ is fixed. Our proposed selection method is the likelihood-based selection method, and it has the selection consistency under the HD asymptotic framework which includes the cases of kj∗ → ∞ and kj∗ : fixed. The remainder of the paper is organized as follows. In Sect. 2, we obtain conditions of α for the selection consistency of (3) and propose a consistent selection method by using the obtained conditions. In Sect. 3, we conduct numerical experiments for verification purposes. Technical details are relegated to the Appendix.
2 Proposed Selection Method First, we prepare notation and assumptions to obtain conditions of α for the selection consistency of (3) under the HD asymptotic framework. The following three assumptions are prepared: Assumption 1 The true subset j∗ is included in the full set ω, i.e., j∗ ⊂ ω. Assumption 2 There exists c1 > 0 such that n−1 min x (I n − P ω )x ≥ c1 , ∈j∗
where x is the -th column vector of X. Assumption 3 There exist c2 > 0 and c3 ≥ 1/2 such that n1−c3 min θ −1 ∗ θ ≥ c2 , ∈j∗
where θ is the -th row vector of ∗ . Assumption 1 is needed to consider the selection consistency. Assumptions 2 and 3 concern asymptotic restrictions for the explanatory variables and parameters in the true model, respectively. These assumptions are also used by [8]. Assumption 2 holds if the minimum eigenvalue of n−1 X X is bounded away from 0, and Assumption 3 allows the minimum value of θ −1 ∗ θ to vanish to 0 at a slower speed than or equal to the divergence speed of n1/2 . For ∈ ω, let the p × p non-centrality matrix and parameter be denoted by
A Consistent Likelihood-Based Variable Selection Method . . .
= −1/2 ∗ X j∗ (I n − P ω )X j∗ ∗ −1/2 , δ = tr( ). ∗ ∗
395
(6)
/ j∗ under It should be emphasized that = Op,p and δ = 0 hold if and only if ∈ Assumption 1, where Op,p is the p × p matrix of zeros. Next, we obtain conditions of α for the selection consistency of (3), and we propose a variable selection method by deciding a valve of α satisfying the obtained ˆ −1 ˆ ω conditions. The following lemma is prepared to examine the distribution of | ω | (the proof is given in Appendix 1): Lemma 1 Let N = n − p − k + 1 and δ be defined by (6). For ∈ ω, let u and v be independent random variables distributed according to u ∼ χ 2 (p; δ ) and v ∼ χ 2 (N ), respectively, where χ 2 (p; δ ) denotes the non-central chi-square distribution with degrees of freedom p and non-centrality parameter δ , and χ 2 (N ) expresses χ 2 (N ; 0). Then, under Assumption 1, we have −1
ˆ ω ˆω |=1+ |
u . v
By using Lemma 1, conditions of α for the selection consistency of (3) are obtained in Theorem 1 (the proof is given in Appendix 2). Theorem 1 Suppose that Assumptions 1, 2, and 3 hold. Then, the selection method (3) has the selection consistency under the HD asymptotic framework (5), i.e., P(ˆj = j∗ ) → 1 holds, if for some r ∈ N the following conditions are satisfied: α=
√ p p p n p log 1 + +β , β > 0, s.t. 1/2r β → ∞, c β → 0. (7) p N −2 N −2 k n3
By using the result of Theorem 1, we propose the consistent likelihood-based selection method (3) with the following value of α: α = α˜ =
√ k 1/4 p log n p n log 1 + + . p N −2 N −2
(8)
From (7), it is straightforward that the selection method (3) with α = α˜ has the selection consistency under the HD asymptotic framework when c3 > 3/4. Note that if k 1/4 in α˜ is be replaced by k 1/2r for a sufficiently large r ∈ N, then the selection method (3) with α = α˜ has the selection consistency when c3 > 1/2. However, even though we use k 1/2r for a sufficiently large r instead of k 1/4 , α = α˜ is not satisfied with the conditions (7) when c3 = 1/2. This comes from the fact that we have to decide a value of r in order to use the selection method in actual data. In this paper, we use k 1/4 as an example of α satisfying the conditions (7). Next, we present an efficient calculation for high-dimensional data. When at least ˆ −1 ˆ ω one of p and k is large, | ω | in (3) should not be calculated simply, because ˆ ω for each ˆ ω is p × p and moreover P ω must be calculated to derive the size of
396
R. Oda and H. Yanagihara
ˆ ω ˆ −1 ∈ ω. For such a reason, we present another expression of | ω |. For ∈ ω, let r be the (, )-th element of (X X)−1 and let z be the -th column vector of X(X X)−1 . In accordance with [8], it holds that P ω − P ω = r−1 z z . Hence, it is ˆ ω ˆ −1 straightforward that | ω | can be expressed as −1
−1
ˆ ω ˆ ω Y z . ˆ ω | = 1 + n−1 r−1 z Y |
(9)
The expression (9) implies the fact that in order to perform the selection method ˆ −1 (3), if (X X)−1 , X(X X)−1 and ω are calculated only once at the beginning, we do not need the calculations of determinants and inverse matrices after that. Then, it is expected that the run-time of the selection method (3) is fast. Therefore, we recommend perform (3) using the right-hand side of (9). Finally, we mention the following relationship between the likelihood-based selection method (3) and the method (2) based on the GCp criterion. Remark 1 (Relationship with the GCp criterion) The difference between the GCp criterion for ω ( ∈ ω) and that for ω is defined as −1
ˆ ω ˆ ω ) − pγ , GCp (ω ) − GCp (ω) = (n − k)tr( where γ is a positive constant. From Lemma A.1 in [8], it is known that the equation −1 ˆ −1 ˆ ω tr( ω ) = u v holds, where u and v are defined in Lemma 1. This fact implies that the likelihood-based selection method (3) can be regarded as equivalent to (2) based on the GCp criterion when α and γ are adjusted adequately. Especially, the proposed method (3) with α = α˜ can be regarded as nearly identical to the method in [8].
3 Numerical Studies We present numerical results to compare the probabilities of selecting the true subset j∗ by the proposed method (3) with α = α˜ in (8) and the two methods (3) based on the AIC and BIC (α = 2, log n). Moreover, we present the run-time associated with executing the proposed method (3) with α = α˜ in (8). The probabilities and runtime were calculated by Monte Carlo simulations with 10, 000 iterations executed in MATLAB 9.6.0 on an Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz 3601 Mhz, 8 cores, 16 logical processors and 64 GB of RAM. We set the true subset and the number of the true explanatory variables as j∗ = {1, . . . , kj∗ } and kj∗ = k/2. The data Y was generated by the true model (4), and X and the true parameters were determined as follows: X ∼ Nn×k (On,k , ⊗ I n ), ∗ = 1kj∗ 1p , ∗ = 0.4{(1 − 0.8)I p + 0.81p 1p }, where the (a, b)-th element of is (0.5)|a−b| and 1p is the p-dimensional vector of ones.
A Consistent Likelihood-Based Variable Selection Method . . .
397
Table 1 Selection probabilities (%) of j− , j∗ and j+ by the proposed method (3) with α = α˜ in (8) and the two methods (3) based on the AIC and BIC n
p
k
Proposed
AIC
BIC
j−
j∗
j+
j−
j∗
j+
j−
j∗
j+
100
10
10
0.00
99.84
0.16
0.00
70.05
29.95
0.00
99.97
0.03
300
10
10
0.00
99.98
0.02
0.00
82.10
17.90
0.00
100.00
0.00
500
10
10
0.00
100.00
0.00
0.00
83.66
16.34
0.00
100.00
0.00
800
10
10
0.00
100.00
0.00
0.00
84.94
15.06
0.00
100.00
0.00
1000
10
10
0.00
100.00
0.00
0.00
85.76
14.24
0.00
100.00
0.00
3000
10
10
0.00
100.00
0.00
0.00
86.13
13.87
0.00
100.00
0.00
100
80
10
58.90
29.26
11.84
0.00
0.00
100.00
98.27
1.67
0.06
300
240
10
0.34
94.87
4.79
0.00
0.12
99.88
100.00
0.00
0.00
500
400
10
0.00
97.68
2.32
0.00
0.29
99.71
100.00
0.00
0.00
800
640
10
0.00
99.21
0.79
0.00
0.50
99.50
100.00
0.00
0.00
1000
800
10
0.00
99.42
0.58
0.00
0.60
99.40
100.00
0.00
0.00
3000
2400
10
0.00
99.96
0.04
0.00
0.91
99.09
100.00
0.00
0.00
100
10
80
99.85
0.11
0.04
0.00
0.00
100.00
1.31
0.00
98.69
300
10
240
96.02
3.98
0.00
0.00
0.00
100.00
0.00
0.00
100.00
500
10
400
42.73
57.27
0.00
0.00
0.00
100.00
0.00
0.00
100.00
800
10
640
0.02
99.98
0.00
0.00
0.00
100.00
0.00
0.00
100.00
1000
10
800
0.02
99.98
0.00
0.00
0.00
100.00
0.00
0.00
100.00
3000
10
2400
0.00
100.00
0.00
0.00
0.00
100.00
0.00
0.00
100.00
100
40
40
90.19
8.88
0.93
0.00
0.00
100.00
81.44
15.72
2.84
300
120
120
80.70
19.25
0.05
0.00
0.00
100.00
100.00
0.00
0.00
500
200
200
49.99
50.01
0.00
0.00
0.00
100.00
100.00
0.00
0.00
800
320
320
12.10
87.90
0.00
0.00
0.00
100.00
100.00
0.00
0.00
1000
400
400
4.48
95.52
0.00
0.00
0.00
100.00
100.00
0.00
0.00
3000
1200
1200
0.00
100.00
0.00
0.00
0.00
100.00
100.00
0.00
0.00
Table 1 shows the selection probabilities. Therein, j− and j+ denote the underspecified and overspecified subsets of ω satisfying j− ∩ j∗ = ∅ and j+ j∗ , respectively. From Table 1, we observe that the proposed method appears to have the selection consistency P(ˆj = j∗ ) → 1 under the HD asymptotic framework. However, the method (3) based on the AIC tends toward selecting overspecified subsets j+ . The method (3) based on the BIC seems to have the selection consistency P(ˆj = j∗ ) → 1 when only n tends to infinity. On the other hand, the probabilities by the proposed method tend to be low for some combinations (n, p, k). The main reason is that ensuring the selection consistency requires n − p − k → ∞ from the HD asymptotic framework. Therefore, the probabilities may be low when n − p − k is not large. Moreover, the proposed method tends to select underspecified subsets when n and k are large but n − p − k is not large, i.e., (n, p, k) = (300, 10, 240), (500, 10, 400), (300, 120, 120), (500, 10, 400), (500, 200, 200), (800, 320, 320). This is because k 1/4 /(N − 2) in α˜ is large when n − p − k is not large. If k 1/4 is replaced by k 1/2r for a natural number
398
R. Oda and H. Yanagihara
Fig. 1 Base-10 logarithms of the run-times associated with executing the proposed method (3) with α = α˜ in (8)
r (r > 2), it can be expected that the true subset tends to be selected. However, we do not verify it in this paper. Figure 1 shows the base-10 logarithms of run-time by the proposed method (3) with α = α˜ in (8). From Fig. 1, we observe that the run-times by the proposed method are fast (about 24 seconds at its slowest, i.e., when n = 3000, p = 1200, k = 1200). Appendix 1: Proof of Lemma 1 Y (P ω − This proof is based on that of Lemma A.1 in [8]. For ∈ ω, let W = −1/2 ∗ −1/2 −1/2 −1/2 P ω )Y ∗ and W = ∗ Y (I n − P ω )Y ∗ . Then, from a property of the noncentral Wishart distribution and Cochran’s theorem (e.g., [4]), we can see that W and W are independent and W ∼ Wp (1, I p ; ) and W ∼ Wp (n − k, I p ), where is defined by (6), Wp (1, I p ; ) denotes the non-central Wishart distribution with degrees of freedom 1, covariance matrix I p and non-centrality matrix , and Wp (n − k, I p ) expresses Wp (n − k, I p ; 0). Since the rank of is 1, we decompose as = η η by using a p-dimensional vector η . By using η , we can express W as W = (ε + η )(ε + η ) , where ε ∼ Np×1 (0p , 1 ⊗ I p ) and ε is independent of W , where 0p is the p-dimensional vector of zeros. Then, we have −1
ˆ ω ˆ ω | = |I p + W W −1 | = 1 + (ε + η ) W −1 (ε + η ). | Let u and v be constants as follows: u = (ε + η ) (ε + η ), v =
(ε + η ) (ε + η ) . (ε + η ) W −1 (ε + η )
(10)
A Consistent Likelihood-Based Variable Selection Method . . .
399
Then, from a property of the Wishart distribution, we can state that u and v are ˆ ω ˆ −1 independent, u ∼ χ 2 (p; δ ) and v ∼ χ 2 (N ). Using (10), | ω | is expressed as −1 −1 ˆ ˆ | ω ω | = 1 + u v . Therefore, the proof of Lemma 1 is completed. Appendix 2: Proof of Theorem 1 To prove Theorem 1, we use the the following results about the divergence orders of central moments, which is seen in [8, Lemma A.2]. Lemma 2 ([8]) Let δ be a positive constant. And let t1 , t2 and v be random variables distributed according to χ 2 (p), χ 2 (p; δ) and χ 2 (N ), respectively, where t1 and t2 are independent of v. Then, under the HD asymptotic framework (5), for N − 4r > 0 (r ∈ N), we have E E
t1 p − v N −2 t2 p+δ − v N −2
2r 2r
= O(pr n−2r ), = O(max{(p + δ)r n−2r , (p + δ)2r n−3r }).
−1 ˆ ω ˆ −1 From Lemma 1, it holds that | ω | = 1 + u v , where u and v are independent random variables distributed according to u ∼ χ 2 (p; δ ) and v ∼ χ 2 (N ), respectively. The upper bound of P(ˆj = j∗ ) is expressed as
P(ˆj = j∗ ) ≤
−1
ˆ ω ˆ ω | ≥ pα) + P(n log |
∈j /∗
−1
ˆ ω ˆ ω | ≤ pα). P(n log |
∈j∗
Hence, we should prove that the right-hand side of the above inequality tends to 0 under the HD asymptotic framework. First, we consider the case of ∈ / j∗ . Note / j∗ . Let ρβ = pβ(N − 2)−1 . Then, using Markov’s that = Op,p , i.e., δ = 0 for ∈ inequality and (7), for any r ∈ N, we have u p ≥ ρβ P − v N −2 ∈j /∗ ∈j /∗ 2r u1 u1 p p −2r ≥ ρβ ≤ kρβ E ≤ (k − kj∗ )P − − . v1 N − 2 v1 N −2
−1
ˆ ω ˆ ω | ≥ pα) = P(n log |
From Lemma 2, the moment in the above inequality has O(pr n−2r ). Hence, we have
−1
ˆ ω ˆ ω | ≥ pα) = O(kp−r β −2r ) = o(1). P(n log |
(11)
∈j /∗
Next, we consider the case of ∈ j∗ . Note that δ > 0 for ∈ j∗ . Let δmin = min∈j∗ δ . Then, from Assumptions 2 and 3, the inequality n−c3 δmin ≥ c1 c2 holds
400
R. Oda and H. Yanagihara
−1 because of δmin ≥ min∈j∗ x (I n − P ω )x θ −1 ∗ θ . Hence, we have ρβ δmin = o(1) from (7). Then, using Markov’s inequality, for sufficiently large N and any r ∈ N, we have u p + δ δ ˆ ω ˆ −1 ≤ ρ P(n log | | ≤ pα) = P − − β ω v N −2 N −2 ∈j∗ ∈j∗ u p + δ δ ≤ P − ≥ −ρβ + N − 2 v N − 2 ∈j∗ −2r u δ p + δ 2r ≤ kj∗ max −ρβ + E − . (12) ∈j∗ N −2 v N −2
From Lemma 2, the maximum value except for constant parts of the above moment is (p + δ )r n−2r or (p + δ )2r n−3r . Hence, for sufficiently large r ∈ N, we have kj∗ max −ρβ + ∈j∗
δ N −2
−2r
N −2 ≤ kj∗ max 1 − ρβ ∈j∗ δ
{( p + δ )r n−2r + (p + δ )2r n−3r }
−2r
−1 r −r −1 2r −r 1 + pδmin δmin + 1 + pδmin n
−r −2r −2r ) + O(kj∗ pr δmin ) + O(kj∗ n−r ) + O(kj∗ p2r n−r δmin ). = O(kj∗ δmin
(13)
We can see that (13) tends to 0 under the HD asymptotic framework if c3 > (1 + r −1 )/2. Since r is arbitrary, the inequality c3 > (1 + r −1 )/2 is equivalent to c3 ≥ 1/2. Hence, from (12) and (13), we have
ˆ ω ˆ −1 P(n log | ω | ≤ pα) = o(1).
(14)
Therefore, (11) and (14) completes the proof of Theorem 1.
∈j∗
Acknowledgements This work was supported by funding from JSPS KAKENHI (grant numbers JP20K14363, JP20H04151, and JP19K21672 to Ryoya Oda; and JP16H03606, JP18K03415, and JP20H04151 to Hirokazu Yanagihara).
References 1. Akaike, H.: Information theory and an extension of the maximum likelihood principle. In: Petrov, B.N., Csáki), F. (eds.) 2nd International Symposium on Information Theory pp. 995– 1010. Akadémiai Kiadó, Budapest (1973). https://doi.org/10.1007/978-1-4612-1694-0_15 2. Akaike, H.: A new look at the statistical model identification. In: Institute of Electrical and Electronics Engineers. Transactions on Automatic Control, vol. AC-19, pp. 716–723 (1974). https://doi.org/10.1109/TAC.1974.1100705
A Consistent Likelihood-Based Variable Selection Method . . .
401
3. Bai, Z.D., Fujikoshi, Y., Hu, J.: Strong consistency of the AIC, BIC, Cp and KOO methods in high-dimensional multivariate linear regression. TR No. 18–9, Statistical Research Group, Hiroshima University (2018) 4. Fujikoshi, Y., Ulyanov, V.V., Shimizu, R.: Multivariate Statistics: High-Dimensional and LargeSample Approximations. Wiley, Hoboken, NJ (2010) 5. Hannan, E.J., Quinn, B.G.: The determination of the order of an autoregression. J. Roy. Stat. Soc. Ser. B 26, 270–273 (1979). https://doi.org/10.1111/j.2517-6161.1979.tb01072.x 6. Nagai, I., Yanagihara, H., Satoh, K.: Optimization of ridge parameters in multivariate generalized ridge regression by plug-in methods. Hiroshima Math. J. 42, 301–324 (2012). https://doi. org/10.32917/hmj/1355238371 7. Nishii, R., Bai, Z.D., Krishnaiah, P.R.: Strong consistency of the information criterion for model selection in multivariate analysis. Hiroshima Math. J. 18, 451–462 (1988). https://doi. org/10.32917/hmj/1206129611 8. Oda, R., Yanagihara, H.: A fast and consistent variable selection method for high-dimensional multivariate linear regression with a large number of explanatory variables. Electron. J. Stat. 14, 1386–1412 (2020). https://doi.org/10.1214/20-EJS1701 9. Sakurai, T., Fujikoshi, Y.: High-dimensional properties of information criteria and their efficient criteria for multivariate linear regression models with covariance structures. TR No. 17–13, Statistical Research Group, Hiroshima University (2017) 10. Srivastava, M.S.: Methods of Multivariate Statistics. Wiley, New York (2002) 11. Schwarz, G.: Estimating the dimension of a model. Ann. Stat. 6, 461–464 (1978). https://doi. org/10.1214/aos/1176344136 12. Timm, N.H.: Appl. Multivar. Anal. Springer, New York (2002) 13. Zhao, L.C., Krishnaiah, P.R., Bai, Z.D.: On detection of the number of signals in presence of white noise. J. Multivar. Anal. 20, 1–25 (1986). https://doi.org/10.1016/0047259X(86)90017-5
A Hybrid Method of Multi-class SVM and Classification Method Based on Reliability Score for Autocoding of the Family Income and Expenditure Survey Yukako Toko
and Mika Sato-Ilic
Abstract The classification of text descriptions based on corresponding classes is an important task in official statistics. We developed a hybrid method of SVM utilizing Word2Vec and the previously developed reliability–score-based classifier to improve both the ability of high classification accuracy and generalization performance. However, in the previous study, as SVM was simply applied to a whole given data, there is room for more efficiently classifying those data to improve the classification accuracy. Therefore, this paper proposes a classification method based on multi-class SVM that is a combined method of SVM and k-means method to improve the ability of high classification accuracy. The numerical example shows the proposed method gives a better result as compared to the results of ordinary methods. Keywords Text classification · Coding · Word2Vec · Support vector machine · k-means · Reliability score
1 Introduction In official statistics, text response fields are often found in survey forms. Those respondents’ text descriptions are usually translated into corresponding classes (or codes) for efficient data processing. Coding tasks are originally performed manually, whereas the studies of automated coding have made progress with the improvement of computer technology in recent years. For example, Hacking and Willenborg [1] introduced coding methods, including autocoding. Gweon et al. [2] illustrated Y. Toko (B) · M. Sato-Ilic National Statistics Center, 19-1 Wakamatsu-cho, Shinjuku-ku, Tokyo 162-8668, Japan e-mail: [email protected] M. Sato-Ilic e-mail: [email protected] M. Sato-Ilic Faculty of Engineering, Information and Systems, University of Tsukuba, Tennodai 1-1-1, Tsukuba, Ibaraki 305-8573, Japan © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 I. Czarnowski et al. (eds.), Intelligent Decision Technologies, Smart Innovation, Systems and Technologies 238, https://doi.org/10.1007/978-981-16-2765-1_34
403
404
Y. Toko and M. Sato-Ilic
methods for automated occupation coding based on statistical learning. We also have developed several classification methods based on reliability scores [3, 4] considering humans’ uncertainty of recognizing the classifying of words for adjusting the complexity of data for the Family Income and Expenditure Survey in Japan [5]. In addition, considering the generalization for adapting the variability of obtained data, we have extended the simple reliability scores to generalized reliability scores considering robustness and generalization for adjusting various types of complex data [6, 7]. However, these methods are not enough to obtain better classification accuracy for a large amount of complex data even though data is getting complex and large in recent years. Therefore, there is a need for efficiently handling those data with higher accuracy of classification and generalization for obtaining robust classification results for various kinds of inputted text descriptions. In order to address this problem, we developed a classification method [8], which is a hybrid method of SVM [9] utilizing Word2Vec [10, 11] and our previously developed classification method based on the reliability score [3, 4, 6, 7] to improve both the ability of high classification accuracy and generalization performance. However, in the previous study, SVM is simply applied to a whole given data even though the data we handle is large and complex. Therefore, there is room for more efficient classification of those data with high accuracy. For this purpose, this paper utilizes a clustering method before applying SVM. That is, in order to capture significant features of data, we try to obtain individually specified characteristic features for each cluster. From this, we can discriminate a whole given data into several groups; for some of them, data within a group are highly discriminated by SVM and the others in which data of each group are not clearly classified by SVM. The merit of obtaining several groups before applying SVM is that we can apply SVM individually to each group considering each group’s discriminable ability. We combine such a function with the previously proposed hybrid method of SVM utilizing Word2Vec and our previously developed classification method based on the reliability score. So, this paper proposes a hybrid method of “multi-class SVM” utilizing Word2Vec and our previously developed classification method based on the reliability score. The rest of this paper is organized as follows: Word2Vec and SVM are explained in Sects. 2 and 3. The classification method based on reliability score is described in Sect. 4. The hybrid method of “multi-class SVM” utilizing Word2Vec and our previously developed classification method based on the reliability score is proposed in Sect. 5. The numerical examples are illustrated in Sect. 6. Conclusions are described in Sect. 7.
A Hybrid Method of Multi-class SVM and Classification Method …
405
2 Efficient Estimation of Word Representations in Vector Space (Word2Vec) Word2Vec was developed based on an idea of a neural probabilistic language model in which words are embedded in a continuous space by using distributed representations of the words [10, 11]. The Word2Vec algorithm learns word association from a given dataset utilizing a neural network model based on an idea of a neural probabilistic language model. It produces a vector space, and each word in the given dataset is assigned a corresponding numerical vector of a word in the produced vector space. The essence of the idea is to avoid the curse of dimensionality by distributed representations of words. Word2Vec utilizes a continuous bag-of-word (CBOW) model and continuous skip-gram model [10] to distributed representation on words. The CBOW model predicts the current word based on the context. The skip-gram model uses each current word to predict words within a certain range before and after the current word. It gives less weight to the distant context words. In our study, we applied the skip-gram model.
3 Support Vector Machine (SVM) Support vector machine (SVM) [9] is a supervised machine learning algorithm for classification and regression. It finds the maximum-margin hyperplane in highdimensional space for classification. If w is the weight vector realizing a functional margin of 1 on the positive point x + and negative point x − , a functional margin of 1 implies w · x + + b = +1, w · x − + b = −1, while w is normalized. Then, the margin γ is the functional margin of the resulting classifier w 1 w 1 1 + − = w · x+ − w · x− = γ = ·x − ·x . w2 w2 2 w2 2w2 Therefore, the resulting margin will be equal to 1/w2 and the following can be written. Given a linearly separable training sample S = ((x 1 , y1 ), (x 2 , y2 ), . . . , (x l , yl )) the hyperplane (w, b) that solves the optimization problem
406
Y. Toko and M. Sato-Ilic
minw · w,
(1)
w,b
subject to yi (w · x i + b) ≥ 1, i = 1, . . . , l, realizes the maximal margin hyperplane with geometric margin M = 1/w2 . Then, slack variables are introduced to allow the margin constraints to be violated subject to yi (w · x i + b) ≥ 1 − ξi , i = 1, . . . , l, ξi ≥ 0, i = 1, . . . , l, From the above, the optimization problem shown in (1) can be written as min w · w + C
ξ,w,b
l
ξi ,
(2)
i=1
subject to yi (w · x i + b) ≥ 1 − ξi , i = 1, . . . , l, ξi ≥ 0, i = 1, . . . , l, where the C is the cost parameter that will give the optimal bound as it corresponds to finding the minimum of ξ1 in (2) with the given value for w2 . This is soft-margin linear SVM. Also, SVM performs a nonlinear classification transforming input data into higherdimensional spaces and calculating the inner product between the data in higherdimensional space using kernel trick. SVM uses kernel functions to enable it to obtain the inner product of data in higher-dimensional space (kernel trick), which is represented as follows: k x i , x j = ϕ(x i )T ϕ(x j ), where ϕ is a mapping function from an observational space to a higher-dimensional space. The conditions for k x, x to be a kernel function are as follows: Symmetry: k x, x = k x , x . Gram matrix is positive semi-definite. There are many possible choices for the kernel function, such as • Radial basis function kernel 2 k x, x = exp −γ x − x , • Polynomial kernel
γ =
1 , 2σ 2
(3)
A Hybrid Method of Multi-class SVM and Classification Method …
407
d k x, x = c + x T x , (c > 0, d ∈ N), • Sigmoid kernel k x, x = tanh ax T x + c . In this paper, the radial basis function kernel is applied. The mapped feature space of this kernel function has an infinite number of dimensions. For multi-class SVM, there are two approaches: one-versus-the-rest and oneversus-one. In one-versus-the-rest, SVM builds binary classifiers that discriminate between one class and the rest, whereas it builds binary classifiers that discriminate between every pair of classes in one-versus-one.
4 Classification Method Based on Reliability Score Conventional classifier performs the extraction of objects and retrieval of candidate classes from the object frequency table provided by using the extracted objects. Then, it calculates the relative frequency of jth object to a class k defined as
n jk = ,nj = n jk , j = 1, . . . , J, k = 1, . . . , K , nj k=1 K
p jk
where n jk is the number of occurrence of statuses in which an object j is assigned to a class k in the training dataset. J is the number of objects and K is the number of classes. However, this classifier has difficulty correctly assigning classes to text descriptions for complex data included uncertainty. To address this problem, we developed the overlapping classifier that assigns classes to each text on the reli description based ability score [3, 4, 6, 7]. Then, the classifier arranges p j1 , · · · , p j K in descending orderand creates p˜ j1 , . . . , p˜ j K , such as p˜ j1 ≥ · · · ≥ p˜ j K , j = 1, · · · , J . After ˜ ˜ that, p˜ j1 , · · · , p˜ j K˜ , K˜ j ≤ K are created. That is, each object has a different j
number of classes. Then, the classifier calculates the reliability score for each class of each object. The reliability score of jth object to a class k is defined as ⎛
⎞
˜
p jk = T ⎝ p˜˜ jk , 1 +
Kj
p˜˜ jm log K p˜˜ jm ⎠, j = 1, . . . , J, k = 1, . . . , K˜ j .
(4)
m=1
⎛ p jk = T ⎝ p˜˜ jk ,
˜
Kj
m=1
⎞ p˜˜ 2jm ⎠, j = 1, . . . , J, k = 1, . . . , K˜ j .
(5)
408
Y. Toko and M. Sato-Ilic
These reliability scores were defined considering both probability measure and fuzzy measure [12, 13]. That is, p˜˜ jk shows the uncertainty from the training dataset K˜ j ˜ K˜ j ˜ 2 (probability measure) and 1 + m=1 p˜ jm log K p˜˜ jm or m=1 p˜ jm shows the uncertainty from the latent classification structure in data (fuzzy measure). These values of the uncertainty from the latent classification structure can show the classification status of each object; that is, how each object is classified to the candidate classes. T shows T-norm in statistical metric space [14–16]. We generalize the reliability score by using the idea of T-norm, which is a binary operator in statistical metric space. Furthermore, to prevent an infrequent object having significant influence, sigmoid were introduced to the reliability score. The reliability score considfunctions ering the frequency of each object over the classes for each object in the training dataset as follows [6, 7]: (6) In this study, algebraic product is taken as T-norm and n j / 1 + n 2j is taken as a sigmoid function for the reliability score.
5 A Hybrid Method of Multi-class SVM and Classification Method Based on Reliability Score We developed a classification method [8], which is a hybrid method of SVM utilizing Word2Vec and our previously developed classification method based on the reliability score, to improve both the ability of high classification accuracy and generalization performance. In order to improve the accuracy of the result of SVM in the previously proposed hybrid method, we utilize a clustering method to a whole data and propose a new hybrid method of multi-class SVM and classification method based on reliability score in which SVM applies to each obtained cluster individually. First, the proposed method obtains numerical vectors corresponding words utilizing Word2Vec after tokenizing each text description by MeCab [17]. Then, it produces sentence vectors for each text description based on the obtained numerical vectors. After that, it performs k-means clustering to sentence vectors to divide them into a specified number of datasets according to the result of k-means clustering. Then, it assigns corresponding classes by using a classifier obtained by SVM for each dataset. After that, the proposed algorithm performs re-classification based on the previously defined reliability score to unmatched text descriptions at the classification by SVM. The detailed algorithm of the proposed method is the following: Step 1. The proposed algorithm tokenizes each text description into words by MeCab.
A Hybrid Method of Multi-class SVM and Classification Method …
409
Step 2. It obtains numerical vectors corresponding to words utilizing Word2Vec: First, it produces a dataset concatenating all tokenized words consecutively. Then, it trains Word2Vec model using the produced dataset. Each unique word in the dataset will be assigned a corresponding numerical vector. We determine the followings by trial and error: • • • •
Type of model architecture: CBOW model or skip-gram model The number of vector dimensions The number of training iterations Window size of words considered by the algorithm
Step 3. It produces sentence vectors for each text description based on the obtained numerical vectors at Step 2: First, it obtains a corresponding numerical vector for each word in each text description from the trained Word2Vec model. Then, it calculates the sum of obtained numerical vectors for each text description as sentence vectors. Step 4. The k-means method is applied to the sentence vectors produced in Step 3 in order to classify the sentence vectors into c clusters. The number of clusters c is given in advance. Then, a whole given data with obtained sentence vectors is divided according to the result of k-means clustering. Step 5. It assigns corresponding classes (or codes) applying SVM for each dataset of each cluster: It trains a support vector machine for each dataset and then predicts a corresponding class (or code) for each target text description. For training a support vector machine, we determine the followings: • Cost parameter appeared in (2) as C • Kernel function to be applied • Gamma parameter appeared in (3) as γ if radial basis function kernel is applied as a kernel function • Type of methods: one-versus-the-rest or one-versus-one In this study, a radial basis function as a kernel function is applied. We apply the one-versus-one approach. Cost parameter C and gamma parameter γ are determined by grid search. Step 6. It extracts unmatched text descriptions in Step 5. Step 7. It implements re-classification based on the reliability score in (6) described in Sect. 4.
6 Numerical Example

For the numerical example, we applied the proposed method to the Family Income and Expenditure Survey dataset. The Family Income and Expenditure Survey is a sampling survey related to household income and expenditure conducted by the Statistics Bureau of Japan [5]. This survey dataset includes short text descriptions
related to households' daily income and expenditure (receipt item names and purchased item names, in Japanese) and their corresponding classes (or codes). In this numerical example, the target data are only the data related to household expenditure in the dataset. The total number of classes related to household expenditure is around 520 [18]. Approximately 810 thousand text descriptions were used for this numerical example.

We used gensim [19] for training the Word2Vec model to obtain numerical vectors corresponding to words. For applying Word2Vec, we selected the skip-gram model as the model architecture and set the number of vector dimensions to 100, the number of training iterations to 10,000, and the window size to 2. After obtaining numerical vectors corresponding to words by applying Word2Vec, we produced a sentence vector for each text description based on the obtained numerical vectors. Then, we applied the k-means method in scikit-learn [20] with the number of clusters set to 10. Figure 1 visualizes the values of the cluster centers of each cluster. From Fig. 1, it can be seen that each cluster has a different data structure. Therefore, we split the dataset of sentence vectors into 10 datasets in accordance with the result of k-means clustering in order to apply SVM efficiently.

We implemented multi-class SVM in Python utilizing existing libraries in scikit-learn. SVM was applied with the cost parameter C appearing in (2) and the gamma parameter γ appearing in (3) selected by grid search. Table 1 shows the classification accuracy of SVM for each dataset obtained from the result of k-means clustering. From Table 1, it is found that the classification accuracy differs among clusters. For example, the data in clusters 1 and 5 were almost perfectly classified, whereas the classification accuracy for the data in clusters 0, 4 and 9 was around 80%. As we correctly classified 70,206 out of 81,455 target text descriptions over clusters 0 through 9, we extracted the 11,249 (= 81,455 − 70,206) incorrectly classified text descriptions and implemented re-classification based on the reliability score for them.
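One way to realize Steps 5 and 6 with scikit-learn is sketched below. The function name `per_cluster_svm`, the candidate grid values, and the 90/10 train/evaluation split are illustrative assumptions rather than the exact settings used for Table 1.

```python
# Sketch of Steps 5-6: one RBF-kernel SVM per cluster with a grid search over
# C and gamma, then extraction of the evaluation items the SVM misclassifies.
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

def per_cluster_svm(sentence_vectors, codes, cluster_labels, n_clusters=10, seed=0):
    unmatched_idx = []
    param_grid = {"C": [1, 10, 30, 90, 100], "gamma": [1e-4, 1e-3, 1e-2]}
    for c in range(n_clusters):
        idx = np.where(cluster_labels == c)[0]
        X, y = sentence_vectors[idx], codes[idx]
        # 90%/10% split into training and evaluation data, as in Table 1.
        tr, ev = train_test_split(np.arange(len(idx)), test_size=0.1, random_state=seed)
        search = GridSearchCV(SVC(kernel="rbf", decision_function_shape="ovo"),
                              param_grid, cv=3)
        search.fit(X[tr], y[tr])
        pred = search.predict(X[ev])
        print(f"cluster {c}: accuracy={np.mean(pred == y[ev]):.3f}, "
              f"best={search.best_params_}")
        # Step 6: the unmatched descriptions go to the reliability-score stage.
        unmatched_idx.extend(idx[ev][pred != y[ev]].tolist())
    return unmatched_idx
```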
Fig. 1 Values of cluster centers
Table 1 Classification accuracy of SVM for each cluster

| Cluster | Total | Training | Evaluation | Correctly assigned | Accuracy | SVM cost | Gamma |
| Cluster 0 | 4,568 | 4,111 | 457 | 375 | 0.821 | 30 | 0.0001 |
| Cluster 1 | 37,454 | 33,708 | 3,746 | 3,719 | 0.993 | 100 | 0.0010 |
| Cluster 2 | 137,157 | 123,441 | 13,716 | 12,068 | 0.880 | 30 | 0.0010 |
| Cluster 3 | 148,585 | 133,726 | 14,859 | 12,909 | 0.869 | 10 | 0.0064 |
| Cluster 4 | 38,003 | 34,202 | 3,801 | 3,082 | 0.811 | 10 | 0.0010 |
| Cluster 5 | 31,288 | 28,159 | 3,129 | 3,116 | 0.996 | 100 | 0.0010 |
| Cluster 6 | 275,852 | 248,266 | 27,586 | 22,929 | 0.831 | 90 | 0.0255 |
| Cluster 7 | 47,421 | 42,678 | 4,743 | 4,243 | 0.895 | 90 | 0.0001 |
| Cluster 8 | 48,332 | 43,498 | 4,834 | 4,093 | 0.847 | 100 | 0.0010 |
| Cluster 9 | 45,831 | 41,247 | 4,584 | 3,672 | 0.801 | 10 | 0.0010 |
| Total | 814,491 | 733,036 | 81,455 | 70,206 | 0.862 | – | – |

(Total, Training, Evaluation, and Correctly assigned are numbers of text descriptions; SVM cost and Gamma are the parameters selected by grid search.)
Table 2 compares the classification accuracy of the proposed method, SVM, and the previously developed classification method based on reliability scores [3, 4, 6, 7]. In this table, reliability score (a) is the reliability score shown in (6) in which we use p_jk in (4), whereas reliability score (b) is the reliability score shown in (6) in which we use p_jk in (5).

Table 2 Comparison of classification accuracy of the proposed hybrid autocoding method and conventional methods

| Classification method | Training | Evaluation | Correctly assigned | Accuracy |
| Classification by the proposed hybrid method [reliability score (a)] | 733,036 | 81,455 | 74,855 | 0.919 |
| Classification by the proposed hybrid method [reliability score (b)] | 733,036 | 81,455 | 74,851 | 0.919 |
| Classification by SVM | 733,036 | 81,455 | 69,713 | 0.856 |
| Classification based on the reliability score (a) | 733,036 | 81,455 | 72,317 | 0.888 |
| Classification based on the reliability score (b) | 733,036 | 81,455 | 72,362 | 0.888 |

(Training, Evaluation, and Correctly assigned are numbers of text descriptions.)

From the comparison between Tables 1 and 2, the accuracy score of the result of SVM applied to all the data is 0.856 and the accuracy score of the result of the multi-class SVM
is 0.862. That is, 493 more text descriptions are assigned correctly. Therefore, adding the pre-classification step applied to the whole data improved the accuracy compared with omitting the pre-classification. From Table 2, it can be seen that the classification accuracy of the proposed hybrid method of multi-class SVM and a classification method based on reliability scores is better than that of both the classification method based on reliability scores and classification by simply applying SVM. In fact, the proposed hybrid method obtained a classification accuracy of 0.919, whereas the classification method based on reliability scores obtained 0.888 and simply applying SVM obtained 0.856. That is, the proposed method correctly assigns 5,142 more text descriptions than the use of SVM alone.
7 Conclusion

This paper proposes a hybrid method of multi-class SVM and a classification method based on reliability scores for complex and large amounts of data, aiming at both high classification accuracy and good generalization performance. This paper utilizes a clustering method before applying SVM to capture significant features of the data and, based on the captured features, applies SVM individually to the data included in each cluster. By adding this pre-classification step to the previously proposed hybrid method of SVM utilizing Word2Vec and our previously developed classification method based on reliability scores, this paper obtains a hybrid method of multi-class SVM utilizing Word2Vec and the reliability-score-based classification method. The numerical examples show a better performance of the proposed classification method based on multi-class SVM when compared with both classification by simply applying SVM and classification by the previously developed reliability-score-based classifier. In future work, we will divide the data included in clusters with lower classification accuracy into sub-clusters and then apply SVM individually to the obtained sub-clusters to obtain a better result.
References

1. Hacking, W., Willenborg, L.: Method Series Theme: Coding; interpreting short descriptions using a classification, Statistics Methods. Statistics Netherlands (2012). Available at https://www.cbs.nl/en-gb/our-services/methods/statistical-methods/throughput/throughput/coding. Last accessed 28 Jan 2021
2. Gweon, H., Schonlau, M., Kaczmirek, L., Blohm, M., Steiner, S.: Three methods for occupation coding based on statistical learning. J. Official Stat. 33(1), 101–122 (2017)
3. Toko, Y., Wada, K., Iijima, S., Sato-Ilic, M.: Supervised multiclass classifier for autocoding based on partition coefficient. In: Czarnowski, I., Howlett, R.J., Jain, L.C., Vlacic, L. (eds.) Smart Innovation, Systems and Technologies, vol. 97, pp. 54–64. Springer, Switzerland (2018)
4. Toko, Y., Iijima, S., Sato-Ilic, M.: Overlapping classification for autocoding system. Rom. Stat. Rev. 4, 58–73 (2018)
5. Statistics Bureau of Japan: Outline of the Family Income and Expenditure Survey. Available at https://www.stat.go.jp/english/data/kakei/1560.html. Last accessed 29 Jan 2021
6. Toko, Y., Iijima, S., Sato-Ilic, M.: Generalization for improvement of the reliability score for autocoding. Rom. Stat. Rev. 3, 47–59 (2019)
7. Toko, Y., Sato-Ilic, M.: Improvement of the training dataset for supervised multiclass classification. In: Czarnowski, I., Howlett, R.J., Jain, L.C. (eds.) Smart Innovation, Systems and Technologies, vol. 193, pp. 291–302. Springer, Singapore (2020)
8. Toko, Y., Sato-Ilic, M.: Efficient autocoding method in high dimensional space. Rom. Stat. Rev. 1, 3–16 (2021)
9. Cristianini, N., Shawe-Taylor, J.: An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge University Press (2000)
10. Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013)
11. Bengio, Y., Ducharme, R., Vincent, P., Jauvin, C.: A neural probabilistic language model. J. Mach. Learn. Res. 3, 1137–1155 (2003)
12. Bezdek, J.C.: Pattern Recognition with Fuzzy Objective Function Algorithms. Plenum Press, New York (1981)
13. Bezdek, J.C., Keller, J., Krisnapuram, R., Pal, N.R.: Fuzzy Models and Algorithms for Pattern Recognition and Image Processing. Kluwer Academic Publishers (1999)
14. Menger, K.: Statistical metrics. Proc. Natl. Acad. Sci. U.S.A. 28, 535–537 (1942)
15. Mizumoto, M.: Pictorial representations of fuzzy connectives, Part I: cases of t-norms, t-conorms and averaging operators. Fuzzy Sets Syst. 31, 217–242 (1989)
16. Schweizer, B., Sklar, A.: Probabilistic Metric Spaces. Dover Publications, New York (2005)
17. Kudo, T., Yamamoto, K., Matsumoto, Y.: Applying conditional random fields to Japanese morphological analysis. In: Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, pp. 230–237. Barcelona, Spain (2004)
18. Statistics Bureau of Japan: Income and Expenditure Classification Tables (revised in 2020). Available at https://www.stat.go.jp/english/data/kakei/ct2020.html. Last accessed 29 Jan 2021
19. Rehurek, R., Sojka, P.: Software framework for topic modelling with large corpora. In: Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, pp. 45–50 (2010)
20. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E.: Scikit-learn: machine learning in Python. JMLR 12, 2825–2830 (2011)
Individual Difference Assessment Method Based on Cluster Scale Using a Data Reduction Method

Kazuki Nitta and Mika Sato-Ilic
Abstract This paper proposes a new method for assessing individual differences in time series data from multiple subjects, based on the cluster scale using a data reduction method as a measure for quantitatively detecting differences over subjects. This evaluation method is based on a previously proposed method of measuring individual difference among subjects by a common cluster scale over the subjects. However, it is unique in that we can reduce significant amounts of data by extracting data that affect the obtained common clusters by defining a new distance between the target data and the common clusters using the idea of distance based on maximum likelihood estimation using fuzzy covariance. Also, we demonstrate the effectiveness of the proposed method by showing that measuring the data between different subjects on only the selected data on the common cluster scale yields similar results to those obtained when using all data before selection. Keywords Cluster scale · Individual difference · Data reduction
K. Nitta
Graduate School of Systems and Information Engineering, University of Tsukuba, Tennodai 1-1-1, Tsukuba, Ibaraki 305-8573, Japan

M. Sato-Ilic
Faculty of Engineering, Information and Systems, University of Tsukuba, Tennodai 1-1-1, Tsukuba, Ibaraki 305-8573, Japan
e-mail: [email protected]

1 Introduction

There is a need to analyze large-scale data, such as data obtained from multiple subjects. An example scenario would be time series data obtained from multiple subjects, such as temporal data obtained from sensors on human movements related to different daily sports activities. It is important to extract certain properties from these data through the subjects. In this case, if there are large individual differences in the data structure between subjects, it is challenging to extract systematic properties from the data. Therefore, it is important to detect quantitative differences between subjects in advance.
In this paper, we propose a method for quantitatively detecting differences between subjects based on the cluster scale of big data using a data reduction method. Here, using the idea of the cluster scale, we measure the data between different subjects on a common cluster scale [1, 2]. This measurement is based on the correlation between commonly obtained clusters through different subjects and attributes of the data, and it can be applied to big data. However, there are problems with long calculation times and instability of solutions when analyzing big data. Therefore, in this study, we propose a method that significantly reduces the number of data by extracting the data that affect the obtained common clusters. This is done by defining a new distance between the target data and the common clusters using the idea of a distance based on maximum likelihood estimation with fuzzy covariance [3]. We also demonstrate our proposed method's effectiveness by showing that measuring the data between different subjects on a common cluster scale with only the selected data gives similar results to those obtained when using all data before selection.

The structure of this paper is as follows. First, Sect. 2 describes the detection metric for differences over subjects based on the cluster scale [1, 2]. Next, Sect. 3 describes the distance based on maximum likelihood estimation using fuzzy covariance [3]. In Sect. 4, we propose a method for evaluating individual differences based on the cluster scale for big data using the new data reduction method. In Sect. 5, we present numerical examples using sensor measurements of everyday activities with individual differences and show the proposed method's effectiveness. Then, we conclude this paper in Sect. 6.
2 Detection Metric for Differences Over Subjects Based on Cluster Scale

Suppose two different $n \times p$ matrices $A_1$ and $A_2$ which consist of $p$ independent variables, shown as follows:

$$A_t = \bigl(a_1^{(t)}, \ldots, a_p^{(t)}\bigr), \quad a_r^{(t)} = \bigl(a_{1r}^{(t)}, \ldots, a_{nr}^{(t)}\bigr)^T, \quad r = 1, \ldots, p, \ t = 1, 2, \qquad (1)$$

where $t$ represents the $t$-th subject. We combine the two data matrices to be compared, as in (2), to create a $2n \times p$ super-matrix $\hat{A}$:

$$\hat{A} = \begin{pmatrix} A_1 \\ A_2 \end{pmatrix}. \qquad (2)$$

We apply a fuzzy clustering [4] to $\hat{A}$ and obtain the following clustering result:
$$U_t = \bigl(u_1^{(t)}, \ldots, u_K^{(t)}\bigr), \quad u_k^{(t)} = \bigl(u_{1k}^{(t)}, \ldots, u_{nk}^{(t)}\bigr)^T, \quad k = 1, \ldots, K, \ t = 1, 2, \qquad (3)$$

where $u_{ik}^{(t)}$ is the degree of belongingness of an object $i$ to a fuzzy cluster $k$ for a subject $t$. In general, $u_{ik}^{(t)}$ satisfies the following conditions:

$$u_{ik}^{(t)} \in [0, 1], \quad \sum_{k=1}^{K} u_{ik}^{(t)} = 1, \quad i = 1, \ldots, n, \ k = 1, \ldots, K, \ t = 1, 2. \qquad (4)$$

Since $\bigl(u_1^{(1)}, \ldots, u_K^{(1)}\bigr)$ and $\bigl(u_1^{(2)}, \ldots, u_K^{(2)}\bigr)$ are results for mathematically the same $K$ clusters over the different subjects, these clusters can be used as a common cluster scale. From (1) and (3), the detection metric for measuring differences over subjects based on the cluster scale is proposed as follows [1, 2]:

$$c_l^{(t)} \equiv \sum_{k=1}^{K} \mathrm{abs}\bigl(\mathrm{cor}\bigl(u_k^{(t)}, a_l\bigr)\bigr), \quad l = 1, \ldots, 2p, \ t = 1, 2, \qquad (5)$$

where $\mathrm{abs}(\cdot)$ denotes the absolute value, $\mathrm{cor}\bigl(u_k^{(t)}, a_l\bigr)$ denotes the correlation between $u_k^{(t)}$ and $a_l$, and

$$\bigl(a_1, \ldots, a_p, a_{p+1}, \ldots, a_{2p}\bigr) \equiv \bigl(a_1^{(1)}, \ldots, a_p^{(1)}, a_1^{(2)}, \ldots, a_p^{(2)}\bigr).$$

Also, from (5), $0 \le c_l^{(t)} \le K$.
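The detection metric (5) can be computed numerically as in the sketch below. This is a minimal illustration, not the authors' code: the plain NumPy fuzzy c-means stands in for the fuzzy clustering of [4], and `fuzzy_c_means` and `detection_metric` are hypothetical helper names.

```python
import numpy as np

def fuzzy_c_means(X, K, m=2.0, n_iter=100, seed=0):
    """Plain fuzzy c-means (Bezdek [4]); returns the membership matrix U (n x K)."""
    rng = np.random.default_rng(seed)
    U = rng.dirichlet(np.ones(K), size=X.shape[0])
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        U = d ** (-2.0 / (m - 1.0))
        U = U / U.sum(axis=1, keepdims=True)
    return U

def detection_metric(A1, A2, K=3, m=1.35):
    """c_l^{(t)} in (5): for each subject t and each of the 2p variable columns
    a_l, sum over the K common clusters of |cor(u_k^{(t)}, a_l)|."""
    n, p = A1.shape
    U = fuzzy_c_means(np.vstack([A1, A2]), K, m=m)   # clustering of the 2n x p super-matrix (2)
    U_t = {1: U[:n], 2: U[n:]}                       # common clusters, memberships per subject
    A_cols = np.hstack([A1, A2])                     # a_1,...,a_p, a_{p+1},...,a_{2p}
    c = {}
    for t in (1, 2):
        c[t] = np.array([sum(abs(np.corrcoef(U_t[t][:, k], A_cols[:, l])[0, 1])
                             for k in range(K))
                         for l in range(2 * p)])
    return c
```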
3 Distance Based on Maximum Likelihood Estimation Using Fuzzy Covariance

Considering variability in cluster shapes and variations in cluster densities, the distance measure $d_e^2(x_i, v_k)$, based on the maximum likelihood estimation with fuzzy covariance, is proposed as follows [3]:

$$d_e^2(x_i, v_k) = \frac{[\det(F_k)]^{1/2}}{P_k} \exp\bigl( (x_i - v_k)^T F_k^{-1} (x_i - v_k)/2 \bigr), \qquad (6)$$

where $v_k$ is the centroid of the $k$-th cluster resulting from fuzzy clustering [4], $P_k$ is the a priori probability of the $k$-th cluster, and $F_k$ is the fuzzy covariance matrix of the $k$-th cluster, defined by the following Eqs. (7) and (8):

$$P_k = \frac{1}{n} \sum_{i=1}^{n} h(k \mid x_i), \qquad (7)$$

$$F_k = \frac{\sum_{i=1}^{n} h(k \mid x_i)(x_i - v_k)(x_i - v_k)^T}{\sum_{i=1}^{n} h(k \mid x_i)}. \qquad (8)$$

In (7) and (8), $h(k \mid x_i)$ is the probability of selecting the $k$-th cluster given the $i$-th object $x_i$, and it is defined as follows:

$$h(k \mid x_i) = \frac{1/d_e^2(x_i, v_k)}{\sum_{j=1}^{K} 1/d_e^2(x_i, v_j)}.$$
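A direct transcription of (6)-(8) into NumPy might look as follows. This is a sketch under simplifying assumptions: in the method of [3] the posteriors $h(k \mid x_i)$ and the distances are updated alternately, whereas here the membership matrix `H` is simply taken as given, and the small ridge added to $F_k$ is a numerical convenience not present in the original formulas.

```python
import numpy as np

def exponential_distances(X, V, H):
    """Exponential distance (6) between the n objects in X (n x p) and the K
    cluster centroids in V (K x p), given cluster memberships H (n x K)."""
    n, p = X.shape
    K = V.shape[0]
    D = np.empty((n, K))
    for k in range(K):
        Pk = H[:, k].mean()                                     # a priori probability (7)
        diff = X - V[k]                                         # x_i - v_k for all i
        Fk = (H[:, k, None, None] * (diff[:, :, None] * diff[:, None, :])).sum(axis=0) \
             / H[:, k].sum()                                    # fuzzy covariance (8)
        Fk = Fk + 1e-8 * np.eye(p)                              # ridge for numerical stability
        quad = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(Fk), diff)
        D[:, k] = np.sqrt(np.linalg.det(Fk)) / Pk * np.exp(quad / 2.0)   # (6)
    return D
```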
4 Individual Differences Assessment Method Based on Cluster Scale for Big Data Using Data Reduction Method

We define a $2p \times n$ super-matrix $\tilde{A}$ as follows:

$$\tilde{A} = \begin{pmatrix} A_1^T \\ A_2^T \end{pmatrix}, \qquad (9)$$

where $A_t^T$ is the transpose of $A_t$ shown in (1):

$$A_t^T = \bigl(\tilde{a}_1^{(t)}, \ldots, \tilde{a}_n^{(t)}\bigr), \quad \tilde{a}_i^{(t)} = \bigl(\tilde{a}_{1i}^{(t)}, \ldots, \tilde{a}_{pi}^{(t)}\bigr)^T, \quad i = 1, \ldots, n, \ t = 1, 2. \qquad (10)$$

Using the method described in Sect. 2, we apply a fuzzy clustering to $\tilde{A}$ shown in (9) and obtain the degrees of belongingness $\tilde{U}_t$ as follows:

$$\tilde{U}_t = \bigl(\tilde{u}_1^{(t)}, \ldots, \tilde{u}_K^{(t)}\bigr), \quad \tilde{u}_k^{(t)} = \bigl(\tilde{u}_{1k}^{(t)}, \ldots, \tilde{u}_{pk}^{(t)}\bigr)^T, \quad k = 1, \ldots, K, \ t = 1, 2. \qquad (11)$$

The distance between the $i$-th object and the $k$-th cluster, for each subject $t$, is defined as follows by using the idea of the distance based on the maximum likelihood estimation with fuzzy covariance shown in (6):

$$d_e^2\bigl(\tilde{u}_k^{(t)}, \tilde{a}_i^{(t)}\bigr) = \frac{\bigl[\det\bigl(\tilde{F}_k^{(t)}\bigr)\bigr]^{1/2}}{\tilde{P}_k^{(t)}} \exp\Bigl( \bigl(\tilde{u}_k^{(t)} - \tilde{a}_i^{(t)}\bigr)^T \bigl(\tilde{F}_k^{(t)}\bigr)^{-1} \bigl(\tilde{u}_k^{(t)} - \tilde{a}_i^{(t)}\bigr)/2 \Bigr). \qquad (12)$$

In (12), $\tilde{P}_k^{(t)}$ is the weight of salience for the $k$-th cluster of the $t$-th subject, and $\tilde{F}_k^{(t)}$ is the fuzzy covariance matrix for the $k$-th cluster of the $t$-th subject; we define them as follows:

$$\tilde{P}_k^{(t)} = \frac{1}{p} \sum_{r=1}^{p} \tilde{u}_{rk}^{(t)}, \qquad \tilde{F}_k^{(t)} = \frac{\sum_{r=1}^{p} \tilde{u}_{rk}^{(t)} \bigl(\tilde{u}_k^{(t)} - \tilde{a}_i^{(t)}\bigr)\bigl(\tilde{u}_k^{(t)} - \tilde{a}_i^{(t)}\bigr)^T}{\sum_{r=1}^{p} \tilde{u}_{rk}^{(t)}}.$$

Additionally, we define the following $g_i^{(t)}$ using the distance shown in (12) as a measure of how far the $i$-th object of the $t$-th subject is from all the clusters, that is, how well characterized it is:

$$g_i^{(t)} = \sum_{k=1}^{K} d_e^2\bigl(\tilde{u}_k^{(t)}, \tilde{a}_i^{(t)}\bigr), \quad i = 1, \ldots, n, \ t = 1, 2. \qquad (13)$$

Then, when $\xi_1, \xi_2$ are sufficiently large constants, we obtain the data $\bar{A}_t$ by using (13) as follows:

$$\bar{A}_t = \bigl(\bar{a}_1^{(t)}, \ldots, \bar{a}_p^{(t)}\bigr), \quad \bar{a}_r^{(t)} = \bigl(a_{sr}^{(t)} \ \big|\ g_s^{(1)} > \xi_1 \ \text{or} \ g_s^{(2)} > \xi_2 \bigr), \quad s = 1, \ldots, w, \ r = 1, \ldots, p, \ t = 1, 2. \qquad (14)$$

In (14), when $\xi_1, \xi_2$ are sufficiently large constants, $w \ll n$, thus allowing for a significant reduction in data. Furthermore, using the data shown in (14), we create the following super-matrix $\ddot{A}$:

$$\ddot{A} = \begin{pmatrix} \bar{A}_1 \\ \bar{A}_2 \end{pmatrix}. \qquad (15)$$

Using the method described in Sect. 2, we apply a fuzzy clustering to $\ddot{A}$ shown in (15) and obtain the degrees of belongingness $\bar{U}_t$ as follows:

$$\bar{U}_t = \bigl(\bar{u}_1^{(t)}, \ldots, \bar{u}_K^{(t)}\bigr), \quad \bar{u}_k^{(t)} = \bigl(\bar{u}_{1k}^{(t)}, \ldots, \bar{u}_{wk}^{(t)}\bigr)^T, \quad k = 1, \ldots, K, \ t = 1, 2. \qquad (16)$$

By using (14) and (16), from the definition of (5), the relationship between the $t$-th subject and the $l$-th variable through the common cluster scale is obtained as follows:

$$\bar{c}_l^{(t)} \equiv \sum_{k=1}^{K} \mathrm{abs}\bigl(\mathrm{cor}\bigl(\bar{u}_k^{(t)}, \bar{a}_l\bigr)\bigr), \quad l = 1, \ldots, 2p, \ t = 1, 2, \qquad (17)$$

where

$$\bigl(\bar{a}_1, \ldots, \bar{a}_p, \bar{a}_{p+1}, \ldots, \bar{a}_{2p}\bigr) \equiv \bigl(\bar{a}_1^{(1)}, \ldots, \bar{a}_p^{(1)}, \bar{a}_1^{(2)}, \ldots, \bar{a}_p^{(2)}\bigr).$$
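To make the flow of (9)-(17) concrete, the sketch below strings the earlier helper functions together. It is only an illustration under simplifying assumptions: the Gath-Geva-type distance of Sect. 3 is used as a stand-in for (12) with uniform memberships, and `fuzzy_c_means`, `exponential_distances`, and `detection_metric` are the hypothetical helpers introduced above, not the authors' implementation.

```python
import numpy as np

def reduce_and_rescore(A1, A2, K=3, m=1.2, xi1=4.0e28, xi2=1.0e24):
    """Data reduction of (9)-(14) followed by the reduced-data metric (17)."""
    n, p = A1.shape
    # (9)-(11): fuzzy clustering of the 2p x n super-matrix; rows are the
    # variables of both subjects, so memberships are obtained per variable.
    U_tilde = fuzzy_c_means(np.vstack([A1.T, A2.T]), K, m=m)
    g = {}
    for t, A in ((1, A1), (2, A2)):
        U_t = U_tilde[(t - 1) * p: t * p]                 # p x K memberships of subject t
        # Stand-in for (12)-(13): exponential distances of the n objects
        # (rows of A) to the K membership vectors, summed over clusters.
        D = exponential_distances(A, U_t.T, np.full((n, K), 1.0 / K))
        g[t] = D.sum(axis=1)
    keep = (g[1] > xi1) | (g[2] > xi2)                    # object selection rule of (14)
    print(f"selected {int(keep.sum())} of {n} objects")
    # (15)-(17): rebuild the super-matrix from the selected objects only and
    # measure the subjects on the common cluster scale again.
    return detection_metric(A1[keep], A2[keep], K=K)
```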
5 Numerical Examples

We use the temporal data [5, 6] obtained from sensors on human movements related to 19 different daily and sports activities. The data were recorded on eight subjects (P1-P8), four males and four females aged 20-30 years, with nine sensors attached to five body parts for five minutes. The five body parts are the torso (T), right arm (RA), left arm (LA), right leg (RL) and left leg (LL). The nine sensors are x-, y- and z-axial accelerometers (x_acc, y_acc, z_acc), x-, y- and z-axial gyroscopes (x_gyro, y_gyro, z_gyro), and x-, y- and z-axial magnetometers (x_mag, y_mag, z_mag). From the 19 different exercises, we treat the first 50 s of the activity of running on a treadmill at a speed of 8 km/h, which gives 1250 time points and 45 variables.

Figure 1 shows the fuzzy clustering result shown in (3), where the subject P1's 1250 × 45 data matrix is A1 in (1) and the subject P2's 1250 × 45 data matrix is
Fig. 1 Results of the degree of belongingness Ut for subjects P1 and P2
Fig. 2 c_l^(t) for subjects P1 and P2
A2 in (1), and after normalizing them, respectively, the fuzzy c-means method [4] is applied. Here, the number of clusters is set to 3 and the parameter that adjusts the fuzziness of belongingness is set to 1.35.

Figure 2 shows the values of the relationship between subjects P1 and P2 and the variables shown in (5), calculated using the common cluster scale obtained from the degrees of belongingness shown in Fig. 1 and the data shown in (1). In this figure, the horizontal axis represents the 45 variables for each of P1 and P2 in the time series data, and the vertical axis represents the value of c_l^(t) shown in (5) for that variable. The black line shows the first subject P1, and the red line shows the second subject P2. In Fig. 2, the values of c_l^(t) show that the first 45 variables have a high relationship only with the degree of belongingness U1 of dataset A1, which shows P1 (t = 1), and the second 45 variables have a high relationship only with the degree of belongingness U2 of dataset A2, which shows P2 (t = 2). Since U1 and U2 are a common scale with the same clusters, this result shows that the scale of the relationship between subjects and variables in (5) can distinguish the data A1 and A2, which differ across subjects.

Next, we calculate Ũt shown in (11) by applying the fuzzy c-means method. In this case, the number of clusters is set to 3 and the parameter m, which adjusts the fuzziness of belongingness, is set to 1.2. Using this result and the data shown in (10), we calculate the distance shown in (12), and then, using the indicator in (13), we create a data matrix consisting of the selected data shown in (14). Here, the thresholds are set to the sufficiently large values ξ1 = 4.0 × 10^28 and ξ2 = 1.0 × 10^24. As a result, 106 data are extracted out of 1250 data. We create the super-matrix shown in (15), which consists of these data matrices. Figure 3 shows the result Ūt in (16) obtained by applying the fuzzy c-means method to this super-matrix. Here, the number of clusters is set to 3 and the parameter m is set to 1.46.

Figure 4 shows the relationship between the variables and the subjects shown in (17), calculated using the degrees of belongingness shown in Fig. 3 and only the 106 selected data shown in (14). In this figure, the horizontal axis represents the 45 variables for each of P1 and P2 of the time series data, and the vertical axis represents the value of c̄_l^(t) shown in (17) for that variable. The black line shows the first subject P1, and the red line shows the second subject P2. In Fig. 4, the
Fig. 3 Results of the degree of belongingness Ūt for subjects P1 and P2 (the horizontal axis indexes the 106 selected objects of P1 followed by those of P2)

Fig. 4 c̄_l^(t) for subjects P1 and P2
values of c̄_l^(t) show that the first 45 variables have a high relationship only with the degree of belongingness Ū1 of dataset A1, which shows P1 (t = 1), and the second 45 variables have a high relationship only with the degree of belongingness Ū2 of dataset A2, which shows P2 (t = 2). Since Ū1 and Ū2 are a common scale with the same clusters, this result shows that the scale of the relationship between subjects and variables in (17) can distinguish the data A1 and A2, which differ across subjects.

From Figs. 2 and 4, comparing the values of c_l^(t) using all 1250 data shown in Fig. 2 with the values of c̄_l^(t) using only the 106 selected data shown in Fig. 4, we see that the trend is generally the same. To quantify this difference, we define the following equation:

$$\gamma = \frac{1}{4p} \sum_{t=1}^{2} \sum_{l=1}^{2p} \bigl(c_l^{(t)} - \bar{c}_l^{(t)}\bigr)^2. \qquad (18)$$
Figures 5, 6, 7, 8 and 9 show comparisons of the values of c_l^(t) shown in (5) and c̄_l^(t) shown in (17) for the pairs of subjects P1 and P3, P1 and P4, P2 and P3, P2 and P4, and P3 and P4, respectively. In each figure, (a) shows the values of c_l^(t) and (b) shows the values of c̄_l^(t). From these figures, we can see that in all combinations the subjects are successfully identified by the difference between the red and black lines. Furthermore, it can be seen that (a) and (b) in all combinations have similar trends, indicating that the data reduction in the proposed method is successful.
Fig. 5 Comparison of c_l^(t) and c̄_l^(t) for subjects P1 and P3
Table 1 shows the values of ξ1 and ξ2 in (14) used in deriving the values of c̄_l^(t) shown in Fig. 4 and Figs. 5-9(b), together with the number of selected data, w. In addition, it shows the ratio of the number of selected data to the original number of data (1250) and the value of γ in (18), which quantifies the difference between c_l^(t) and c̄_l^(t). From this table, it can be seen that, for all subject combinations, the proposed method successfully extracts the characteristics of the subjects with a significant reduction in the number of data and with the same accuracy as when all data are used.
6 Conclusion

We propose an individual difference assessment method based on cluster scales using a data reduction method as an indicator for quantitatively detecting differences between subjects in time series data obtained from multiple subjects. This method is based on a metric that shows the relationship between the common cluster scale obtained through subjects and the data variables [1, 2]. However, this paper is unique in that it extracts the relationship between the common cluster scale and the data variables after a significant reduction in the number of data, by selecting the data that affect the obtained common clusters through a new distance between the target data and the common clusters based on the idea of a distance using maximum likelihood estimation with fuzzy covariance [3]. We show that measuring the data between different subjects on only the selected data on a common cluster scale yields comparable results to those obtained when using all data before selection, thus demonstrating the effectiveness of the proposed method.
Fig. 6 Comparison of c_l^(t) and c̄_l^(t) for subjects P1 and P4

Fig. 7 Comparison of c_l^(t) and c̄_l^(t) for subjects P2 and P3

Fig. 8 Comparison of c_l^(t) and c̄_l^(t) for subjects P2 and P4

Fig. 9 Comparison of c_l^(t) and c̄_l^(t) for subjects P3 and P4
Table 1 Comparison of results between different subjects

| Subject pair to compare | P1 and P2 | P1 and P3 | P1 and P4 | P2 and P3 | P2 and P4 | P3 and P4 |
| ξ1 | 4.0 × 10^28 | 9.8 × 10^23 | 3.4 × 10^26 | 2.3 × 10^24 | 1.0 × 10^35 | 2.0 × 10^22 |
| ξ2 | 1.0 × 10^24 | 6.0 × 10^17 | 1.0 × 10^18 | 9.0 × 10^20 | 1.0 × 10^24 | 1.3 × 10^17 |
| The original number of data, n | 1250 | 1250 | 1250 | 1250 | 1250 | 1250 |
| The number of selected data, w | 106 | 97 | 100 | 97 | 159 | 43 |
| The ratio of the number of selected data (%) | 8.48 | 7.76 | 8 | 7.76 | 12.72 | 3.44 |
| γ | 0.18 | 0.18 | 0.12 | 0.1 | 0.26 | 0.09 |
References

1. Sato-Ilic, M.: Knowledge-based comparable predicted values in regression analysis. Procedia Comput. Sci. 114, 216–223 (2017)
2. Sato-Ilic, M.: Cluster-scaled regression analysis for high-dimension and low-sample size data. Adv. Smart Syst. Res. 7(1), 1–10 (2018)
3. Gath, I., Geva, A.B.: Unsupervised optimal fuzzy clustering. IEEE Trans. Pattern Anal. Mach. Intell. 11(7), 773–780 (1989)
4. Bezdek, J.C.: Pattern Recognition with Fuzzy Objective Function Algorithms. Plenum Press (1981)
5. Altun, K., Barshan, B.: Human activity recognition using inertial/magnetic sensor units. In: HBU 2010, LNCS 6219, pp. 38–51. Springer, Berlin, Heidelberg (2010)
6. UCI Machine Learning Repository: Daily and Sports Activities Data Set. https://archive.ics.uci.edu/ml/datasets/daily+and+sports+activities
Spatial Data Analysis and Sparse Estimation
Coordinate Descent Algorithm for Normal-Likelihood-Based Group Lasso in Multivariate Linear Regression

Hirokazu Yanagihara and Ryoya Oda
Abstract We focus on an optimization algorithm for a normal-likelihood-based group Lasso in multivariate linear regression. A negative multivariate normal loglikelihood function with a block-norm penalty is used as the objective function. A solution for the minimization problem of a quadratic form with a norm penalty is given without using the Karush–Kuhn–Tucker condition. In special cases, the minimization problem can be solved without solving simultaneous equations of the first derivatives. We derive update equations of a coordinate descent algorithm for minimizing the objective function. Further, by using the result of the special case, we also derive update equations of an iterative thresholding algorithm for minimizing the objective function. Keywords Adaptive group Lasso · Block-norm regularization · Multivariate linear regression · Negative normal log-likelihood function
1 Introduction The multivariate linear regression model is a fundamental inferential method for predicting mutually correlated response variables from a set of explanatory variables (see e.g., [1]). In multivariate linear regression contexts, we may be interested in choosing the subset of relevant explanatory variables that are active in at least one response variable (see e.g., [2]). Rather than Lasso put forward by [3], group Lasso proposed by [4] is often used for variable selection in multivariate regression; this
H. Yanagihara
Mathematics Program, Graduate School of Advanced Science and Engineering, Hiroshima University, Higashi-Hiroshima, Hiroshima 739-8526, Japan
e-mail: [email protected]

R. Oda
School of Informatics and Data Science, Hiroshima University, Higashi-Hiroshima, Hiroshima 739-8526, Japan
approach selects the subset of relevant explanatory variables via an estimation of regression coefficients based on block-norm regularization. Let Y be an n × p matrix of standardized response variables, i.e., Y 1n = 0 p , and all the diagonal elements of S11 = Y Y are 1, and let X be an n × k matrix of standardized explanatory variables, i.e., X 1n = 0k , and all the diagonal elements of S22 = X X are 1, where n is the sample size, 1n is an n-dimensional vector of ones, 0 p is a p-dimensional vector of zeros, and a prime denotes the transpose of a matrix or vector. To ensure that model estimation is possible, we assume that rank(X) = k (< n) and n > p. In what follows, a k × p matrix of regression coefficients = (θ 1 , . . . , θ k ) and a p × p covariance matrix −1 are estimated by a normal-likelihood-based group Lasso. An objective function of the normallikelihood-based group Lasso is the following negative multivariate normal loglikelihood function with a block-norm penalty: g(, ) = (, |Y , X) + λ
k
ω j θ j ,
(1)
j=1
where λ is a non-negative tuning parameter and (, |Y , X) is a negative multivariate normal log-likelihood function given by (, |Y , X) =
n 1 tr (Y − X) (Y − X) − log ||. 2 2
(2)
Here, ω j is an adaptive Lasso weight proposed by [6]; especially in this paper, ω j = 1/θ¯ j is used, where θ¯ j is the least square estimation (LSE) of θ j given by (θ¯ 1 , . . . , θ¯ k ) = S12 S−1 22 and S12 = Y X. Henceforth, (, ) is simplified notation for (, |Y , X). To minimize g(, ) in (1), we derive a coordinate descent algorithm (CDA) along the blockwise coordinate directions θ 1 , . . . , θ k . This is essentially the same as solving the minimization problem of the following norm penalized quadratic form: f (z|M, b, λ) =
1 z M z − b z + λz, 2
(3)
where b and M are a p-dimensional vector and a p × p positive definite matrix, respectively. Although it is common to use the Karush–Kuhn–Tucker (KKT) condition to solve the minimization problem of (3), in this paper, we derive the minimizer of f (z|M, b, λ) without using the KKT condition. If the minimizer is not 0 p , an iterative calculation is required deriving the minimizer of (3). However, we show that the solution of the minimization problem can be derived explicitly without solving simultaneous equations of the first derivatives of (3) when M is a scalar matrix, i.e., M = a I p , where a > 0. By using this minimizer, we derive update equations of the CDA for minimizing g(, ). [5] also consider the CDA for minimizing g(, ). Unfortunately, their CDA was incomplete because the minimization problem was not completely solved when g(, ) does not become minimum at 0 p along the
Coordinate Descent Algorithm for Normal-Likelihood-Based Group …
431
blockwise coordinate direction θ j ( j = 1, . . . , k). By using the minimizer of the special case, we also derive update equations of an iterative thresholding algorithm (ITA) for minimizing g(, ). The remainder of the paper is organized as follows. In Sect. 2, we solve the minimization problem of f (z|M, b, λ). By using the solution to the minimization problem, we derive an update equation of the CDA for minimizing g(, ) in Sect. 3. In Sect. 4, we present the results of numerical experiments.
2 Minimization of Norm-Penalized Quadratic Form In this section, we solve the minimization problem of f (z|M, b, λ) in (3), i.e., the norm-penalized quadratic form. First, the following lemma is prepared (the proof is given in Appendix 1): Lemma 1 Suppose that M, b and λ are, respectively, the same matrix, vector, and value as in (3). Let c0 be a positive initial value. We define a real sequence {cm }m=0,1,2,... derived by the following recurrence relation: −2 λ cm+1 = b M + √ I p b. cm
(4)
Then, when b is not zero vector, the sequence {cm }m=0,1,2,... converges to a solution of the equation q(x) = 1, where q(x) = b
√
x M + λI p
−2
b.
(5)
By using Lemma 1, we have a minimizer of f (z|M, b, λ) as in the following theorem (the proof is given in Appendix 2): Theorem 1 The z that minimizes the function f (z|M, b, λ) is given by −1 λ b, z = arg minp f (z|M, b, λ) = I (b > λ) M + √ I p z∈R c∞
(6)
where I (A) is an indicator function, i.e., I (A) = 1 if A is true and I (A) = 0 if A is not true, and c∞ is the limit of the sequence {cm }m=0,1,2,... given in (4). If M is a scalar matrix, i.e., M = a I p , where a > 0, q(x) in (5) is given by q(x) = Then, when b > λ, we have
b √ a x +λ
2 .
432
H. Yanagihara and R. Oda
q(x) = 1 ⇐⇒
√
x=
1 (b − λ). a
This implies a simple form of z as in the following corollary: Corollary 1 The z that minimizes the function f (z|a I p , b, λ) is given by z = arg minp f (z|a I p , b, λ) = z∈R
λ 1 1− b, a b +
where (x)+ = max{0, x} and a > 0. We derived Corollary 1 as a special case of Theorem 1. Theorem 1 was proved by solving simultaneous equations of the first derivatives of (3). However, we can derive Corollary 1 without solving simultaneous equations of the first derivatives of f (z|a I p , b, λ) (the proof is given in Appendix 3).
3 CDA for Normal-Likelihood-Based Group Lasso In this section, we derive the CDA for the normal-likelihood-based group Lasso, i.e., the CDA for minimizing g(, ) in (1). Let S() be the matrix function defined by nS() = (Y − X) (Y − X) = S11 − S12 − S12 + S22 . (7) Then, g(, ) ≥ g(, S()−1 ) holds. Hence, it is sufficient to consider the minimization problem of g(, ) along the blockwise coordinate direction θ j ( j = 1, . . . , k). Let Q be a p × p orthogonal matrix that diagonalizes as = diag(φ1 , . . . , φ p ), i.e., = Q Q , and be a k × p matrix defined by = (ξ 1 , . . . , ξ k ) = Q. Notice that Q Q = Q Q = I p , θ j = Qξ j and Qξ j = ξ j . It follows from the equations that g(, ) is rewritten as g0 (, ) given by g0 (, ) = (, |Y Q, X) + λ
k
ω j ξ j ,
j=1
where the function (, |Y , X) is given by (2). From the above modification, we ˆ Q , where ˆ is the minimizer know that minimizing g(, ) can be obtained by of g0 (, ). Let R be an n × p matrix of regression residuals, i.e., R = Y − X, and let W = (w1 , . . . , w k ) be a k × k matrix defined by W = S22 − I k of which the (i, j)th element denotes wi j . Furthermore, let a j be the jth column vector of X, i.e., X = (a1 , . . . , ak ), and let Rα be an n × p matrix of regression residuals with the αth explanatory variable removed, i.e., Rα = Y − kj =α a j θ j = R + aα θ α
Coordinate Descent Algorithm for Normal-Likelihood-Based Group …
433
(α = 1, . . . , k). Let bα = Q Rα aα . Recall that wαα = 0 and aj aα = w jα ( j = α). Hence, we have ⎛ ⎞ ⎛ ⎞ k k bα = Q ⎝Y aα − w jα θ j ⎠ = Q ⎝Y aα − w jα θ j ⎠ . j =α
j=1
Notice that ( Q Y a1 , . . . , Q Y ak ) = Q Y X = Q S12 and ⎛ ⎝
k
w j1 θ j , . . . ,
j=1
k
⎞ w jk θ j ⎠ = W = Q W .
j=1
This implies that B = (b1 , . . . , bk ) = ( Q S12 − W ). Notice also that Y Q − X = Rα Q − aα ξ α and aα = 1. Hence, for a certain α, g0 (, ) can be expressed as g0 (, ) = (ξ α , |Rα Q, aα ) + λ
k
ω j ξ j
j=1 k 1 n = f (ξ α |, bα , λωα ) + tr(Rα Rα ) − log || + λ ω j ξ j , 2 2 j =α
where the function f (z|M, b, λ) is given by (3). It follows from the above equation and Theorem 1 that the CDA for the normal-likelihood-based group Lasso is as in Algorithm 1. Algorithm 1: CDA for Normal-Likelihood-Based Group Lasso ˆ (0) be an initial regression coefficient matrix, c1,0 , . . . , ck,0 be positive Step 1: Let ˆ (0) )−1 , ˆ (1) = S( initial values, and ε1 > 0 and ε2 > 0 be small values. Set where S( A) is the matrix function given by (7). ˆ (m−1) = (θˆ (m−1) , . . . , θˆ (m−1) ) be a regression coefficient matrix Step 2: Let 1
k
ˆ derived by the (m − 1)th iteration, and
(m)
(m−1) −1
ˆ = S(
) .
ˆ (m) as • Let Q be a p × p orthogonal matrix that diagonalizes (m) ˆ ˆ (m−1) Q, = Q Q . Set = (ξ 1 , . . . , ξ k ) = = diag(φ1 , . . . , φ p ), i.e., U = (u1 , . . . , uk ) = Q S12 − W and B = (b1 , . . . , bk ) = U. • For each block j = 1, . . . , k, calculate ⎛
⎞−1
λω j ξ j = I (bj > λω j ) ⎝ I p + −1 ⎠ cj
uj .
434
H. Yanagihara and R. Oda
Here, cj is a value obtained by repeating the following recurrence equation until |c j,α − c j,α−1 | < ε1 is satisfied: c j,α = uj
λω j Ip + √ −1 c j,α−1
−2
uj .
Update U by U + (ξ j − ξ j )wj and B by B + (ξ j − ξ j )wj . If ξ j = 0, update c j,0 by cj . After updating U, B, and c j,0 , update ξ j by ξ j . ˆ (m) = Q . • After repeating j from 1 to k, set ˆ (m−1) ) < ε2 is satisfied. Here, δ( A1 , A2 ) ˆ (m) , Step 3: Repeat Step 2 until δ( denotes a distance between two matrices A1 and A2 defined by δ( A1 , A2 ) = vec( A1 − A2 )/vec( A2 ). Additionally, by using the result in Corollary 1, we derive the ITA for the normallikelihood-based group Lasso. Let D() be a p × k matrix defined by D() = (d 1 (), . . . , d k ()) =
∂ (, ) = −(S12 − S22 ). ∂
Let 0 = (θ 0,1 , . . . , θ 0,k ) be a k × p matrix and φmax ( A) be the maximum eigenvalue of a matrix A. Notice that g(, ) ≤ (0 , ) +
k
d j (0 ) (θ j − θ 0, j ) +
j=1
= (0 , ) + l0 + L
k
L θ j − θ 0, j 2 + λω j θ j 2
f θ j I p , θ 0, j − L −1 d j (0 ), L −1 λω j ,
j=1
where L = φmax ()φmax (S22 ) and l0 = Ltr(0 0 )/2 − tr(0 D(0 )). It follows from the above equation and Corollary 1 that the ITA for the normal-likelihoodbased group Lasso is as in Algorithm 2. Algorithm 2: ITA for Normal-Likelihood-Based Group Lasso ˆ (0) be an initial regression coefficient matrix, and ε > 0 be a small Step 1: Let ˆ (0) )−1 , where S( A) is the matrix function given by ˆ (1) = S( value. Set (7). ˆ (m−1) be a regression coefficient matrix derived by the (m − 1)th iterStep 2: Let ˆ (m−1) )−1 , and B = (b1 , . . . , bk ) = ˆ (m−1) + (S12 − ˆ (m) = S( ation, ˆ (m−1) ) ˆ (m) /L, where L = φmax ( ˆ (m) )φmax (S22 ). Set S22 (m) λω j bj . θˆ j = 1 − Lbj +
Coordinate Descent Algorithm for Normal-Likelihood-Based Group …
435
ˆ (m) , ˆ (m−1) ) < ε is satisfied. Step 3: Repeat Step 2 until δ(
4 Numerical Results In this section, we present the results of numerical experiments that were conducted to compare the following two properties of the CDA and ITA: Property 1. The computation time required for the optimization algorithm. Property 2. The minimum value of g(, ) in (1) achieved by the optimization algorithm. Although the performance of variable selection is an important property in the group Lasso, we refrain from studying this property when comparing between the CDA and ITA because it strongly depends on which information criterion is used to select the tuning parameter. Let C p (ρ) be a p × p autocorrelation matrix of which the (a, b)th element is ρ |a−b| . Simulation data Y ∗ were generated from the multivariate normal distribution Nn, p (X ∗ ∗ , C p (0.8) ⊗ I n ), where X ∗ = T ∗ C k (0.8)1/2 . Here, elements of T ∗ were independently generated from the uniform distribution U (−1, 1). Moreover, elements of the first to k∗ th rows of ∗ were independently generated from the uniform distribution U ((−6, −1) ∪ (1, 6)), and elements of the (k∗ + 1)th to kth rows of ∗ were 0. The Y and X are matrices that standardize Y ∗ and X ∗ , respectively. The following λ1 , . . . , λ100 were prepared as tuning parameters. In this paper, we did not select the optimal tuning parameters specifically. λi = λmax (3/4)100−i , λmax = max
j=1,...,k
n S−1 11 Y a j . ωj
(i,l) The simulation data were generated 1000 times for each (n, p, k, k∗ ). Let gCDA and (i,l) gITA be the minimum values of g(, ) achieved by the CDA and ITA at λ = λi in (i,l) (i,l) the lth iteration, respectively, and tCDA and tITA be the computation times required for convergence when using the CDA and ITA at λ = λi in the lth iteration, respectively (l = 1, . . . , 1000). To study properties 1 and 2, we calculated the following values:
1 (i,l) (i,l) P1 = I (gCDA < gITA ), 1000 l=1 i=1 1000 100
P2 =
1000 100 (i,l) 1000 100 1 (i,l) (i,l) l=1 i=1 tITA I (gCDA > gITA ), T = 1000 100 (i,l) . 1000 l=1 i=1 l=1 i=1 tCDA
Table 1 shows P1 , P2 , and T for each (n, p, k, k∗ ). Therein, we can see that the minimum value of g(, ) achieved by the CDA tended to make g(, ) smaller than that achieved by the ITA, because P1 was much higher than P3 in many cases.
436
H. Yanagihara and R. Oda
Table 1 Results of numerical experiments n k k∗ p=4 P1 P2 T 10
200
50
100
10
500
50
100
1 2 5 5 10 25 10 20 50 1 2 5 5 10 25 10 20 50
83.17 83.31 83.16 85.06 83.54 81.33 83.07 82.14 79.51 82.85 84.92 84.25 86.51 84.98 82.55 85.25 83.45 81.32
12.45 14.46 15.53 14.55 15.77 18.53 16.73 17.79 20.36 10.31 11.24 13.27 12.92 14.72 17.23 14.39 16.33 18.55
7.22 5.25 5.90 2.76 2.98 2.22 1.28 1.10 1.08 7.63 4.32 6.31 3.32 2.62 2.18 1.79 2.21 1.73
P1
p = 10 P2 T
P1
p = 20 P2 T
80.70 81.81 82.69 84.49 83.06 80.51 83.52 81.51 79.12 81.13 82.06 83.52 85.20 85.47 82.69 85.27 83.48 80.73
13.97 14.59 16.29 14.83 16.54 19.26 16.22 18.32 20.80 13.52 12.97 14.47 13.57 14.16 17.13 14.55 16.27 19.12
77.80 80.03 80.41 82.34 82.40 81.18 82.08 81.47 79.20 78.08 80.99 81.54 83.20 83.47 82.33 84.02 83.75 80.80
14.60 15.76 17.24 16.84 17.08 18.51 17.61 18.35 20.70 13.38 14.53 16.51 15.86 15.95 17.38 15.69 16.06 19.05
7.95 7.27 4.26 1.49 1.49 2.49 0.90 0.92 0.89 8.57 6.84 3.99 3.19 2.78 1.96 1.18 1.40 2.19
7.91 6.80 7.51 2.86 1.85 1.77 0.82 0.72 0.77 8.80 6.18 4.53 3.74 2.35 2.09 1.50 1.31 1.65
Furthermore, T was larger than 1 in most cases. This means that the CDA converged faster than the ITA in the context set by this numerical study. Appendix 1: Proof of Lemma 1 At first, to prove c∞ exists, we show that the sequence {cm }m=0,1,2,··· is strictly monotonically decreasing and bounded below or strictly monotonically increasing and √ bounded above. Since M is a positive definite matrix, M + (λ/ cm )I p is also a positive definite matrix. Hence, we have cm > 0 because b is not a zero vector. Let Q be a p × p orthogonal matrix that diagonalizes M as = diag(φ1 , . . . , φ p ). Let v = (v1 , . . . , v p ) = Q b. By using v and , we have −2 p v j2 λ cm+1 = v + √ I p v= = γ (cm ). √ cm (φ j + λ/ cm )2 j=1
(8)
It is straightforward to discern that γ (c) is a strictly monotonically increasing function with respect to c. Hence, we have γ (c) ≤ limc→∞ γ (c) = b M −2 b. These results imply that 0 < cm ≤ max{c0 , b M −2 b}. From the strictly monotonically increasing γ (c), we have
Coordinate Descent Algorithm for Normal-Likelihood-Based Group …
437
c1 < c0 =⇒ c2 = γ (c1 ) < γ (c0 ) = c1 , c1 > c0 =⇒ c2 = γ (c1 ) > γ (c0 ) = c1 , c1 = c0 =⇒ c2 = γ (c1 ) = γ (c0 ) = c1 . It follows from the above equations that cm < cm−1 =⇒ cm+1 = γ (cm ) < γ (cm−1 ) = cm , cm > cm−1 =⇒ cm+1 = γ (cm ) > γ (cm−1 ) = cm , cm = cm−1 =⇒ cm+1 = γ (cm ) = γ (cm−1 ) = cm . Hence, by mathematical induction, it follows that {cm }m=0,1,2,··· is a strictly monotonically decreasing sequence bounded below when c1 < c0 , or a strictly monotonically increasing sequence bounded above when c1 > c0 , or an identical sequence when c1 = c0 . These facts imply that c∞ exists. Next we show that c∞ is a solution to the equation q(x) = 1, where q(x) is given by (5). It follows from (4) and the existence of c∞ that c∞
−2 λ = b M + √ Ip b c∞ √ −2 ⇐⇒ c∞ = c∞ · b c∞ M + λI p b ⇐⇒ q(c∞ ) = 1.
Hence, we can see that c∞ is the solution to the equation q(x) = 1. Appendix 2: Proof of Theorem 1 At first, we show that b ≤ λ is a necessary and sufficient condition for z = 0 p . Suppose that b ≤ λ. From the Cauchy-Schwarz inequality, we can see that f (z|M, b, λ) ≥
1 1 z M z − b · z + λz = z M z − z(b − λ). 2 2
Since b − λ ≤ 0 holds, we have 1 1 z M z − z(b − λ) ≥ z M z ≥ 0, 2 2 with equality of the last inequality in the above equation if and only if z = 0 p because M is a positive definite matrix. This implies that b ≤ λ =⇒ z = 0 p . Suppose that b > λ. Then, it is straightforward to discern that b is not a zero vector and b M b is not 0. Let h 0 be a positive constant satisfying h 0 = (b − λ)b/b M b. It follows from h 0 > 0 and b − λ > 0 that
438
H. Yanagihara and R. Oda
1 2 h b M b − h 0 b2 + λh 0 b 2 0 (b − λ)b 1 2 M b − 2b + 2λb = h0 · b 2 b M b 1 1 = h 0 (b − λ)b − 2b2 + 2λb = − h 0 b(b − λ) < 0. 2 2
f (h 0 b|M, b, λ) =
Therefore, we have min f (z|M, b, λ) ≤ f (h 0 b|M, b, λ) < f (0 p |M, b, λ) = 0.
z∈R p
This indicates that b > λ =⇒ z = 0 p . It is straightforward to discern that the contrapositive is z = 0 p =⇒ b ≤ λ. Hence, we have b ≤ λ ⇐⇒ z = 0 p . Next, we derive an explicit form of z when z = 0 p , i.e., b > λ. Notice that f (z|M, b, λ) is partially differentiable at z ∈ R p \{0 p }, and its partial derivative is ∂ z f (z|M, b, λ) = M z − b + λ . ∂z z Hence, when z = 0 p , we have
−1 λ λ b I p z = b ⇐⇒ z = M + Ip z z −2 λ 2 b ⇐⇒ q(z2 ) = 1, =⇒ z = b M + Ip z
∂ f (z|M, b, λ) = 0 p ⇐⇒ ∂z
M+
where q(x) is given by (5). Consequently, Theorem 1 is proved. Appendix 3: Another Proof of Corollary 1 We prove Corollary 1 without solving simultaneous equations of the first derivatives of f (z|M, b, λ) in (3). Notice that f (z|a I p , b, λ) = az2 /2 − b z + λz. Using the Cauchy–Schwartz inequality, as in the proof of Theorem 1, we have 1 az2 − b · z + λz 2 2 1 1 1 = a z − (b − λ) − (b − λ)2 , 2 a 2a
f (z|a I p , b, λ) ≥
with equality if and only if z = b holds, where ∈ R. Hence, we derive z minimizing f (z|a I p , b, λ) as b 1 λ 1 1− = b. z = (b − λ)I (b > λ) a b a b +
Coordinate Descent Algorithm for Normal-Likelihood-Based Group …
439
Consequently, Corollary 1 is proved without solving simultaneous equations of the first derivatives of f (z|a I p , b, λ). Acknowledgements The authors wish to thank two reviewers for their helpful comments. This work was financially supported by JSPS KAKENHI (grant numbers JP16H03606, JP18K03415, and JP20H04151 to Hirokazu Yanagihara; JP19K21672, JP20K14363, and JP20H04151 to Ryoya Oda).
References 1. Fox, J., Weisberg, S.: Multivariate linear models in R. In: An Appendix to An R Companion to Applied Regression, 3rd. edn (2019). https://socialsciences.mcmaster.ca/jfox/Books/ Companion/appendices/Appendix-Multivariate-Linear-Models.pdf 2. Obozinski, G., Wainwright, M. J., Jordan, M. I.: High-dimensional support union recovery in multivariate regression. In: Koller, D. Schuurmans, D., Bengio, Y., Bottou, L. (eds) Advances in Neural Information Processing Systems, vol. 21, pp. 1217–1224. Curran Associates, Inc. (2008).http://papers.nips.cc/paper/3432-high-dimensional-support-unionrecovery-in-multivariate-regression.pdf 3. Tibshirani, R.: Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B 58, 267–288 (1996). https://doi.org/10.1111/j.2517-6161.1996.tb02080.x 4. Yuan, M., Lin, Y.: Model selection and estimation in regression with grouped variables. J. Roy. Statist Soc. Ser. B 68, 49–67 (2006). https://doi.org/10.1111/j.1467-9868.2005.00532.x 5. Wilms, I., Croux, C.: An algorithm for the multivariate group lasso with covariance estimation. J. Appl. Stat. 45, 668–681 (2018). https://doi.org/10.1080/02664763.2017.1289503 6. Zou, H.: The adaptive Lasso and its oracle properties. J. Amer. Statist. Assoc. 101, 1418–1429 (2006). https://doi.org/10.1198/016214506000000735
Discriminant Analysis via Smoothly Varying Regularization

Hisao Yoshida, Shuichi Kawano, and Yoshiyuki Ninomiya
Abstract The discriminant method, which uses a basis expansion in the logistic regression model and estimates it by a simply regularized likelihood, is considerably efficient especially when the discrimination boundary is complex. However, when the complexities of the boundary are different by region, the method tends to cause under-fitting or/and over-fitting at some regions. To overcome this difficulty, a smoothly varying regularization is proposed in the framework of the logistic regression. Through simulation studies based on synthetic data, the superiority of the proposed method to some existing methods is checked. Keywords Basis expansion · Boundary smoothness · Logistic regression · Over-fitting · Regularization · Under-fitting
1 Introduction The nonlinear discriminant method with a basis expansion has been widely used in statistical and engineering fields. Using the natural cubic splines or radial basis functions as the basis functions in the framework of the logistic regression is standard (see, e.g., Chap. 6 in Bishop [1] or Chap. 5 in Hastie et al. [3]). Usually, in the basis expansion, under-fitting is prevented by placing a lot of basis functions, and overfitting is avoided by using regularization method for estimating their coefficients. As
H. Yoshida
Graduate School of Mathematics, Kyushu University, 744 Moto-oka, Nishi-ku, Fukuoka, Japan

S. Kawano
Graduate School of Informatics and Engineering, The University of Electro-Communications, 1-5-1 Chofugaoka, Chofu-shi, Tokyo, Japan

Y. Ninomiya
Department of Statistical Inference and Mathematics, The Institute of Statistical Mathematics, 10-3 Midori-cho, Tachikawa-shi, Tokyo, Japan
e-mail: [email protected]
H. Yoshida et al.
Fig. 1 Estimated discrimination boundaries by a conventional basis expansion with a weak regularization (left) and a strong regularization (right). The weak and strong regularizations cause over-fitting and under-fitting, respectively
the regularization, the ridge method (Hoerl and Kennard [4]) or the lasso method (Tibshirani [7]) is often used. While such a basis expansion with the simple regularization works well even if the discrimination boundary is complex, it is often inappropriate when the boundary has inhomogeneous smoothness. That is, when we want to avoid under-fitting at some regions and over-fitting at other regions, we cannot obtain a good boundary by the simple regularization in general. See Fig. 1 as an illustrative example. In this paper, we apply the smoothly varying regularization proposed in Kim et al. [5] to the discriminant analysis in order to overcome this problem. The rest of this article is organized as follows. In Sect. 2, we introduce a framework of the logistic regression based on radial basis functions and estimation methods using a simple regularization. Section 3 studies a new discriminant method by proposing a regularization with adaptive-type penalties. Section 4 investigates the performance of our method by numerical experiments, while some concluding remarks are described in Sect. 5.
2 Basis Expansion For an explanatory variable x ∈ R p and a response variable y ∈ {0, 1}, we have n . Using basis functions φ j (x) ( j = independent n sets of training data {(x i , yi )}i=1 1, 2, . . . , m), we consider the following logistic regression model: p(yi = 1 | x i ) = w0 + w j φ j (x i ); i = 1, 2, . . . , n, p(yi = 0 | x i ) j=1 m
log
Discriminant Analysis via Smoothly Varying Regularization
443
where w j ( j = 0, 1, . . . , m) are unknown coefficient parameters. Letting w = (w0 , w1 , . . . , wm )T and φ(x i ) = (1, φ1 (x i ), . . . , φm (x i ))T , we can express this model as exp(w T φ(x i )) ≡ πi . 1 + exp(w T φ(x i ))
p(yi = 1 | x i ) =
(1)
Like this, a modeling based on basis functions, which is nonlinear with respect to each x i , is called a basis expansion. Here, we use the radial basis functions φ j (x) = exp
||x − μ j ||2
−
2σ j2
;
j = 1, 2, . . . , m
(2)
as the basis functions. Note that μ j and σ j2 indicate the center and width of the basis function, respectively. As a standard way, we set μ j ’s on lattice positions in the space of explanatory variables and set all σ j2 ’s as a constant σ 2 . The log-likelihood function for the logistic regression model is given by l(w) =
n
{yi w T φ(x i ) − log(1 + exp(w T φ(x i )))}
(3)
i=1
from (1). The maximum likelihood estimator minimizing (3) often causes over-fitting especially when the number of basis functions m is large. Regularization methods avoid over-fitting by adding a penalty term to (3). The major ones are the ridge with an 2 penalty term, m ˆ R = argmax l(w) − λ w 2j , w w
(4)
j=1
and the lasso with an 1 penalty term, m ˆ = argmax l(w) − λ |w j | , w L
w
(5)
j=1
where λ is a tuning parameter. These methods shrink estimators of the coefficient parameters toward 0. In particular, the lasso estimates some of them strictly by 0, that is, gives a sparse solution. While it tends to shrink the estimators too much, the adaptive lasso (Zou [11]) has a preferable property of the oracle one. It has a two-stage estimation procedure: in the first stage, the maximum likelihood estimator ˆ ML = argmaxw l(w) is calculated as an initial one; in the second stage, adding a w penalty term using the initial estimator to the log-likelihood function, it provides
444
H. Yoshida et al.
ˆ w
AL
m |w j | , = argmax l(w) − λ wˆ ML w j j=1
(6)
where λ is also a tuning parameter. As estimators with the oracle property, the SCAD (Fan and Li [2]) and MC+ (Zhang [10]) are also well-known. In the methods raised above, the estimator of πi , denoted by πˆ i , is obtained from substituting the estimator of w into (1). Note that all the vector components are shrunk by using the same tuning parameter. For the cases where they should be shrunk by tuning parameters different by regions, such methods cannot provide appropriate discriminant rules.
3 Smoothly Varying Regularization 3.1 Motivation We can say the followings for our discriminant model made by (1) and (2): • There exists D †j (⊂ R p ), which is a region around μ j , such that wˆ j is almost determined by {(yi , x i ) | x i ∈ D †j }, that is, wˆ j does little depend on {(yi , x i ) | / D †j }. xi ∈ • There exists Di‡ (⊂ R p ), which is a region around x i , such that πˆ i is almost determined by {wˆ j φ j (x i ) | μ j ∈ Di‡ }, that is, πˆ i does little depend on {wˆ j φ j (x i ) | μj ∈ / Di‡ }. These hold because φ j (x i ) ≈ 0 if μ j and x i have a distance to some extent. Combining these indicates the following: • There exists Di (⊂ R p ), which is a region around x i , such that πˆ i is almost deter/ mined by {(y j , x j ) | x j ∈ Di }, that is, πˆ i does little depend on {(y j , x j ) | x j ∈ Di }. Let us consider some situations to reveal a weak-point of the regularization using the same tuning parameter. First, we assume that around x i , almost all x j ’s have the labels y j ’s which are different from yi , as in the left panel of Fig. 2. In this situation, it is often the case that the label yi was generated in error and that we want to make a discriminant rule giving the other label for x i , that is, want to avoid over-fitting around x i . As written above, yˆi depends mainly on {(y j , x j ) | x j ∈ Di }. This neighbor Di becomes larger when wˆ j for μ j around x i are shrunk more strongly because the strong regularization makes the differences among the coefficient estimators small. Namely, as they are regularized strongly, yˆi becomes to rely on its neighbors with the labels different from yi , and we can avoid over-fitting around x i . Next, we assume that around x i , a small group of x j ’s have the labels y j ’s which are the same as yi while almost all samples outside the group have the other label,
Fig. 2 Data are drawn by red circles and black crosses, and centers of basis functions are drawn by blue dots. The left and right panels show examples in which over-fitting and under-fitting should be avoided, respectively. If a strong or weak regularization is used, the region of neighbors becomes small or large, respectively, as indicated by the dashed circular regions
as in the right panel of Fig. 2. In this situation, it is often the case that the group can be regarded as having the correct label, and we want a discriminant rule that gives the label y_i to x_i; that is, we want to avoid under-fitting around x_i. When the ŵ_j's for μ_j around x_i are shrunk weakly, ŷ_i comes to depend mainly on itself, and we can avoid under-fitting around x_i. In summary, it is desirable to regularize ŵ_j strongly or weakly if μ_j lies in a region where over-fitting or under-fitting, respectively, should be prevented. Because the existing regularization methods shrink all the ŵ_j's with the same strength, they cannot provide an appropriate discriminant rule for data containing both types of regions.
3.2 Smoothly Varying Regularization

For a generalization of (4), we apply the regularization in Kim et al. [5] to the discriminant analysis in the previous section. Specifically, we propose the following penalized log-likelihood function:

pl(w, \lambda) = l(w) - \frac{1}{2} \sum_{j=1}^{m} \lambda_j w_j^2 - \frac{\gamma_1}{2} \sum_{j=1}^{m} \sum_{k \in N_j} (\lambda_j - \lambda_k)^2 + \gamma_2 \sum_{j=1}^{m} \log \lambda_j.    (7)
Here, λ = (λ_1, . . . , λ_m)^T is a positive tuning parameter vector, N_j is a set of neighbors of j, and γ_1 and γ_2 are positive hyper-tuning parameters. In this paper, we set N_j as the four indices closest to j (excluding j itself), that is, k ∈ N_j if ‖μ_j − μ_k‖ is among the four smallest positive distances.
An estimator of w and an appropriate value of λ can be obtained by maximizing (7) with respect to w and λ as follows:

(\hat{w}, \hat{\lambda}) = \operatorname*{argmax}_{(w, \lambda)} pl(w, \lambda).    (8)
In our function (7), the first penalty assigns different tuning parameters to different coefficients, which promotes adaptability of the tuning parameters. If these tuning parameters could vary freely, over-fitting would clearly occur. The second penalty encourages continuity of the tuning parameters in order to avoid this over-fitting. The idea is similar to those of the fused lasso in Tibshirani et al. [8] and its generalized version in Tibshirani and Taylor [9]. By adopting it, we can realize a smoothly varying regularization. The third penalty keeps the values of the tuning parameters from being shrunk to zero. The hyper-tuning parameters γ_1 and γ_2 in (7) control the magnitude of the effects of the second and third penalties. For smaller values of γ_1, the tuning parameters tend to vary freely, while for larger values of γ_1 they tend to be continuous. For larger values of γ_2, the tuning parameters tend to take values far from zero. These values are determined by cross-validation (Stone [6]).
3.3 Concavity of the Regularized Log-Likelihood

In an optimization problem, the concavity of the objective function is an important property. Here, we show the concavity of pl(w, λ) in (7) with respect to w and to λ. From (3) and (7), we have

\frac{\partial^2 pl(w, \lambda)}{\partial w \partial w^\top} = - \sum_{i=1}^{n} \pi_i (1 - \pi_i) \phi(x_i) \phi(x_i)^\top - \Lambda,    (9)

where Λ is the diagonal matrix whose components are those of λ. In addition, we obtain

H_{jk} \equiv \frac{\partial^2 pl(w, \lambda)}{\partial \lambda_j \partial \lambda_k} =
\begin{cases}
-2\gamma_1 |N_j| - \gamma_2 / \lambda_j^2 & (j = k) \\
2\gamma_1 & (k \in N_j) \\
0 & (\text{otherwise}).
\end{cases}

The quadratic form of H = (H_{jk})_{1 \le j,k \le m} can be written as

u^\top H u = \sum_{j=1}^{m} H_{jj} u_j^2 + \sum_{j=1}^{m} \sum_{k \in N_j} H_{jk} u_j u_k
= - \sum_{j=1}^{m} \frac{\gamma_2}{\lambda_j^2} u_j^2 + \sum_{j=1}^{m} \sum_{k \in N_j} \big( -\gamma_1 u_j^2 + 2\gamma_1 u_j u_k - \gamma_1 u_k^2 \big) < 0.
Hence, Hessian matrices of pl(w, λ) in (7) with respect to w and λ are shown to be negative definite.
3.4 Estimation Procedure

Since the penalized log-likelihood function pl(w, λ) is concave with respect to w or λ when the other parameter vector is fixed, we can obtain a local maximum point efficiently. First, let us consider the maximization with respect to w for fixed λ. From (8) and \partial pl(w, \lambda)/\partial w = \sum_{i=1}^{n} (y_i - \pi_i)\phi(x_i) - \Lambda w, we use

w + \Big( \sum_{i=1}^{n} \pi_i (1 - \pi_i) \phi(x_i) \phi(x_i)^\top + \Lambda \Big)^{-1} \Big( \sum_{i=1}^{n} (y_i - \pi_i) \phi(x_i) - \Lambda w \Big)    (10)

as an updated value of w, in view of the Newton-Raphson method.
as an updated value of w considering Newton-Raphson method. Next, let us consider the maximization with respect to λ for fixed w. Since we obtain
1 ∂ pl(w, λ) 2 2 4γ1 |N j |λ j + w j − 4γ1 =− λk λ j − 2γ2 , ∂λ j 2λ j k∈N j
if the tuning parameters except for λ j are fixed, we can explicitly maximize pl(w, λ) by
λj =
−q j +
q 2j + 32γ1 γ2 |N j | 8γ1 |N j |
2 q j = w j − 4γ1 λk . k∈N j
Thus, we propose the following procedure: Parameter estimation via smoothly varying regularization 1. 2. 3. 4. 5. 6.
Let w(new) and λ(new) be their initial values. Substitute w(new) and λ(new) into w(old) and λ(old) , respectively. Calculate πi (i = 1, 2, . . . , n) by (1) using w(old) . Update w by (10) using w (old) and λ(old) , and denote it by w(new) . Update λ by (11) using w(new) and λ(old) , and denote it by λ(new) . Go to 2 until an appropriate convergence condition satisfies.
(11)
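The following Python code is a minimal sketch of this alternating procedure under our own naming assumptions (Phi for the basis matrix, N for the list of neighbor index sets); it is meant only to make updates (10) and (11) concrete, not to reproduce the authors' implementation.

```python
import numpy as np

def smoothly_varying_fit(Phi, y, N, gamma1, gamma2, lam_init, n_iter=100, tol=1e-6):
    n, m = Phi.shape
    w = np.zeros(m)
    lam = lam_init.copy()
    for _ in range(n_iter):
        w_old = w.copy()
        # Step 3: pi_i from the logistic model (1)
        pi = 1.0 / (1.0 + np.exp(-Phi @ w))
        # Step 4: one Newton-Raphson step for w as in (10)
        W = pi * (1.0 - pi)
        H = Phi.T @ (Phi * W[:, None]) + np.diag(lam)
        g = Phi.T @ (y - pi) - lam * w
        w = w + np.linalg.solve(H, g)
        # Step 5: closed-form coordinate update of lambda as in (11)
        for j in range(m):
            nj = len(N[j])
            qj = w[j] ** 2 - 4.0 * gamma1 * lam[N[j]].sum()
            lam[j] = (-qj + np.sqrt(qj ** 2 + 32.0 * gamma1 * gamma2 * nj)) / (8.0 * gamma1 * nj)
        # Step 6: simple convergence check on w
        if np.max(np.abs(w - w_old)) < tol:
            break
    return w, lam
```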
4 Numerical Experiments

4.1 Settings

Letting the dimension of the explanatory variables x be 2, we explain the settings of our numerical experiments. First, for the basis functions, we set their centers μ_j (j = 1, 2, . . . , m) on lattice points with spacing d. Their variances σ_j^2 are all set to d, as in Fig. 3. In addition, we apply threefold cross-validation in selecting the hyper-tuning parameters. As the loss function in the cross-validation, the misclassification rate is used. Next, we state how to set the initial values of the parameters. For w, we simply use 0, although the maximum likelihood estimator may be appropriate if its dimension m is not large compared with the data size n. To make up for this simple choice, we provide the initial value of λ more carefully. The initial values should be large in regions where over-fitting should be avoided, small in regions where under-fitting should be avoided, and obtainable in a simple way. Specifically, we propose the following method.

Setting the initial value of λ_j
1. Letting D_j be an appropriately defined neighborhood of μ_j, calculate
n_{1j} = |{x_i ∈ D_j, y_i = 1 | i = 1, 2, . . . , n}|,  n_{0j} = |{x_i ∈ D_j, y_i = 0 | i = 1, 2, . . . , n}|.
Fig. 3 In the left panel, data with y = 1 and y = 0 are drawn by red circles and black crosses, respectively, and centers of basis functions are drawn by blue dots. In the right panel, the equally spaced basis functions and their centers are drawn on the cross-sectional view
2. Letting c be an appropriately defined positive constant, define the initial value of λ_j as

\lambda_j =
\begin{cases}
c\,(1 - n_{0j}/n_{1j}) & \text{if } n_{1j} > n_{0j} \\
c\,(1 - n_{1j}/n_{0j}) & \text{if } n_{1j} < n_{0j} \\
c & \text{if } n_{1j} = n_{0j} = 0 \\
10^{-5} & \text{if } n_{1j} = n_{0j} \neq 0.
\end{cases}
In this paper, as the value of c, we use twice the value of the tuning parameter in the naive ridge method. By using this procedure, the initial value of λ j becomes close to c if the neighboring region around μ j should avoid over-fitting, that is, for example, the region has just one point with a label different from those of the others, the region has just one label, or the region has no data. On the other hand, the initial value of λ j becomes close to 0 if the neighboring region around μ j should avoid under-fitting, that is, the difference between the data sizes with y = 1 and y = 0 is small.
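As an illustration, the rule above can be coded as follows. This is our own sketch, not the authors' code; the neighborhood D_j is taken as {x : ‖x − μ_j‖² ≤ 4d²}, as used in Sect. 4.2, and all names are assumptions.

```python
import numpy as np

def initial_lambda(X, y, mu, d, c):
    # X: n x 2 standardized data, y in {0, 1}, mu: m x 2 basis centers, d: lattice spacing.
    lam0 = np.empty(len(mu))
    for j, m_j in enumerate(mu):
        in_Dj = np.sum((X - m_j) ** 2, axis=1) <= 4.0 * d ** 2
        n1, n0 = np.sum(in_Dj & (y == 1)), np.sum(in_Dj & (y == 0))
        if n1 > n0:
            lam0[j] = c * (1.0 - n0 / n1)
        elif n1 < n0:
            lam0[j] = c * (1.0 - n1 / n0)
        elif n1 == 0:          # n1 == n0 == 0: no data in the neighborhood
            lam0[j] = c
        else:                  # n1 == n0 != 0: balanced labels
            lam0[j] = 1e-5
    return lam0
```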
4.2 Synthetic Data Analysis

The following discriminant methods are compared:
• K-nearest-neighbor method (K-NN)
• Soft margin support vector machine (C-SVM)
• Logistic regression with the ridge regularization in (4) (Ridge-LR)
• Logistic regression with the lasso regularization in (5) (Lasso-LR)
• Logistic regression with the adaptive lasso regularization in (6) (A-Lasso-LR)
• Logistic regression with the smoothly varying regularization in (8) (Varying-LR)
We treat the training data shown in Fig. 4, which are generated according to the following procedure.

Data generation
1. Let z_i = (z_{i1}, z_{i2}) (i = 1, 2, . . . , 490) be uniformly distributed random variables on [−π, π] × [−1.5, 4].
2. Label z_i as y_i = 1 if sin 8z_{i1} ≤ z_{i2} ≤ 2.5 and y_i = 0 otherwise.
3. Let z_i = (z_{i1}, z_{i2}) (i = 491, 492, . . . , 495) be uniformly distributed random variables on [−π, π] × [2, 2.5], and label them as y_i = 0.
4. Let z_i = (z_{i1}, z_{i2}) (i = 496, 497, . . . , 500) be uniformly distributed random variables on [−π, π] × [2.5, 3], and label them as y_i = 1.
5. Standardize z_i and denote it by x_i.

In these data, over-fitting and under-fitting should be prevented in the upper and lower regions, respectively. In Fig. 4, the solid lines are the standardized versions of z_{i2} = sin 8z_{i1} and z_{i2} = 2.5, and they are the boundaries we want to obtain from our discriminant rule. We set 20 × 20 as the number of the lattice points and space out
Fig. 4 Data are drawn by red dots and black crosses, and the true discrimination boundary is drawn by blue lines
them evenly. As the neighboring region D_j of μ_j, which is needed to set the initial value of λ, we use {x : ‖x − μ_j‖² ≤ 4d²}. The following are the candidate values of the tuning parameters in each method; we select the optimal one by threefold cross-validation.
• The size of the nearest neighbors in K-NN: 1, 3, 5, 7, 9, 11
• The cost parameter in C-SVM: 1, 2, . . . , 10
• The tuning parameter λ in Ridge-LR: 1.0 × 10^{-a}, 2.5 × 10^{-a}, 5.0 × 10^{-a}, 7.5 × 10^{-a} (a = 1, 2, 3)
• The tuning parameter λ in Lasso-LR: 1.0 × 10^{-a}, 2.5 × 10^{-a}, 5.0 × 10^{-a}, 7.5 × 10^{-a} (a = 1, 2, 3)
• The tuning parameter λ in A-Lasso-LR: 1.0 × 10^{-a}, 2.5 × 10^{-a}, 5.0 × 10^{-a}, 7.5 × 10^{-a} (a = 1, 2, 3)
• The hyper-tuning parameters γ_1 and γ_2 in Varying-LR: 1.0 × 10^{-a}, 5.0 × 10^{-a} (a = 1, 2) and 1.0 × 10^{-b}, 5.0 × 10^{-b} (b = 3, 4)

Let us check the discriminant boundaries obtained by these methods, which are depicted in Fig. 5. We can say the following: K-NN causes under-fitting in the lower region but fits well in the upper region; C-SVM clearly causes over-fitting everywhere; Ridge-LR causes under-fitting and over-fitting in the lower and upper regions, respectively; Lasso-LR is not much different from Ridge-LR; A-Lasso-LR causes over-fitting in the upper region but fits well in the lower region; Varying-LR fits well in the upper region, similarly to K-NN, and fits better in the lower region than A-Lasso-LR. Figure 6 shows the variation of λ in Varying-LR. From the left panel, it can be seen that the initial values of the λ_j's are small around the true boundary. Around the lower true boundary, a larger region should have small initial values, that is, should avoid under-fitting. Indeed, we can see from the right panel that the region with small values of λ_j becomes large after repeating the updates. Next, we compare the methods through their misclassification rates evaluated by Monte-Carlo simulations. The simulations are conducted using the same distribution and settings as in Fig. 4.
Fig. 5 Discriminant boundary estimated by each method in the setting of Fig. 4: (a) K-NN, (b) C-SVM, (c) Ridge-LR, (d) Lasso-LR, (e) A-Lasso-LR, (f) Varying-LR
Fig. 6 In the left and right panels, the initial and updated values of λ j ’s provided by the proposed procedures are depicted at the positions of μ j , respectively. Deep and light colors indicate large and small values of λ j ’s, respectively
Monte-Carlo simulations
1. Generate training data.
2. Select hyper-tuning parameters by threefold cross-validation using the training data.
3. Estimate a discriminant model using the hyper-tuning parameters.
4. Calculate the misclassification rate for the training data using the discriminant model.
5. Generate test data.
6. Calculate the misclassification rate for the test data using the discriminant model.
7. Repeat steps 1 to 6 100 times.
8. Calculate the averages of the misclassification rates for the training and test data.

The results are summarized in Table 1. We can say that K-NN, Ridge-LR, and Lasso-LR cause under-fitting because their misclassification rates for training data are large. On the other hand, C-SVM is shown to cause over-fitting from the fact that its misclassification rate for training data is small while that for test data is large. For this type of data, Varying-LR provides a superior discriminant rule because its misclassification rate for test data is the smallest. These considerations are consistent with Fig. 5. From the above, our method seems to be more effective than the others when the two types of regions, in which under-fitting and over-fitting should be avoided, coexist. Next, we check whether our method is comparable to the other methods even when the two types of regions do not coexist, that is, when naive regularizations such as the other methods work well.
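For concreteness, the data-generation steps of Sect. 4.2 and the Monte-Carlo loop above can be sketched as follows. This is our own illustration: fit_and_select is only a placeholder for any of the compared methods together with its threefold cross-validation, and all names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def generate_data():
    z = np.empty((500, 2))
    z[:, 0] = rng.uniform(-np.pi, np.pi, 500)
    z[:490, 1] = rng.uniform(-1.5, 4.0, 490)
    y = np.where((np.sin(8 * z[:490, 0]) <= z[:490, 1]) & (z[:490, 1] <= 2.5), 1, 0)
    z[490:495, 1] = rng.uniform(2.0, 2.5, 5)   # small group labeled y = 0
    z[495:500, 1] = rng.uniform(2.5, 3.0, 5)   # small group labeled y = 1
    y = np.concatenate([y, np.zeros(5, dtype=int), np.ones(5, dtype=int)])
    x = (z - z.mean(axis=0)) / z.std(axis=0)   # standardize
    return x, y

def monte_carlo(fit_and_select, n_rep=100):
    train_err, test_err = [], []
    for _ in range(n_rep):
        x_tr, y_tr = generate_data()
        x_te, y_te = generate_data()
        model = fit_and_select(x_tr, y_tr)     # CV over the tuning parameters
        train_err.append(np.mean(model.predict(x_tr) != y_tr))
        test_err.append(np.mean(model.predict(x_te) != y_te))
    return np.mean(train_err), np.mean(test_err)
```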
Table 1 Misclassification rate for each method in the setting of Fig. 5

             Training         Test
K-NN         5.98 × 10^-2     15.41 × 10^-2
C-SVM        0.23 × 10^-2     17.92 × 10^-2
Ridge-LR     6.90 × 10^-2     15.85 × 10^-2
Lasso-LR     7.13 × 10^-2     16.12 × 10^-2
A-Lasso-LR   5.28 × 10^-2     17.30 × 10^-2
Varying-LR   2.65 × 10^-2     14.75 × 10^-2

Fig. 7 True discriminant boundary and its estimated version by each method: (a) True boundary, (b) Ridge-LR, (c) Lasso-LR, (d) Varying-LR
Table 2 Misclassification rate for each method in the setting of Fig. 7

             Training         Test
K-NN         10.54 × 10^-2    15.96 × 10^-2
C-SVM        0.03 × 10^-2     13.95 × 10^-2
Ridge-LR     0.71 × 10^-2     12.73 × 10^-2
Lasso-LR     0.70 × 10^-2     12.61 × 10^-2
A-Lasso-LR   1.62 × 10^-2     14.19 × 10^-2
Varying-LR   0.49 × 10^-2     12.88 × 10^-2
Data generation
1. Let z_i = (z_{i1}, z_{i2}) (i = 1, 2, . . . , 500) be uniformly distributed random variables on [−π, π] × [−3, 3].
2. Label z_i as y_i = 1 if sin 5z_{i1} − 1.4 ≤ z_{i2} ≤ sin 5z_{i1} + 1.4 and y_i = 0 otherwise.
3. Standardize z_i and denote it by x_i.

For these data, there is no need to vary the strength of the regularization. We use the same basis functions and candidate tuning parameters as before, and the estimated discriminant boundaries are depicted in Fig. 7. The misclassification rates evaluated by Monte-Carlo simulations, also as before, are given in Table 2. From these, we can say that Varying-LR is not inferior to the other methods even in cases where they are suitable.
5 Concluding Remarks

In this paper, for a discriminant problem in which both under-fitting and over-fitting should be avoided, we have identified a weak point of existing methods and proposed a smoothly varying regularization to overcome it. Through numerical experiments, our method has been shown to be superior to the existing methods, both visually and numerically, for such a discriminant problem. While we focus here on two-class discriminant problems, extending our method to multi-class discriminant problems is one of our future themes. Since multi-class discrimination boundaries are more likely to be complex, the effectiveness of our method may be even more pronounced there. In addition, considering a smoothly varying regularization using an \ell_1 penalty is an important theme. Since it conducts model selection for the explanatory variables simultaneously with estimation, similarly to the lasso, it should be effective especially when the dimension of the explanatory variables is large.

Acknowledgements We are grateful to the editors and two referees for their helpful comments. SK was supported by JSPS KAKENHI Grant Numbers JP19K11854 and JP20H02227, and YN was supported by JSPS KAKENHI Grant Number JP16K00050.
References 1. Bishop, C.M.: Pattern Recognition and Machine Learning. Springer, New York (2006) 2. Fan, J., Li, R.: Variable selection via nonconcave penalized likelihood and its oracle properties. J. Amer. Statist. Assoc. 96, 1348–1360 (2001). https://doi.org/10.1198/016214501753382273 3. Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning, 2nd edn. Springer, New York (2009) 4. Hoerl, A.E., Kennard, R.W.: Ridge regression: biased estimation for nonorthogonal problems. Technometrics 12, 55–67 (1970). https://doi.org/10.1080/00401706.1970.10488634 5. Kim, D., Kawano, S., Ninomiya, Y. Smoothly varying ridge regularization (2021). arXiv preprint, 2102.00136 6. Stone, M.: Cross validatory choice and assessment of statistical predictions. J. R. Stat. Soc. Ser. B 36, 111–133 (1974). https://doi.org/10.1111/j.2517-6161.1974.tb00994.x 7. Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B 58, 267–288 (1996). https://doi.org/10.1111/j.2517-6161.1996.tb02080.x 8. Tibshirani, R., Saunders, M., Rosset, S.: Sparsity and smoothness via the fused lasso. J. R. Stat. Soc. Ser. B 67, 91–108 (2005). https://doi.org/10.1111/j.1467-9868.2005.00490.x 9. Tibshirani, R.J., Taylor, J.: The solution path of the generalized lasso. Ann. Statist. 39, 1335– 1371 (2011). https://doi.org/10.1214/11-AOS878 10. Zhang, C.H.: Nearly unbiased variable selection under minimax concave penalty. Ann. Statist. 38, 894–942 (2010). https://doi.org/10.1214/09-AOS729 11. Zou, H.: The adaptive lasso and its oracle properties. J. Amer. Statist. Assoc. 101, 1418–1429 (2006). https://doi.org/10.1198/016214506000000735
Optimizations for Categorizations of Explanatory Variables in Linear Regression via Generalized Fused Lasso Mineaki Ohishi , Kensuke Okamura, Yoshimichi Itoh, and Hirokazu Yanagihara
Abstract In linear regression, a non-linear structure can be naturally considered by transforming quantitative explanatory variables to categorical variables. Moreover, smaller categories make estimation more flexible. However, a trade-off between flexibility of estimation and estimation accuracy occurs because the number of parameters increases for smaller categorizations. We propose an estimation method wherein parameters for categories with equal effects are equally estimated via generalized fused Lasso. By such a method, it can be expected that the degrees of freedom for the model decreases, flexibility of estimation and estimation accuracy are maintained, and categories of explanatory variables are optimized. We apply the proposed method to modeling of apartment rents in Tokyo’s 23 wards. Keywords Coordinate descent algorithm · Generalized fused lasso · Linear model · Real estate data analysis
1 Introduction For a given n-dimensional vector y of a response variable and a given n × k matrix A of explanatory variables, a linear regression model simply describes their relationship as follows: y = μ1n + Aθ + ε,
(1)
M. Ohishi (B) Education and Research Center for Artificial Intelligence and Data Innovation, Hiroshima University, Hiroshima 730-0053, Japan e-mail: [email protected] K. Okamura · Y. Itoh Tokyo Kantei Co., Ltd., Shinagawa 141-0021, Japan H. Yanagihara Graduate School of Advanced Science and Engineering, Hiroshima University, Higashi-Hiroshima 739-8526, Japan
where μ is a location parameter, 1_n is an n-dimensional vector of ones, θ is a k-dimensional vector of regression coefficients, and ε is an n-dimensional vector of an error variable with mean vector 0_n and covariance matrix σ^2 I_n. Here, 0_n is an n-dimensional vector of zeros. Although a linear regression model is a simple statistical model, many researchers study estimation methods for the unknown model parameters μ and θ. If the response variable and each explanatory variable have a strong linear relationship, the linear regression model is appropriate. If not, the model does not fit well. To improve the fit in such a situation, we transform the quantitative explanatory variables to categorical variables by splitting them into ranges. Through this transformation, a non-linear structure can be considered within the framework of a linear regression model. Although it is desirable to use smaller categories to allow more flexible estimation, the estimation accuracy declines as the number of parameters increases. Then, to maintain both flexibility of estimation and estimation accuracy, we propose an estimation method via the generalized fused Lasso (GFL; e.g., see [1] and [5]). As the name suggests, the GFL is a generalized version of the fused Lasso (FL) proposed by [3]. It estimates model parameters by a penalized estimation method with the following penalty term:

\sum_{j=1}^{k} \sum_{\ell \in D_j} |\beta_j - \beta_\ell|,

where D_j ⊆ {1, . . . , k}\{j} is an index set. When D_j = {j + 1} (j = 1, . . . , k − 1) and D_k = ∅, the GFL coincides with the original FL.
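For illustration only, this penalty can be written as a tiny helper (our own sketch, with 0-based index sets, not code from the paper):

```python
def gfl_penalty(beta, D):
    # beta: sequence of coefficients; D: list of index sets D_1, ..., D_k (0-based).
    return sum(abs(beta[j] - beta[l]) for j in range(len(beta)) for l in D[j])
```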
2 Model and Estimation 2.1 Model First, we rewrite model (1) by transforming quantitative variables in A. Each quantitative variable is transformed into a categorical variable with 3 or more categories
Optimizations for Categorizations of Explanatory Variables …
459
by splitting them into small ranges. Then, we define an n × p matrix X = (X_1, . . . , X_m) and an n × q matrix Z. Here X_i is an n × p_i matrix (p_i ≥ 3) of a categorical variable with p_i categories that is either obtained from a quantitative variable or is originally a qualitative categorical variable, where p = \sum_{i=1}^{m} p_i and each X_i includes a baseline. Usually, when a dummy variable with multiple categories is used as an explanatory variable, one category is removed to avoid multicollinearity. That is, one column of X_i is removed, and the removed category is called a baseline. However, in this paper, the baseline is kept in order to optimize the categories. Note that each element of X_i takes the value 1 or 0, and X_i 1_{p_i} = 1_n holds. The matrix Z consists of the remaining variables. Then, we consider the following linear regression model:

y = Xβ + Zγ + ε,    (2)
where β and γ are p- and q-dimensional vectors of regression coefficients, respectively, and β = (β_1^\top, . . . , β_m^\top)^\top, corresponding to the split of X. Moreover, an intercept is not included in model (2) since each X_i includes a baseline. Note that β_1, . . . , β_m are vectors of regression coefficients for categorical variables with 3 or more categories. Hence, they are estimated by the GFL. Accordingly, we estimate β and γ by minimizing the following penalized residual sum of squares (PRSS):

\| y - X\beta - Z\gamma \|^2 + \lambda \sum_{i=1}^{m} \sum_{j=1}^{p_i} \sum_{\ell \in D_{i,j}} w_{i,j\ell} |\beta_{i,j} - \beta_{i,\ell}|,    (3)

where λ is a non-negative tuning parameter, D_{i,j} ⊆ {1, . . . , p_i}\{j} is an index set, w_{i,j\ell} is a positive penalty weight, and β_{i,j} is the jth element of β_i. Actually, a tuning parameter is required for each penalty term; that is, m tuning parameters are needed for the PRSS (3). However, it is complex to optimize multiple tuning parameters. Hence, we seek a unification of the tuning parameters by using penalty weights. The set D_{i,j} is important because it decides which differences with β_{i,j} are shrunk, and it must be appropriately defined from the data. For example, when X_i is the matrix of a categorical variable obtained from a quantitative variable, the indexes 1, . . . , p_i are naturally ordered, and hence D_{i,j} is defined as D_{i,j} = {j + 1} (j = 1, . . . , p_i − 1) and D_{i,p_i} = ∅. This means that the parameters for a categorical variable obtained from a quantitative variable are estimated by the FL. In this paper, such a D_{i,j} is called the FL-index. On the other hand, the penalty weight w_{i,j\ell} is based on the idea of the adaptive Lasso proposed by [6], and a general penalty weight is the inverse of the estimate corresponding to the form of the penalty term. It is reasonable to calculate the least squares estimator (LSE) \tilde{β}_{i,j} of β_{i,j} and to use the following weight:

w_{i,j\ell} = \frac{1}{|\tilde{\beta}_{i,j} - \tilde{\beta}_{i,\ell}|}.    (4)
However, since each X i is a dummy variable matrix including a baseline, they are rank deficient and the LSEs cannot be calculated. To solve this problem, we calculate
LSEs for the following model, wherein the baselines are removed from each X_i and an intercept is added:

y = \mu 1_n + \sum_{i=1}^{m} X_i^{(-)} \beta_i^{(-)} + Z\gamma + \varepsilon,    (5)
where (−) denotes that a column vector or an element for a baseline is removed from the original matrix or vector. If the first column is a baseline, X i(−) is the n × ( pi − 1) matrix obtained by removing the first column from X i and β i(−) is the ( pi − 1)-dimensional vector obtained by removing the first element from β i , i.e., β i(−) = (βi,2 , . . . , βi, pi ) . By removing the baselines, we can calculate LSEs for model (5). Then, let β˜i, j ( j = 2, . . . , pi ) be LSE of βi, j , and we define β˜i,1 = 0 and calculate the penalty weight (4).
2.2 Estimation

We minimize the PRSS (3) to estimate β and γ via a coordinate descent algorithm (CDA). This algorithm obtains the optimal solution by repeating minimization along each coordinate direction. Broadly speaking, the CDA for (3) consists of two steps: a minimization step with respect to β_1, . . . , β_m and a minimization step with respect to γ. It is also important how to optimize the GFL and, fortunately, there are algorithms available for this purpose (e.g., [1], [4], and [5]). In this paper, we solve the optimization problem by using the CDA for the GFL (GFL-CDA) proposed by [1]. The algorithm to estimate β and γ is summarized as follows:

Algorithm 1
input: Initial vectors for β and γ, and λ.
Step 1. For all i ∈ {1, . . . , m}, fix β_j (j ≠ i) and γ, and calculate the GFL estimates of β_i via the GFL-CDA.
Step 2. Fix β, and calculate the LSE of γ.
Step 3. If all parameters converge, the algorithm terminates. If not, return to Step 1.

Let θ = (β^\top, γ^\top)^\top = (θ_1, . . . , θ_r)^\top (r = p + q). The following criterion is used for the convergence judgment in the algorithm:

\frac{ \max_{j \in \{1, \ldots, r\}} (\theta_j^{new} - \theta_j^{old})^2 }{ \max_{j \in \{1, \ldots, r\}} (\theta_j^{old})^2 } \le \frac{1}{100{,}000}.
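A rough Python sketch of Algorithm 1 is given below. Here gfl_cda stands for the generalized fused Lasso update of reference [1] and is only a placeholder, and all other names are our own assumptions rather than the authors' code.

```python
import numpy as np

def algorithm1(y, X_blocks, Z, D, w, lam, gfl_cda, max_iter=1000, eps=1e-5):
    m = len(X_blocks)
    beta = [np.zeros(X.shape[1]) for X in X_blocks]
    gamma = np.zeros(Z.shape[1])
    theta_old = np.concatenate(beta + [gamma])
    for _ in range(max_iter):
        # Step 1: update each beta_i by the GFL-CDA with the other blocks fixed
        for i in range(m):
            resid = y - Z @ gamma - sum(X_blocks[j] @ beta[j] for j in range(m) if j != i)
            beta[i] = gfl_cda(X_blocks[i], resid, D[i], w[i], lam)
        # Step 2: least squares update of gamma with beta fixed
        resid = y - sum(X_blocks[j] @ beta[j] for j in range(m))
        gamma = np.linalg.lstsq(Z, resid, rcond=None)[0]
        # Step 3: convergence judgment (guard against an all-zero start)
        theta_new = np.concatenate(beta + [gamma])
        denom = max(np.max(theta_old ** 2), 1e-12)
        if np.max((theta_new - theta_old) ** 2) / denom <= eps:
            break
        theta_old = theta_new
    return beta, gamma
```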
Since the tuning parameter adjusts the strength of the penalties, the selection of this parameter is key to obtaining better estimates. Following [1], we decide the candidates of λ. Let \hat{β}_{i,max} (i = 1, . . . , m) and \hat{γ}_{max} be the estimates of β_{i,j} (j = 1, . . . , p_i) and γ when all categories of the ith variable are equal, i.e., the estimate of β_i is given
by \hat{\beta}_i = \hat{\beta}_{i,max} 1_{p_i}. Moreover, we define λ_{i,max} as

\lambda_{i,max} = \max_{j \in \{1, \ldots, p_i\}} \frac{ \big| x_{i,j}^\top \tilde{y}_{i,j} - \hat{\beta}_{i,max} \| x_{i,j} \|^2 \big| }{ \sum_{\ell \in D_{i,j}} w_{i,j\ell} },

where x_{i,j} is the jth column of X_i and \tilde{y}_{i,j} = y - \sum_{\ell=1}^{m} \hat{\beta}_{\ell,max} 1_n - Z \hat{\gamma}_{max} + \hat{\beta}_{i,max} x_{i,j}. At λ_{i,max}, each β_{i,j} is updated to \hat{\beta}_{i,max}. Then, we select the optimal tuning parameter among 100 candidates given by λ_{max} (3/4)^{j-1} (j = 1, . . . , 100), where λ_{max} = \max_{i \in \{1, \ldots, m\}} \{λ_{i,max}\}. By executing Algorithm 1 for each λ, the optimal tuning parameter is selected by minimizing the EGCV criterion with α = log n (see [2]), which takes the form

EGCV = \frac{ (\text{residual sum of squares})/n }{ \{ 1 - (\text{degrees of freedom})/n \}^{\alpha} },
where α is a positive value adjusting the strength of the penalty for model complexity. To calculate each λ_{i,max}, the \hat{\beta}_{i,max} satisfying \hat{\beta}_{i,1} = \cdots = \hat{\beta}_{i,p_i} = \hat{\beta}_{i,max} is required. When β_i = β_i 1_{p_i} (i = 1, . . . , m), model (2) is rewritten as

y = \sum_{i=1}^{m} \beta_i X_i 1_{p_i} + Z\gamma + \varepsilon = \sum_{i=1}^{m} \beta_i 1_n + Z\gamma + \varepsilon.    (6)
Although βˆi,max is given as the LSE of βi for this model, such a solution cannot be obtained in closed form. Then, we use a CDA to search for the solution. That is, using an algorithm like Algorithm 1, we minimize the (non-penalized) residual sum of squares for model (6).
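The candidate grid and the EGCV-based selection can be sketched as follows; fit_for_lambda is only a placeholder for running Algorithm 1 at a given λ and returning the residual sum of squares and the model degrees of freedom, and the names are our own.

```python
import numpy as np

def select_lambda(lambda_max, n, fit_for_lambda):
    alpha = np.log(n)
    candidates = lambda_max * (3.0 / 4.0) ** np.arange(100)   # lambda_max * (3/4)^(j-1)
    best_lam, best_egcv = None, np.inf
    for lam in candidates:
        rss, df = fit_for_lambda(lam)
        egcv = (rss / n) / (1.0 - df / n) ** alpha            # EGCV with alpha = log n
        if egcv < best_egcv:
            best_lam, best_egcv = lam, egcv
    return best_lam
```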
3 Application to Modeling Apartment Rents 3.1 Data and Model In this section, we apply the method described in the previous section to real data covering studio apartment rents in Tokyo’s 23 wards. The data were collected by Tokyo Kantei Co., Ltd., which is located at Shinagawa, Tokyo, between April 2014 and April 2015, and consist of rent data for n = 61,913 apartments, with 12 items for each case. Table 1 shows data items. A1 to A5 are quantitative variables and are transformed to categorical variables when modeling. B1 to B4 are dummy variables that take the value of 1 or 0, and C1 and C2 are qualitative categorical variables with 3 or more categories. Moreover, this dataset specifies the location of each apartment in terms of the 852 areas demarcated in Fig. 1. Figure 2 is a bar graph of monthly rents for each area, and we can find that there are regional differences in the rents. In
Table 1 Data items

Y    Monthly apartment rent (yen)
A1   Floor area of apartment (m²)
A2   Building age (years)
A3   Top floor
A4   Room floor
A5   Walking time (min) to the nearest station
B1   Whether the apartment has a parking lot
B2   Whether the apartment is a condominium
B3   Whether the apartment is a corner apartment
B4   Whether the apartment is a fixed-term tenancy agreement
C1   Facing direction (one of the following 8 categories): N; NE; E; SE; S; SW; W; NW
C2   Building structure (one of the following 10 categories): Wooden; Light-SF; SF; RF-C; SF-RF-C; ALC; SF-PC; PC; RF-Block; other

Y and A1 to A5 are quantitative variables. B1 to B4 are dummy variables that take the value of 1 or 0. C1 and C2 are dummy variables with multiple categories. Regarding C2, SF is steel frame, RF is reinforced, C is concrete, PC is precast C
Fig. 1 The 852 areas in Tokyo’s 23 wards
this application, let the response variable be monthly rent with the remainder set as explanatory variables. First, we describe the transformations of quantitative variables. Floor area is logarithm-transformed and divided into 100 ranges by using each 1% quantile point. Note that 15% and 16% quantile points and 27% to 29% quantile points are equal. Hence, floor area is transformed to a categorical variable with 97 categories. Next, since building age is a discrete quantitative variable that ranges from 0 years to 50 years, we regard it as a categorical variable with 51 categories.
Fig. 2 Apartment rents for each area
Similarly, we regard walking time as a categorical variable with 25 categories because the range is 1 min to 25 min. Finally, top floor and room floor are dealt with as a combined variable named floor type. Top floor is a discrete quantitative variable and data are sparse beyond the 16th floor. Hence, we conflate data for 16 or more floors into the same category and regard top floor as a categorical variable with 16 categories. Room floor is also a discrete quantitative variable and 34 and 38 to 42 are missing. Hence, we regard 34 and 35, and 38 to 43 as the same categories and regard room floor as a categorical variable with 37 categories. Then, by plotting top floor and room floor on a scatter plot, we can find 157 categories in Fig. 3, where the square boxes emphasize each point is regarded as one category or one space. Hence, we regard floor type (which is a combined variable of top floor and room floor) as a categorical variable with 157 categories. The above data are formulated as follows. Let X 1 , z 1 and z 2 be the n × p1 matrix and the n-dimensional vectors, for floor area, respectively, where p1 = 96, X 1 expresses the dummy variables for the first 96 categories, z 1 expresses the dummy variable for the last category, and z 2 expresses logarithm-transformed floor area for the last category. That is, floor area is evaluated by constants for the first 96 categories and by a linear function for the last category. Let X i (i = 2, . . . , 5) be n × pi matrices of dummy variables for building age, walking time, facing direction, and building structure, where p2 = 51, p3 = 25, p4 = 8, and p5 = 10. Let X 6 be an n × p6 matrix of dummy variables for floor type, where p6 = 157. Moreover, since monthly rent depends on location, we consider regional effects according to [1]. Let X 7 be an n × p7 matrix of dummy variables expressing the location of each apartment, where p7 = 852. Note that all X 1 , . . . , X 7 include baselines. Furthermore, let z j ( j = 3, . . . , 6) be the n-dimensional vectors of dummy variables for B1 to B4. Then, p = 1205, q = 6, and m = 7. Index sets Di, j (i = 1, . . . , m) are defined as follows. For i = 1, . . . , 4, indexes 1, . . . , pi are naturally ordered. Since floor area, building age, and walking time are quantitative variables, Di, j (i = 1, 2, 3) is defined by the FL-index. On the other
Fig. 3 Floor type categories
Fig. 4 Estimation results for the two effects: (a) floor effects, (b) regional effects
hand, the facing direction has the following order: N → NE → E → · · · → NW → N. Hence, D4, j is defined by the FL-index with D4, p4 = {1}. In contrast, since building structure has no order, D5, j is defined by D5, j = {1, . . . , p5 }\{ j} ( j = 1, . . . , p5 ). This means that differences of parameters for all pairs are shrunken. Floor type and areas have adjacent relationships; see Figs. 1 and 3. Hence, D6, j and D7, j are defined according to the adjacent relationships.
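For illustration only, the two kinds of index sets used above can be generated as follows (our own helper functions with 0-based indexing, not code from the paper); the adjacency-based sets for floor type and areas would instead be read off Figs. 1 and 3.

```python
def fl_index(p):
    # FL-index for ordered categories: D_j = {j+1} for j < p-1 and D_{p-1} = {} (0-based).
    return [[j + 1] if j < p - 1 else [] for j in range(p)]

def all_pairs_index(p):
    # All-pairs index for unordered categories: D_j = {0,...,p-1} \ {j}.
    return [[k for k in range(p) if k != j] for j in range(p)]
```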
Fig. 5 Residual plots for quantitative variables: (a) log-transformed floor area, (b) building age, (c) walking time
3.2 Results

Figure 4 shows the estimation results for the floor effects and regional effects as choropleth maps. Figure 4a shows that the floor-type effect tends to increase as the top floor or the room floor increases. Moreover, the 157 floor-type categories are clustered into 55 types. Figure 4b displays the results for the regional effects, and we can see that these effects are higher for central areas than for peripheral areas. Moreover, the 852 areas are clustered into 199 areas. Figure 5 shows residual plots for the quantitative variables: floor area, building age, and walking time. The residual plots are unproblematic. Since there are non-linear structures for all variables, it can be considered that transforming the quantitative variables was beneficial. Floor area has 97 categories, and the first 96 categories are clustered into 50 categories, as per Fig. 5a. Building age has 51 categories, which are clustered into 31 categories, as per Fig. 5b. Walking time has 25 categories, which are clustered into 14 categories, as per Fig. 5c. Figure 6 shows the estimation results for the qualitative variables with 3 or more categories: facing direction and building structure. These variables are clustered into 2 and 3 categories; see Figs. 6a and 6b, respectively.
Fig. 6 Estimation results for qualitative variables: (a) facing direction, (b) building structure

Table 2 Regression coefficients for B1 to B4

B1          B2          B3        B4
25187.63    −2048.48    270.20    −792.53
Table 2 summarizes the estimates for the dummy variables that take the value of 1 or 0. Finally, Table 3 summarizes this real data example and shows R², MER (%), 10⁷ times the EGCV value, and the run time (min), where R² denotes the coefficient of determination and MER denotes the median error rate. These values are displayed for the proposed method and the following two methods:
• M1: For the linear regression model without transformation of the quantitative variables, we estimate the regression coefficients by the least squares method. Here, as baselines for facing direction and building structure, the north direction and wooden are removed.
• M2: For our model described in Sect. 3.1, we estimate the regression coefficients without the GFL penalty. That is, the LSEs used in the penalty weights are used.

The table reveals that the results obtained are reasonable and that our method is computationally practicable. Of course, M2 fits best. However, our method also fits well, and we found that our method has the best prediction accuracy. In this paper, we used only one tuning parameter for practicability. The table also indicates that if m tuning parameters were used, the optimization might take about 5^7 minutes, i.e., about 54 days. We consider the optimization of multiple tuning parameters as future work.
Table 3 Summary

           R²      MER      EGCV    Run time
M1         0.690   10.316   5.791   –
M2         0.865   5.895    2.515   –
Proposed   0.862   6.071    2.221   5.07
Acknowledgements The first author’s research was partially supported by JSPS KAKENHI Grant Number JP20H04151. The last author’s research was partially supported by JSPS KAKENHI Grant Numbers JP16H03606, JP18K03415, and JP20H04151. Moreover, the authors thank the associate editor and the two reviewers for their valuable comments.
References 1. Ohishi, M., Fukui, K., Okamura, K., Itoh, Y., Yanagihara, H.: Coordinate optimization for generalized fused Lasso. Comm. Statist. Theory Methods (2021, in press). https://doi.org/10. 1080/03610926.2021.1931888 2. Ohishi, M., Yanagihara, H., Fujikoshi, Y.: A fast algorithm for optimizing ridge parameters in a generalized ridge regression by minimizing a model selection criterion. J. Statist. Plann. Inference 204, 187–205 (2020). https://doi.org/10.1016/j.jspi.2019.04.010 3. Tibshirani, R., Saunders, M., Rosset, S.: Sparsity and smoothness via the fused Lasso. J. R. Stat. Soc. Ser. B. Stat. Methodol. 67, 91–108 (2005). https://doi.org/10.1111/j.1467-9868.2005. 00490.x 4. Tibshirani, R., Taylor, J.: The solution path of the generalized Lasso. Ann. Statist 39, 1335–1371 (2011). https://doi.org/10.1214/11-AOS878 5. Xin, B., Kawahara, Y., Wang, Y., Gao, W.: Efficient generalized fused Lasso and its application to the diagnosis of Alzheimer’s disease. In: Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, pp. 2163–2169. Association for the Advancement of Artificial Intelligence, California (2014) 6. Zou, H.: The adaptive Lasso and its oracle properties. J. Amer. Statist. Assoc. 101, 1418–1429 (2006). https://doi.org/10.1198/016214506000000735
Robust Bayesian Changepoint Analysis in the Presence of Outliers Shonosuke Sugasawa and Shintaro Hashimoto
Abstract We introduce a new robust Bayesian change-point analysis in the presence of outliers. We employ an idea of general posterior based on density power divergence combined with horseshoe prior for differences of underlying signals. A posterior computation algorithm is proposed using Markov chain Monte Carlo. The proposed method is demonstrated through simulation and real data analysis. Keywords Change-point analysis · Density power divergence · Horseshoe prior · State space model
1 Introduction Change-point analysis has been recognized as one of the popular challenges for modeling non-stationary time series in a variety of scientific fields including oceanography [12], climate records [16], genetics [4] and finance [13]. There has been long history to estimate change points in both theoretical and methodological perspectives. In this paper, we consider the Bayesian approach to the change-point problem. The advantage of Bayesian approaches is capable of full probabilistic uncertainty quantification through posterior distribution. Typical Bayesian approaches to the change point problems use point process latent process (e.g. [7]), but we here adopt a more simple approach employing horseshoe prior [5] for the difference of two successive signals. The horseshoe prior strongly shrinks small differences of two signals toward zero while keeping large difference unshrunk, which results in piecewise constant estimation of signals. Although a part of this approach is investigated in Bayesian version of trend filtering (e.g. [6, 14]), the model uses normal distribution as the conditional distribution of the observed signal given the latent signal. Hence, S. Sugasawa Center for Spatial Information Science, The University of Tokyo, Tokyo, Japan e-mail: [email protected] S. Hashimoto (B) Department of Mathematics, Graduate School of Science, Hiroshima University, Hiroshima, Japan e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 I. Czarnowski et al. (eds.), Intelligent Decision Technologies, Smart Innovation, Systems and Technologies 238, https://doi.org/10.1007/978-981-16-2765-1_39
if the data contain outliers, the above method would provide a large number of change-points, most of which are false positives. Our approach is based on ideas from robust statistics, namely replacing a Gaussian likelihood with a general one using the density power divergence [1]. The corresponding posterior distribution is known as a general posterior (e.g. [2] and [10]), and Bayesian inference based on general posteriors using divergences has also been developed in recent years (e.g. [3, 9, 11]). Since the proposed posterior distribution under the horseshoe prior has an intractable form, we use a Metropolis-within-Gibbs algorithm to generate samples from the posterior distribution. By using the posterior samples, we can calculate the posterior probabilities of change-points. Although one may use a heavy-tailed distribution for the observation error, the advantage of the proposed approach is the generality of the divergence-based method. In other words, the method may be applicable to a variety of response variables. We note that Fearnhead and Rigaill [8] proposed an alternative biweight loss function to reduce the influence of outlying observations in a frequentist setting, but their approach is not capable of full probabilistic uncertainty quantification. The paper is organized as follows: In Sect. 2, we introduce a robust Bayesian change-point model which is based on the density power divergence and the horseshoe prior. We provide a posterior computation algorithm to estimate the posterior probabilities of change points. In Sect. 3, we provide results of some numerical studies for various scenarios. In Sect. 4, we consider an application of the proposed method to the famous Well-Log data.
2 Robust Bayesian Changepoint Analysis

Let θ_1, . . . , θ_T be unobserved true signals, and let y_t (t = 1, . . . , T) be noisy observations of θ_t. Suppose that there are K change-points in the signal, that is, there exist t_1, . . . , t_K ∈ {2, . . . , T} such that θ_{t−1} ≠ θ_t for t = t_k for some k and θ_{t−1} = θ_t otherwise. The goal is to make statistical inference on the change points of θ_t based on the observed data y_t. To this end, we employ the following state space model:

y_t | θ_t ∼ N(θ_t, σ²),  t = 1, . . . , T,
θ_t = θ_{t−1} + η_t,  t = 2, . . . , T,    (1)
where f(y_t | θ_t) is the conditional distribution of y_t given θ_t, σ² is an unknown variance parameter, and η_t is the innovation in the state space. We introduce a shrinkage prior for η_t to encourage θ_t and θ_{t−1} to take almost the same values when time t is not a change-point. Specifically, we adopt the horseshoe prior [5] for η_t, expressed as

η_t | u_t ∼ N(0, λ u_t),  u_t ∼ C⁺(0, 1),
where λ is an unknown scale parameter and C⁺(0, 1) is the standard half-Cauchy distribution with density function p(z) = 2π^{-1}(1 + z²)^{-1} for z > 0. The horseshoe prior strongly shrinks small differences between θ_t and θ_{t−1} toward zero while keeping large differences unshrunk. Note that the model (1) with the horseshoe prior is a special case of Bayesian trend filtering [6]. The main drawback of the model (1) is that the normality assumption of y_t | θ_t is sensitive to outliers. When y_t is an outlier in the sense that |y_t − θ_t|/σ is very large, the posterior distributions of both θ_t − θ_{t−1} and θ_{t+1} − θ_t would be away from zero, which results in two false change-points. To solve this problem, we employ the idea of the general posterior [2] based on the density power divergence [1]. We assume σ² ∼ IG(a_σ, b_σ) and λ ∼ C⁺(0, 1) as the prior distributions, where IG(a, b) is the inverse gamma distribution with parameters a > 0 and b > 0. Our general posterior distribution of the latent signals θ_{1:T}, the latent variables u_{2:T}, and the unknown parameters (σ², λ) is given by

π(θ_{1:T}, u_{2:T}, σ², λ | D_T) ∝ π(σ²) π(λ) \prod_{t=1}^{T} \exp\{ D_γ(y_t; θ_t, σ²) \} \prod_{t=2}^{T} φ(θ_t; θ_{t−1}, λ u_t) π_{C⁺}(u_t),    (2)
where D_T = {y_1, . . . , y_T} and φ(·; a, b) is the density function of the normal distribution with mean a and variance b. Here D_γ(y_t; θ_t, σ²) is the density power divergence term, having the following form:

D_γ(y_t; θ_t, σ²) = \frac{1}{γ} φ(y_t; θ_t, σ²)^{γ} - (2πσ²)^{-γ/2} (1 + γ)^{-3/2},    (3)
where γ controls the robustness against outliers. Note that, as γ → 0, D_γ(y_t; θ_t, σ²) − (1/γ + 1) converges to the log-likelihood log φ(y_t; θ_t, σ²); thereby the function (3) can be regarded as a natural extension of the log-likelihood, that is, the general posterior (2) is a generalization of the standard posterior (see also [11] and [9]). To generate posterior samples from (2), we use Metropolis-Hastings-within-Gibbs sampling, where the details are given as follows.
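For reference, the per-observation term (3) can be computed directly; the snippet below is our own illustration, not the authors' code.

```python
import numpy as np
from scipy.stats import norm

def dpd_term(y_t, theta_t, sigma2, gamma):
    # Density power divergence term (3); gamma > 0 controls robustness.
    dens = norm.pdf(y_t, loc=theta_t, scale=np.sqrt(sigma2))
    return dens**gamma / gamma - (2 * np.pi * sigma2) ** (-gamma / 2) * (1 + gamma) ** (-1.5)
```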
Then, the proposal is accepted with probability φ(yt ; θt , st σ 2 )U (θt∗ ) min 1, . φ(yt ; θt∗ , st σ 2 )U (θt ) • (Sampling from σ 2 ) The full conditional distribution of σ 2 is proportional to π(σ 2 )
T
exp{Dγ (yt ; θt , σ 2 )} ≡ V (σ 2 ).
t=1
We here simply use the random-walk MH algorithm that generates proposal σ∗2 from N (σ 2 , h 2 ) and accept it with probability min{1, V (σ∗2 )/V (σ 2 )}. • (Sampling from u t ) By using an augmented representation, u t |ξt ∼ IG(1/2, 1/ξt ) and ξt ∼ IG(1/2, 1), we can sample from the full conditional distribution of u t in two steps. First generate ξt from IG(1, 1 + u −1 t ) and then generate u t from IG(1, ξt−1 + (θt − θt−1 )2 /2λ). • (Sampling from λ) By using an augmented representation, λ|ξt ∼ IG(1/2, 1/δ) and δ ∼ IG(1/2, 1), we can sample from the full conditional distribution of λ in two steps. First δ from IG(1, 1 + λ−1 ) and then generate λ from T generate −1 −1 IG(T /2, δ + t=2 u t (θt − θt−1 )2 /2). Based on the posterior samples of θ1:T , change-point detection can be carried based on the posterior distribution of ηt = θt − θt−1 . Specifically, we compute the posterior probability that ηt ≥ c for some small c (e.g. c = 0.1), and determine the change-point if the probability is greater than 0.5. Regarding the choice of γ , we adopt γ = 0.5 as a default value, but some selection method might be adopted by considering trade-off between efficiency and robustness, as considered in [1, 17].
3 Simulation Study We investigate the performance of the proposed robust method together with the standard non-robust method using simulated data. We set T = 500 throughout this study. We adopt four scenarios for the underlying true signals, θ1 , . . . , θT , which are shown in Fig. 1. Given θt ’s, observed signals are independently generated from the model, yt = θt + εt , where εt is an error term. We considered two scenarios for εt as follows:
Robust Bayesian Changepoint Analysis in the Presence of Outliers
473 Scenario3
−1
−1
0
0
1
1
2
2
3
3
Scenario1
100
200
300
400
0
500
100
200
300
time
time
Scenario2
Scenario4
400
500
400
500
−1
2
0
4
6
1
8
2
10
0
0
100
200
300
400
500
0
time
100
200
300 time
Fig. 1 True signals under four scenarios and observed signals under scenario (i) for the error term
(i) εt ∼ 0.97 N (0, σ 2 ) + 0.03 N (1, σ 2 ),
(ii) εt ∼ σ t3 ,
where σ = 0.25 and t3 is the t-distribution with 3 degrees of freedom. In the first scenario, the second components play a role that generates outliers, and outliers are always larger than the true signals. On the other hand, outliers are naturally generated due to the heavy tail of the t-distribution. The simulated data under scenario (i) is also shown in Fig. 1. For each simulated dataset, we applied the Bayesian change-point model with the horseshoe difference prior and normal distribution for the error term (denoted by HCP). We also applied the proposed robust method with γ = 0.5 based on the above model (denoted by RHCP). In both methods, We generated 5000 posterior samples after discarding the first 2000 samples as burn-in. Unlike the posterior mode of θt+1 − θt , the posterior samples of the difference are not exactly zero, so we need to consider some systematic rule for change point detection. We here simply use
474
S. Sugasawa and S. Hashimoto Scenario 1−(ii)
−2
−1
−1
0
0
1
1
2
2
3
3
Scenario 1−(i)
100
200
300
400
0
500
100
200
300
time
time
Scenario 2−(i)
Scenario 2−(ii)
400
500
400
500
400
500
400
500
−3
−1
−2
0
−1
0
1
1
2
2
3
0
100
200
300
400
500
0
100
200
300
time
time
Scenario 3−(i)
Scenario 3−(ii)
−2
−1
−1
0
0
1
1
2
2
3
3
0
100
200
300
400
0
500
100
200
300
time
time
Scenario 4−(i)
Scenario 4−(ii)
0
2
2
4
4
6
6
8
8
10
10
0
0
100
200
300 time
400
500
0
100
200
300 time
Fig. 2 Posterior medians and detected change-points based on the HCP methods under 8 scenarios
Robust Bayesian Changepoint Analysis in the Presence of Outliers
475 Scenario 1−(ii)
−2
−1
−1
0
0
1
1
2
2
3
3
Scenario 1−(i)
100
200
300
400
0
500
100
200
300
time
time
Scenario 2−(i)
Scenario 2−(ii)
400
500
400
500
400
500
400
500
−3
−1
−2
0
−1
0
1
1
2
2
3
0
100
200
300
400
0
500
100
200
300
time
time
Scenario 3−(i)
Scenario 3−(ii)
−2
−1
−1
0
0
1
1
2
2
3
3
0
100
200
300
400
0
500
100
200
300
time
time
Scenario 4−(i)
Scenario 4−(ii)
0
2
2
4
4
6
6
8
8
10
10
0
0
100
200
300 time
400
500
0
100
200
300 time
Fig. 3 Posterior medians and detected change-points (red vertical lines) based on the RHCP methods under 8 scenarios
476
S. Sugasawa and S. Hashimoto
the posterior probability of {θt+1 − θt ≥ ν}, where ν is a small positive constant, and then time t is regarded as change point if the probability is greater than 0.5 Throughout this paper, we use ν = 0.1, but the use of other values such as ν = 0.01 and ν = 0.05 did not change the results very much. The posterior median of θt and detected change-points are shown in Figs. 2 and 3 for HCP and RHCP, respectively. Comparing the two figures, it can be seen that the HCP method detects more changepoints than the true ones. In particular, the HCP methods tend to detect change-points around outliers as it is sensitive to outliers due to the normality assumption. On the other hand, the proposed RHCP method can detect the true change-point reasonably well, and the estimation results are quite robust against outliers.
4 Real Data Example We next consider an application of the proposed method to the famous Well-Log data which is known to contain many outliers. Well-Log data is geophysical data which contains measurements of nuclear magnetic response and conveys information about rock structure and, in particular, the boundaries between different rock strata ([15]). Although outliers are artificially omitted from the data in most of existing literatures, we follow [8] analyzing the raw data including outliers. We applied the proposed RHCP method with γ = 0.5 and the non-robust HCP method to the Well-Log data which is standardized to have mean 0 and variance 1. In both method, posterior inference is carried out based on 5000 posterior samples after discarding 5000 samples as burn-in. We computed the posterior probability of {θt+1 − θt ≥ 0.1} based on the posterior samples, and determined change-points when the probability is greater than 0.5. In Fig. 4, we show the posterior medians of θt , detected change-points, and posterior probability of {θt+1 − θt ≥ 0.1}. It is observed that the non-robust HCP method tends to detect a larger number of change-points than the RHCP method. In particular, the HCP method gives change-points around outliers due to the normality assumption of the error term in HCP. On the other hand, the proposed RHCP method provides reasonable change-points by successfully eliminating irrelevant outliers.
5 Conclusion and Discussion We proposed the Bayesian change-point model based on robust divergence and horseshoe prior. Simulation study and real data analysis supported the performance of the proposed method. We simply used MCMC method, namely, generate posterior samples by single move sampler within Gibbs sampling, but the proposed method should be extended to the sequential learning algorithm to carry out change-point detection sequentially in the future (e.g. [3]).
Robust Bayesian Changepoint Analysis in the Presence of Outliers
477
2 0 −2 −4 −6
Nuclear Response (standardized)
HCP
0
1000
2000
3000
4000
3000
4000
3000
4000
3000
4000
Index
0.6 0.4 0.0
0.2
probability
0.8
1.0
HCP
0
1000
2000 Index
2 0 −2 −4 −6
Nuclear Response (standardized)
RHCP
0
1000
2000 Index
0.6 0.4 0.0
0.2
probability
0.8
1.0
RHCP
0
1000
2000 Index
Fig. 4 Posterior medians of θt (blue lines), detected change-points (red vertical lines), and posterior probability of {θt+1 − θt ≥ 0.1} based on the HCP and RHCP methods, applied to the Well-Log data
478
S. Sugasawa and S. Hashimoto
Acknowledgements The authors would like to thank the referees for the careful reading of the paper, and the valuable suggestions and comments. This work is partially supported by Japan Society for Promotion of Science (KAKENHI) grant numbers 18K12757 and 17K14233.
References 1. Basu, A., Harris, I.R., Hjort, N.L., Jones, M.C.: Robust and efficient estimation by minimising a density power divergence. Biometrika. 85, 549–559 (1998). https://doi.org/10.1093/biomet/ 85.3.549 2. Bissiri, P.G., Holmes, C.C., Walker, S.G.: A general framework for updating belief distributions. J. Roy. Stati. Soci. Ser. B. 78, 1103. (2016). https://doi.org/10.1111/rssb.12158 3. Boustati, A., Akyildiz, O.D., Damoulas, T., Johansen, A.M.: Generalised Bayesian filtering via sequential Monte Carlo. In: 34th Conference on Neural Information Processing Systems (NeurIPS 2020). https://papers.nips.cc/paper/2020/hash/ 04ecb1fa28506ccb6f72b12c0245ddbc-Abstract.html 4. Caron, F., Doucet, A., Gottardo, R.: On-line changepoint detection and parameter estimation with application to genomic data. Stat. Comput. 22, 579–595 (2012). https://doi.org/10.1007/ s11222-011-9248-x 5. Carvalho, C.M., Polson, N.G., Scott, J.G.: The horseshoe estimator for sparse signals. Biometrika 97, 465–480 (2010). https://doi.org/10.1093/biomet/asq017 6. Faulkner, J.R., Minin, V.N.: Locally Adaptive smoothing with Markov random fields and shrinkage priors. Bayesian Anal. 13, 225–252 (2018). https://projecteuclid.org/euclid.ba/ 1487905413 7. Feanhead, P.: Exact and efficient Bayesian inference for multiple changepoint problems Stat. Comput. 16, 203–213 (2006). https://doi.org/10.1007/s11222-006-8450-8 8. Fearnhead, P., Rigaill, G.: Changepoint detection in the presence of outliers. J. Am. Stat. Assoc. 114, 169–183 (2019). https://doi.org/10.1080/01621459.2017.1385466 9. Hashimoto, S., Sugasawa, S.: Robust Bayesian regression with synthetic posterior distributions. Entropy 22, 661 (2020). https://doi.org/10.3390/e22060661 10. Holmes, C., Walker, S.: Assigning a value to a power likelihood in a general Bayesian model. Biometrika 104, 497–503 (2017). https://doi.org/10.1093/biomet/asx010 11. Jewson, J., Smith, J.Q., Holmes, C.: Principles of Bayesian inference using general divergence criteria. Entropy 20, 442 (2018). https://doi.org/10.3390/e20060442 12. Kikkick, R., Eckley, I.E., Jonathan, P.: Detection of changes in variance of oceanographic timeseries using changepoint analysis. Ocean Eng. 37, 1120–1126 (2010). https://doi.org/10.1016/ j.oceaneng.2010.04.009 13. Kim, C. J., Morley, J. C. and Nelson, C. R.: The structural break in the equity premium. J. Bus. Econ. Stat. 23, 181–191 (2005). https://doi.org/10.1198/073500104000000352 14. Kowal, D.R., Matteson, D.S., Ruppert, D.: Dynamic shrinkage process. J. Roy. Stat. Soc. Ser. B. 81, 781–804 (2019). https://doi.org/10.1111/rssb.12325 15. ÓRuanaidh, J.J.K., Fitzgerald, W.J.: Numerical Bayesian Methods Applied to Signal Processing. Springer, New York (1996). https://doi.org/10.1007/978-1-4612-0717-7 16. Reeves, J., Chen, J., Wang, X.L., Lund, R., Lu, Q.Q.: A review and comparison of changepoint detection techniques for climate data. J. Appl. Meteorol. Climatol. 46, 900–915 (2007). https:// doi.org/10.1175/JAM2493.1 17. Sugasawa, S.: Robust empirical Bayes small area estimation with density power divergence. Biometrika 107, 467–480 (2020). https://doi.org/10.1093/biomet/asz075
Spatio-Temporal Adaptive Fused Lasso for Proportion Data Mariko Yamamura, Mineaki Ohishi, and Hirokazu Yanagihara
Abstract Population-corrected rates are often used in statistical documents that show the features of a municipality. In addition, it is important to determine changes of the features over time, and for this purpose, data collection is continually carried out by census. In the present study, we propose a method for analyzing the spatiotemporal effects on rates by adaptive fused lasso. For estimation, the coordinate descent algorithm, which is known to have better estimation accuracy and speed than the algorithm used in genlasso in the R software package, is used for optimization. Based on the results of the real data analysis for the crime rates in the Kinki region of Japan in 1995-2008, the proposed method can be applied to spatio-temporal proportion data analysis. Keywords Spatio-temporal proportion data · Adaptive fused lasso · Coordinate descent algorithm
1 Introduction Population-corrected rates are often used in statistical documents that show the features of a municipality. For example, the number of crimes and the number of people aged 65 and over are divided by the population to obtain the crime rate and aging rate of a municipality, respectively. However, if the number of crimes and the number of people aged 65 and over are used as they are, the result will be that these numbers are M. Yamamura (B) Department of Statistics, Radiation Effects Research Foundation, 5-2 Hijiyama Park, Minami-ku, Hiroshima 732-0815, Japan e-mail: [email protected] M. Ohishi Education and Research Center for AI and Data Innovation, Hiroshima University, 1-1-89 Higashi-Senda, Naka-ku, Hiroshima 730-0053, Japan H. Yanagihara Graduate School of Advanced Science and Engineering, Hiroshima University, 1-3-1 Kagamiyama, Higashi-Hiroshima, Hiroshima 739-8526, Japan © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 I. Czarnowski et al. (eds.), Intelligent Decision Technologies, Smart Innovation, Systems and Technologies 238, https://doi.org/10.1007/978-981-16-2765-1_40
479
480
M. Yamamura et al.
higher in urban regions with large populations, and it will not be possible to clarify in which municipalities crimes actually occur more frequently or in which the population is aging. In addition, the crime rate and the aging rate are examined over time by national census, and it is important to determine the changes in a municipality over time. Therefore, in the present study, we propose a method by which to analyze regional and temporal effects using spatio-temporal statistical analysis. The geographically weighted regression (GWR), as described by [2], is a wellknown method for spatial analysis. In short, the local regression model is applied to estimate the regional effects by setting up bands. Thus, the estimated results for the regional effects can be represented as a smoothed heat map, but the detection of hot and cold spots is weak due to smoothing. In addition, setting up bands in complex terrains with small sample sizes is difficult. Therefore, [5] proposed estimating the regional effects using the adaptive fused lasso (AFL) with the weights proposed by [10] added to fused lasso proposed by [8] . There is no need to perform the complicated process of setting up bands in the AFL. Based on the estimation results obtained by the AFL, regions with the same regional effect are combined, and the effect is shown by color-coding on the choropleth map. Since there is no smoothing, the detection of hot and cold spots is more accurate than that for GWR, even in complicated terrain. For estimation, the genlasso package of the statistical software R, proposed in [9] and created by [1], can be used. However, [5] reported that the coordinate descent algorithm (CDA) is faster and more accurate for estimation by the AFL. Referring to [5], the present study proposes the AFL for spatio-temporal effects by adding not only regional effects, but also temporal effects. The rates are subject to analysis, and the spatio-temporal effects are estimated by the AFL for the logistic regression model. Estimates of the AFL are derived by CDA applied to an upper bound for a negative log-likelihood function, which is based on a second-order approximation. The remainder of the present paper is organized as follows. In Sect. 2 we describe the spatio-temporal AFL for the logistic regression model. In Sect. 3, the optimization procedure is described as the estimation method. In Sect. 4, we give an application example of the crime rate in the Kinki region of Japan, which consists of seven prefectures: Mie, Shiga, Kyoto, Osaka, Hyogo, Nara, and Wakayama. Conclusions are presented in Sect. 5.
2 Spatio-Temporal Adaptive Fused Lasso For each of the n regions, y and m (y, m ∈ Z+ 0 ) are observed at t identical points in time. The y is the number of successes out of m trials, and the ratio y/m is the target of analysis. Let N = n × t, and denote the spatio-temporal effect by μi (i = 1, . . . , N ). The yi and mi (i = 1, . . . , N ) are the observed values in the ith spatiotemporal. Then, the probability that yi out of mi is applicable can be expressed as the binomial distribution B(mi , πi ) that the random variable Yi (Yi = 0, . . . , mi ) follows.
Spatio-Temporal Adaptive Fused Lasso for Proportion Data
481
The parameter πi can be fitted with the logistic regression model with only μi as a coefficient: exp(μi ) πi = π(μi ) = , (i = 1, . . . , N ). (1) 1 + exp(μi ) The probability function of Yi ∼ B(mi , π(μi )) is f (Yi = yi |μi ) = mi Cyi π(μi )yi {1 − π(μi )}mi −yi , (yi = 0, 1, . . . , mi ).
(2)
From (2), the log-likelihood function without the constant term is (μ) = =
N 1 yi log π(μi ) + (mi − yi ) log{1 − π(μi )} N i=1 N 1 yi μi − mi log{1 + exp(μi )} , N i=1
(3)
where μ = (μ1 , . . . , μN ) , and the prime denotes a transposed vector or matrix. For more details on the logistic regression model, see, for example, the logit model in [4]. The penalized log-likelihood function for the AFL is p (μ) = (μ) − λ
N
wij |μi − μj |.
(4)
i=1 j∈Di
ˆ = (μˆ 1 , . . . μˆ N ) is the estimate of μ and maximizes (4). Moreover, λ > 0 Here, μ is a tuning parameter. The wij = |μ˜ i − μ˜ j |−γ is the AFL weight. The μ˜ i and μ˜ j are estimates obtained from (3). Note that γ ∈ Z+ , and usually γ = 1. The reason for using the AFL instead of fused lasso is simply that the AFL satisfies the oracle condition required for space estimation involving fused lasso and increases the estimation accuracy of the computational algorithm. For the spatio-temporal effect i, Di is the subscript set of the spatio-temporal effects adjacent to spatio-temporal effect i. If we can estimate that the effects of spatio-temporal effect i and j are equal, i.e., μˆ i = μˆ j , then spatio-temporal effects i and j are combined and regarded as a single spatio-temporal effect. We describe two schemes in applying the AFL estimation to a spatio-temporal analysis. First, we set the penalty term as a spatio-temporal effect, rather than dividing it into regional and temporal effects. The specific model with regional and temporal effects is as follows: (μ) − λ1
m i=1 j∈D(spat) i
wij |μi − μj | − λ2
t k=1 l∈D(temp) k
vkl |μk − μl |.
(5)
482
M. Yamamura et al.
The terms containing λ1 and λ2 are the penalty terms for region and time, respectively. (spat) (temp) The Di and Dk are subscript sets that represent the adjacency with respect to region and time, respectively. In (5), we assume that the temporal effect is the same for all regions, or that the regional effect is the same for all time points. In contrast, (4) assumes that the temporal effect is different for each region or that the regional effect is different for each time point and thus provides more detailed results than (5). The second scheme is to use a single summation, i = 1, . . . , N , to represent the spatio-temporal effect in (4), which is related to actual programming for analysis. If we denote the region by i = 1, . . . , m and the time by k = 1, . . . , t and subscript sets of each adjacency, then (4) can be rewritten with four summations in a row as: (μ) − λ
m t
wijkl |μik − μjl |.
(6)
i=1 j∈D(spat) k=1 l∈D(temp) i k
Here, (6) is not only cumbersome in appearance, but also requires four loops in the program, such as for and while, which is computationally inefficient compared to (4), which requires only two loops. Hence, (6) should be avoided.
3 Optimization Procedure We estimate the parameter μ of the penalized log-likelihood (4). A previous study [5] used the CDA to optimize the AFL for the penalized residual sum of squares (PRSS) for a linear regression model. In the present paper, we use the CDA to optimize the AFL for the penalized log-likelihood function for the logistic regression model. However, the second-order Taylor expansion of the negative log-likelihood function (3) for μ allows us to perform analysis in the same form as the PRSS.
3.1 Gradient Methods for Minimizing ∗p (μ) ˆ by minimization as in the PRSS, we can change the sign of (4), In order to obtain μ ∗p (μ) = ∗ (μ) + λ
N
wij |μi − μj |,
(7)
i=1 j∈Di
where ∗ (μ) = −(μ). Then, we derive the following lemma, Lemma 1 The second-order Taylor expansion of ∗ (μ) near μ0 is 1 ¯ ∗ (μ) = ∗ (μ0 ) + g(μ0 ) (μ − μ0 ) + (μ − μ0 ) H(μ)(μ − μ0 ), 2
(8)
Spatio-Temporal Adaptive Fused Lasso for Proportion Data
483
¯ = μ + θ μ0 , θ ∈ (0, 1), and when μ ∂∗ (μ) 1 = − (y1 − m1 π(μ1 ), . . . , yN − mN π(μN )) , ∂μ N 1 ∂ 2 ∗ (μ) = diag (m1 π(μ1 ){1 − π(μ1 )}, . . . , mN π(μN ){1 − π(μN )}) . H(μ) = ∂μ∂μ N g(μ) =
Using Lemma 1, the following lemma is obtained: Lemma 2 The following inequality holds: ∗ (μ) ≤ ∗ (μ0 ) + g(μ0 ) (μ − μ0 ) +
L (μ − μ0 ) (μ − μ0 ) = ¯∗ (μ|μ0 ), 2
(9)
where L = maxμ ξmax (H(μ)), and ξmax (A) is the maximum eigenvalue of matrix A. The matrix H(μ) is diagonal, and π(μi ){1 − π(μi )} ≤ 1/4, so that L = (4N )−1 maxi=1,...,N mi . Theorem 1 Replace ∗ (μ) in (7) with ¯∗ (μ|μ0 ) and set ¯∗p (μ|μ0 ) = ¯∗ (μ|μ0 ) + λ
N
wij |μi − μj |.
(10)
i=1 j∈Di
Then, we have
μ1 = arg min ¯∗p (μ|μ0 ) ⇒ ∗p (μ1 ) ≤ ∗p (μ0 ). μ
(11)
Proof It follows from (11) that ¯∗p (μ1 |μ0 ) ≤ ¯∗p (μ0 |μ0 ). Note ¯∗ (μ0 |μ0 ) = ∗ (μ0 ) ˆ ≤ ¯∗ (μ|μ ˆ 0 ) from (9), from (9), Then, ¯∗p (μ1 |μ0 ) ≤ ∗p (μ0 ). In addition, since ∗ (μ) we have ∗p (μ1 ) ≤ ¯∗p (μ1 |μ0 ) from (7) and (10). Therefore, ∗p (μ1 ) ≤ ¯∗p (μ1 |μ0 ) ≤ ∗p (μ0 ). ˆ by minimizing ¯∗p (μ|μ0 ), rather than Theorem 1 implies that we can obtain μ ˆ cannot be found explicitly. Therefore, after finding μ1 = itself. Here, μ ∗ ¯ arg min (μ|μ0 ), we repeat the calculation for μ2 = arg min ¯∗ (μ|μ1 ), and for μ3 = arg min ¯∗ (μ|μ2 ), . . . , until the value converges. Moreover, ¯∗p (μ|μ0 ) can be expressed in the same form as the PRSS by expanding and transforming ¯∗ (μ|μ0 ) in (9) as follows: ∗p (μ)
L ¯∗ (μ|μ0 ) = ∗ (μ0 ) + (μ μ − 2μ0 μ + μ0 μ0 ) + g(μ0 ) μ − g(μ0 ) μ0 2 L 1 1 μ0 − g(μ0 ) − μ + const. μ0 − g(μ0 ) − μ = 2 L L
(12)
484
M. Yamamura et al.
Here, const. is a constant term that does not include μ. Ignoring const., and combining this term with (10), ¯∗p (μ|μ0 ) can be expressed in a simplified form as follows: 2 N 1 L + λ μ − ) − μ wij |μi − μj |. g(μ 0 0 2 L i=1 j∈D
(13)
i
If we ignore L/2 and replace (13) with {μ0 − 1/L · g(μ0 )} = y and μ = IN μ = Xβ, then it is similar to the form of the PRSS, which is consistent with [5].
3.2 Coordinate Descent Algorithm As well as the PRSS in [5], the estimation of μ in (13) is performed by the CDA in the descent cycle and the fusion cycle, according to [3]. In brief, in the descent cycle, μ is first estimated, and then spatio-temporal effects i and j are combined when μˆ i = μˆ j . In the fusion cycle, the estimation and combining are further updated based on the results in the descent cycle. For a detailed explanation of the descent and fusion cycles, see [5], replacing (13) with y and Xβ. Note that y in the PRSS becomes {μ0 − 1/L · g(μ0 )} in the model of the present study. As mentioned in the description of Theorem 1, we need to iterate the estimation process, updating μ0 , until the estimation results of μ converge. Therefore, {μ0 − 1/L · g(μ0 )} also needs to be updated in the iterative process. For any λ, (13) is a convex function with respect to μ, rather than with respect to λ, as in the PRSS. Therefore, we list the candidates for λ, estimate μ under each ˆ using the Bayesian information criterion (BIC) λ, and then select the best λ and μ proposed by [6]. The candidates for λ are the values obtained by partitioning the range [0, λmax ]. The term λmax is obtained as follows: λmax = max
i=1,...,N
where πˆ ∞ =
N
i=1 yi /
N i=1
4|yi − mi πˆ ∞ |
, j∈Di wij
(14)
mi . The BIC is as follows:
ˆ + df log(N m), BIC = −2 N (μ) ¯
(15)
where m ¯ = N −1 Ni=1 mi and df ≤ N is the number of spatio-temporal effects comˆ For each candidate λ, the BIC is calculated by fitting μ ˆ and df , and the bined in μ. ˆ for when the BIC is minimized among the candidates is adopted as the optimal μ estimation result.
Spatio-Temporal Adaptive Fused Lasso for Proportion Data
485
4 Application We analyze the crime rate in the Kinki region of Japan over a 14-year period from 1995 to 2008. We use the municipality data K4201 and A1101 of the System of Social and Demographic Statistics downloaded from e-Stat of the Statistics Bureau of the Ministry of Internal Affairs and Communications (see [7]). The number of municipalities differs from year to year due to the division, separation, merger, incorporation, and boundary change of municipalities during the observation period. However, data organized by the number of municipalities as of March 31, 2019 has been published, and we use this data, so that the number of municipalities is 227, which is the same for all 14 years. Ordinance-designated cities (Osaka, Kyoto, and Kobe) are treated as single cities, and wards are not used. The crime rate is calculated by dividing the number of crimes by the total population. The number of crimes (K4201) was reported every year, and the total population (A1101) was reported for 1995, 2000, and 2005. Therefore, for the years in which there were no observed values for the total population, the values for the most recent past year were used. The specific values in Sect. 2 are the number of crimes y, the total population m, the time t (14 years), and the number of municipalities n (227). The number of spatio-temporal effects is μ1 , . . . , μN : N = n × t = 3, 178. The spatio-temporal effect j ∈ Di adjacent to the spatio-temporal effect i are the years before and after i and the region adjacent to i. As a concrete example, the spatio-temporal effect i is Osaka City in the year 2000, and j ∈ Di are 13 spatio-temporal effects: Osaka City in 1999 and 2001; the cities of Sakai, Toyonaka, Suita, Moriguchi, Yao, Matsubara, Daito, Kadoma, Settsu, and Higashi-Osaka in Osaka Prefecture in 2000; and Amagasaki City in Hyogo Prefecture in 2000. The estimation results show that 3178 spatio-temporal effects are combined to obtain a total of 1753 effects. We can calculate the crime rate percentage by π(μˆ i ) × 100, and obtain the following spatio-temporal effect: the minimum, 0.116; 25th percentile, 0.954; median, 1.345; mean, 1.451; 75th percentile, 1.826; and the maximum, 5.315. The spatio-temporal effects are shown by the choropleth maps in Figs. 1, 2, and 3. The maps in Figs. 2 and 3 were created using boundary data shape files from [7]. The color coding shows low crime rates in blue and higher crime rates in white in red, and the legend in Fig. 1 applies to Figs. 2 and 3 as well. Fig. 1 shows the spatio-temporal effects by year. The squares of the same spatiotemporal effect are combined, and we can see where the squares are combined in the horizontal and vertical directions. For the combined spatio-temporal effects, the crime rates indicated by the colors can be evaluated with respect to years and municipalities. Thus, we can observe year-to-year changes in a city while also observing changes in other municipalities. Figs. 2 and 3 show the spatio-temporal effects by municipality. The right-hand panel in Fig. 3 shows a map of the 227 municipalities before the spatio-temporal effects are combined. By comparing this map with the map from 1995 to 2008, we can see where municipalities are combined by the AFL. The gray area in the upper-right panel of each map indicates Lake Biwa, the largest lake in Japan. The city that stands out in red as a hot spot is Osaka City, which is the second-
M. Yamamura et al.
277 municipalities in Kinki region, Japan
486
5 4 3 2 1
1995−2008 (14 years) Fig. 1 Spatio-temporal effects for crime rate (%) by year
Spatio-Temporal Adaptive Fused Lasso for Proportion Data
Fig. 2 Spatio-temporal effects for crime rate (%) by municipality 1/2
487
488
M. Yamamura et al.
Fig. 3 Spatio-temporal effects for crime rate (%) by municipality 2/2
largest city in Japan, only after Tokyo, which comprises 23 wards. After peaking in 2001, the crime rate in Osaka City has changed from an upward trend to a downward trend.
5 Conclusions The purpose of the present paper was to introduce the analysis method of spatiotemporal effects by the AFL for proportion data. Although we did not mention explanatory variables other than μ, adding these variables to the model will allow for a more detailed analysis. Acknowledgements The present study was supported by Grant-in-Aid for Scientific Research (B) (KAKENHI 20H04151). The Radiation Effects Research Foundation (RERF), Hiroshima and Nagasaki, Japan is a public interest foundation funded by the Japanese Ministry of Health, Labor and Welfare (MHLW) and the US Department of Energy (DOE). This publication was supported by RERF. The views of the authors do not necessarily reflect those of the two governments.
References 1. Arnold, T., Tibshirani, R.: Path algorithm for generalized lasso problems. R package version 1.5. (2020). https://cran.r-project.org/web/packages/genlasso/genlasso.pdf 2. Brunsdon, C., Fotheringham, S., Charlton, M.: Geographically weighted regression: a method for exploring spatial nonstationarity. Geogr. Anal. 28, 281–298 (1996). https://doi.org/10.1111/ j.1538-4632.1996.tb00936.x 3. Friedman, J., Hastie, T., Höfling, H., Tibshirani, R.: Pathwise coordinate optimization. Ann. Appl. Stat. 1, 302–332 (2007). https://doi.org/10.1214/07-AOAS131 4. McCullagh, P., Nelder, J.A.: Monographs on statistics and applied probability. In: Generalized Linear Models, 2nd edn. Chapman & Hall/CRC (1989) 5. Ohishi, M., Fukui, K., Okamura, K., Itoh, Y., Yanagihara, H.: Estimation for spatial effects by using the fused lasso, pp. 1–28. TR-No. 19-07, Hiroshima Statistical Research Group, Hiroshima (2019). http://www.math.sci.hiroshima-u.ac.jp/stat/TR/TR19/TR19-07.pdf
Spatio-Temporal Adaptive Fused Lasso for Proportion Data
489
6. Schwarz, G.: Estimating the dimension of a model. Ann. Stat. 6, 461–464 (1978). https://doi. org/10.1214/aos/1176344136 7. Statistics Bureau of Japan.: e-Stat, portal site of official statistics of Japan. https://www.e-stat. go.jp (2021) 8. Tibshirani, R., Saunders, M., Rosset, S.: Sparsity and smoothness via the fused lasso. J. R. Stat. Soc. Ser. B Stat. Methodol. 67, 91–108 (2005). https://doi.org/10.1111/j.1467-9868.2005. 00490.x 9. Tibshirani, R., Taylor, J.: The solution path of the generalized lasso. Ann. Stat. 39, 1335–1371 (2011). https://doi.org/10.1214/11-AOS878 10. Zou, H.: The adaptive lasso and its oracle properties. J. Am. Stat. Assoc. 101, 1418–1429 (2006). https://doi.org/10.1198/016214506000000735
Variable Fusion for Bayesian Linear Regression via Spike-and-slab Priors Shengyi Wu, Kaito Shimamura, Kohei Yoshikawa, Kazuaki Murayama, and Shuichi Kawano
Abstract In linear regression models, fusion of coefficients is used to identify predictors having similar relationships with a response. This is called variable fusion. This paper presents a novel variable fusion method in terms of Bayesian linear regression models. We focus on hierarchical Bayesian models based on a spike-and-slab prior approach. A spike-and-slab prior is tailored to perform variable fusion. To obtain estimates of the parameters, we develop a Gibbs sampler for the parameters. Simulation studies and a real data analysis show that our proposed method achieves better performance than previous methods. Keywords Dirac spike · Fusion of coefficients · Hierarchical Bayesian model · Markov chain Monte Carlo
1 Introduction In recent years, because of the rapid development of computer hardware and systems, a wide variety of data are being observed and recorded in genomics, medical science, finance, and many other fields of science. Linear regression is a fundamental statistical method for extracting useful information from such datasets. In linear regression models, fusion of coefficients is used to identify predictors having similar relationships with a response. This is called variable fusion [11]. Many kinds of research on variable fusion have been conducted to date. For example, we refer the reader to [2, 12, 18, 19]. Variable fusion is essentially achieved by modifying regularization terms that perform variable selection (e.g., the lasso [17]): the fused lasso [18], the OSCAR [2], the clustered lasso [15], and so on. It can be regarded as a frequentist approach. On the other hand, only a few methods for variable fusion have been reported in terms of a Bayesian approach. The Bayesian approach is based on priors that induce S. Wu · K. Shimamura · K. Yoshikawa · K. Murayama · S. Kawano (B) Graduate School of Informatics and Engineering, The University of Electro-Communications, 1-5-1 Chofugaoka, Chofu-shi, Tokyo 182-8585, Japan e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 I. Czarnowski et al. (eds.), Intelligent Decision Technologies, Smart Innovation, Systems and Technologies 238, https://doi.org/10.1007/978-981-16-2765-1_41
491
492
S. Wu et al.
variable selection: the Laplace prior [14, 20], the normal-exponential-gamma (NEG) prior [8], the horseshoe prior [3], the Dirichlet-Laplace prior [1], and the spike-andslab prior [6, 7, 9]. For example, the Bayesian fused lasso by [10] is based on the Laplace prior, and the Bayesian fusion method by [16] is based on the NEG prior. The spike-and-slab prior is often used in the context of Bayesian variable selection. However, there are few research studies about spike-and-slab priors that include variable fusion. In this paper, we propose a variable fusion method in the framework of Bayesian linear regression with a spike-and-slab prior. The spike-and-slab prior is based on the Dirac spike prior [6] and the g-slab prior [22]. We tailor the Dirac spike prior and the g-slab prior to perform variable fusion by assuming the priors on the difference between adjacent parameters. We adopt hierarchical Bayesian models based on the spike-and-slab prior with variable fusion. To obtain estimates of the parameters, we develop a Gibbs sampler. The remainder of the paper is organized as follows. Section 2 introduces a spikeand-slab prior that performs variable fusion and then builds a Bayesian linear regression model based on the prior. In addition, we develop a Gibbs sampling method to obtain estimates of the parameters. Section 3 reports Monte Carlo simulations and a real data analysis to examine the performance of our proposed method and to compare it with previous methods. Conclusions are given in Sect. 4.
2 Bayesian Variable Fusion via Spike-and-slab Priors In this section, we propose a Bayesian linear regression with variable fusion. Our proposed method is based on spike-and-slab priors that fuse adjacent coefficients in linear regression models. First, we introduce spike-and-slab priors that perform variable fusion, and then we derive a Bayesian model based on the priors. A Gibbs sampler is also provided for the estimates of the parameters.
2.1 Fusing Adjacent Coefficients by Spike-and-slab Priors We consider the following linear regression model: y = Xβ + ε, ε ∼ Nn (0n , σ 2 I n ),
(1)
where y = (y1 , . . . , yn ) is a vector of the response variable, X = (x (1) , . . . , x ( p) ) is a design matrix of the predictors that is satisfied with rank(X) = p, x ( j) = (x1 j , . . . , xn j ) , β = (β1 , . . . , β p ) comprises the coefficients to be estimated, ε = (ε1 , . . . , εn ) is a vector of errors, σ 2 (σ > 0) is the variance of an error, 0n is the n-dimensional zero vector, and I n is the n × n identity matrix. Inaddition, n the response is centered and the predictors are standardized as follows: i=1 yi =
Variable Fusion for Bayesian Linear Regression …
493
n n 0, i=1 xi j = 0, i=1 xi2j = n, ( j = 1, . . . , p). The centering and standardization allow us to omit the intercept. The likelihood is given by p( y|X; β, σ 2 ) =
n
p(yi |x i ; β, σ 2 ),
(2)
i=1
where x i = (xi1 , . . . , xi p ) and p(yi |x i ; β, σ 2 ) =
√ 1 2πσ 2
2 (y −x β) exp − i 2σi2 . Here-
inafter, the likelihood p( y|X; β, σ 2 ) will be denoted as p( y|β, σ 2 ) for simplicity. Let a vector γ = (γ1 , . . . , γ p−1 ) be the differences between adjacent elements of the coefficients β; that is, γ j = β j+1 − β j ( j = 1, . . . , p − 1). We assume that γ follows the spike-and-slab prior p(γ |δ) = pslab (γ δ |δ)
pspike (γ j |δ j ),
(3)
j:δ j =0
where pslab (·) is a slab prior, pspike (·) is a spike prior, δ = (δ1 , . . . , δ p−1 ) comprises latent indicator variables having value zero or one, and γ δ comprises the elements of γ corresponding to δ j = 1. For more details of a spike-and-slab prior, we refer the reader to [7]. The prior (3) indicates that when δ j = 0, the corresponding γ j is allocated to the spike component. The other elements of γ (i.e., those with δ j = 1) are allocated to the slab component. We adopt the Dirac spike [6] and the g-slab prior [22] as pspike (·) and pslab (·), respectively. The Dirac spike can be specified as pspike (γ j |δ j ) = 0 (γ j ), where x0 (·) is the Dirac measure for a given x0 ∈ R: x0 (x) = 0 if x = x0 , while x0 (x) = 1 if x = x0 . Thus, when δ j = 0, the corresponding γ j is set to zero. This implies β j+1 = β j . The g-slab prior is defined by pslab (γ δ |δ) = N p1 (0 p1 , σ 2 H 0,δ ), p−1 where p1 = j=1 δ j and the definition of H 0,δ is discussed as follows. Before specifying the variance-covariance matrix H 0,δ in pslab (γ δ |δ), we need to define β δ and specify its prior. Let m = (m 1 , . . . , m p1 ) be a p1 dimensional vector of elements of a set { j | δ j = 1} in ascending order. We define an n × ( p1 + 1) matrix X δ as ⎡ ⎤ p m1 m2 Xδ = ⎣ x (i) , x (i) , . . . , x (i) ⎦ . i=1
i=m 1 +1
i=m p1 +1
Let β δ be a vector of the elements of β corresponding to X δ . Furthermore, we assume −1 the g-prior for β δ in the form N p1 +1 (0 p1 +1 , σ 2 B 0,δ ), where B 0,δ = g(X and δ Xδ) g is a positive hyper-parameter. Using the variance-covariance matrix B 0,δ , we can obtain the detailed formulation of the variance-covariance matrix H 0,δ . We set z i j (i, j = 1, . . . , p1 + 1) equal to the (i, j)-th element of B 0,δ and αi j (i, j = 1, . . . , p1 ) equal to the (i, j)-th
494
S. Wu et al.
element of H 0,δ . Then αi j can be written as αi j = z (i+1)( j+1) − z (i+1) j − z i( j+1) + z i j (i, j = 1, . . . , p1 ). According to the above procedure, we can specify the slab prior pslab (γ δ |δ) = N p1 (0 p1 , σ 2 H 0,δ ). We note that H 0,δ is satisfied with positive definite. The outline of the proof is as follows. For x ∈ R p1 \ {0 p1 }, we have −1 x H 0,δ x = x Dβ B 0,δ D β x > 0 unless D β x = 0 p1 +1 , since B 0,δ = g(X δ X δ ) is positive definite. Here Dβ is the p1 × ( p1 + 1) matrix defined by ⎛
−1 ⎜0 ⎜ Dβ = ⎜ . ⎝ .. 0
0 ··· 1 ··· .. . . . . 0 0 ···
1 −1 .. .
0 0 .. .
⎞ 0 0⎟ ⎟ .. ⎟ . .⎠
−1 1
In addition, D β x = 0 p1 +1 if and only if x = 0 p1 . This means that H 0,δ is positive definite. Then, the proposed Bayesian hierarchical model is given by y|β, σ 2 ∼ Nn (Xβ, σ 2 I n ), γ j (= β j+1 − β j )|δ j = 0 ( j : δ j = 0), γ δ (= Dβ β δ )|δ, σ 2 ∼ N p1 (0 p1 , σ 2 H 0,δ ) ( j : δ j = 1), δ j |ω ∼ Ber(ω), ω∼ Beta(aω , bω ), π(σ 2 )∝ σ −2 . Here the priors of δ j and ω follow [13].
2.2 Full Conditional Posteriors It is feasible to infer the posterior by using the Markov chain Monte Carlo (MCMC) method: the model parameters (δ, ω, σ 2 , β) are sampled from their full conditional posteriors. In building the Gibbs sampler of the parameters, it is essential to draw δ from the marginal posterior p(δ| y) ∝ p( y|δ) p(δ), where p( y|δ) is the marginal likelihood of the linear regression model with X δ . This marginal likelihood can be derived analytically as |H δ |1/2 (n/2) 1 p( y|δ) = . (4) (2π )(n−1)/2 |H 0,δ |1/2 sc n/2
Variable Fusion for Bayesian Linear Regression …
495
Here (·) is the Gamma function, H δ and sc are respectively given by −1
−1 H δ = (X δ X δ + D β H 0,δ D β ) , sc =
1 −1 ( y y − h δ H δ hδ ), 2
where hδ = H δ X δ y. Thus the full conditional posterior of δ j is given by δ j ωp( y|δ j = 1, δ − j ) ∼ ωp( y|δ j = 1, δ − j ) + (1 − ω) p( y|δ j = 0, δ − j ) 1−δ j (1 − ω) p( y|δ j = 0, δ − j ) × , ωp( y|δ j = 1, δ − j ) + (1 − ω) p( y|δ j = 0, δ − j )
δ j | y, ω, δ − j
where δ − j is the vector consisting of all the elements of δ except δ j . In addition, the full conditional posteriors of (ω, σ 2 , β δ ) are given by ω|δ ∼ Beta(aω + p1 , bω + p − 1 − p1 ), n , sc , σ | y, δ ∼ IG 2 β δ | y, δ, σ 2 ∼ N p1 +1 (hδ , σ 2 H δ ). 2
The derivations of the marginal likelihood and the full conditional posteriors are given in [21].
2.3 Computational Algorithm With the full conditional posteriors, we can obtain estimates of the parameters by Gibbs sampling. The Gibbs sampling algorithm for our proposed method is as follows: Step 1.
Sample (δ, σ 2 ) from the posterior p(δ| y) p(σ 2 | y, δ). 1.
Sample each element δ j of the indicator vector δ separately from p(δ j = 1|δ − j , y) given as p(δ j = 1|δ − j , y) =
2.
1 1+
1−ω ω
Rj
,
Rj =
p( y|δ j = 0, δ − j ) . p( y|δ j = 1, δ − j )
Note that the elements of δ are updated in a random permutation order. Sample the error variance σ 2 from IG(n/2, sc ).
496
Step 2. Step 3.
S. Wu et al.
Sample ω from ω ∼ Beta(aω + p1 , bω + p − 1 − p1 ). Set β j = β j+1 if δ j = 0. Sample the other elements β δ from N p1 +1 (hδ , σ 2 H δ ).
3 Numerical Studies In this section, we demonstrate the effectiveness of our proposed method and compare it with previous methods through Monte Carlo simulations. In addition, we apply our proposed method to comparative genomic hybridization (CGH) array data.
3.1 Monte Carlo Simulation We simulated data with sample size n and number of predictors p from the true model y = Xβ ∗ + ε,
(5)
where β ∗ is the p-dimensional true coefficient vector, X is the n × p design matrix, and ε is an error vector distributed as Nn (0n , 0.752 I n ). Furthermore, x i (i = 1, . . . , n) was generated from a multivariate normal distribution N p (0 p , ). Let i j be the (i, j)-th element of . If i = j, then we set i j = 1, and otherwise i j = ρ. We considered ρ = 0, 0.5. We simulated different number of observations n = 50, 100, 200. We considered the following three cases for the true coefficient vector: ∗ Case 1 is β ∗ = (1.0 5 , 1.55 , 1.05 , 1.55 ) ; Case 2 is β = (1.05 , 2.05 , 1.05 , 2.05 ) ; Case 3 is β ∗ = (1.0 5 , 3.05 , 1.05 , 3.05 ) . Here ζ p denotes a p-dimensional vector of ζ s. We used aω = bω = 1, which are included in the prior for the hyper-parameter ω. We set the value of the hyper-parameter g in the variance-covariance matrix B 0,δ to the sample size n, because model selection based on Bayes factors is consistent with the g-prior with g = n [5]. For each dataset, MCMC procedure was run for 10,000 iterations with 2000 draws as burn-in. The parameters were estimated by their posterior means. We compared our proposed method with the fused lasso (FL), the Bayesian fused lasso (BFL), and the Bayesian fused lasso via the NEG prior (BFNEG). For FL, BFL, and BFNEG, we omitted the lasso penalty term because we only focus on identifying the true groups of coefficients. The regularization parameter in the fusion penalty term was selected by the extended Bayesian information criterion (EBIC), which was introduced by [4]. The tuning parameter in EBIC was set according to [4]: 1 − log(n)/{2 log( p)}. FL is implemented in the penalized package of R. BFL
Variable Fusion for Bayesian Linear Regression …
497
and BFNEG are implemented in the neggfl package of R, which is available from https://github.com/ksmstg/neggfl. First, we compared the accuracy of prediction using the mean squared error (MSE) given by MSE = (βˆ − β ∗ ) (βˆ − β ∗ ), where βˆ is the estimated coefficients and is the variance-covariance matrix of the predictors. Additionally, we compared the accuracy of identifying the true coefficient groups between our proposed method and the competing methods. We denote the groups of indexes which have same true regression coefficients by B1 , . . . , B4 ⊂ {1, . . . , p}. We also denote the number of distinct coefficients in {βˆ j : j ∈ Bl } (l = 1, . . . , 4) by Nl . Finally, we define the accuracy by PB =
4 Nl p − l=1 . p−4
The higher value indicates more accurate variable fusion. For each simulation setting, we conducted the simulation 100 times. We computed the average of MSEs and PB s. The results are summarized in Tables 1, 2 and 3. Our proposed method outperforms the competing methods in terms of MSE and PB . In particular, the values of PB are close to one in almost all cases. This means that our proposed method almost perfectly identifies the true groups. Among FL, BFL, and BFNEG, FL provides larger values of PB than BFL and BFNEG in many cases. Both BFL and BFNEG provide unstable values of PB regardless of the simulation setting, which means that they cannot provide stability for identifying the true groups. However, note that MSE is not necessarily small even if PB is large.
Table 1 Mean (standard deviation) of MSE and mean of PB for Case 1 ρ = 0.5 ρ=0 MSE (sd) PB MSE (sd) n=50
n=100
n=200
FL BFL BFNEG Proposed FL BFL BFNEG Proposed FL BFL BFNEG Proposed
0.146 0.128 0.279 0.117 0.056 0.067 0.109 0.039 0.026 0.030 0.046 0.015
(0.079) (0.068) (0.127) (0.089) (0.032) (0.032) (0.051) (0.027) (0.014) (0.015) (0.023) (0.012)
0.842 0.731 0.336 0.980 0.764 0.768 0.593 0.992 0.769 0.714 0.625 0.990
0.433 0.226 0.240 0.104 0.153 0.070 0.117 0.028 0.099 0.035 0.032 0.014
(0.117) (0.091) (0.112) (0.083) (0.071) (0.039) (0.047) (0.018) (0.018) (0.013) (0.017) (0.009)
PB 0.689 0.388 0.369 0.971 0.832 0.789 0.321 0.991 0.620 0.579 0.612 0.990
498
S. Wu et al.
Table 2 Mean (standard deviation) of MSE and mean of PB for Case 2 ρ = 0.5 ρ=0 MSE (sd) PB MSE (sd)
PB
n=50
n=100
n=200
FL BFL BFNEG Proposed FL BFL BFNEG Proposed FL BFL BFNEG Proposed
0.136 0.155 0.247 0.078 0.065 0.096 0.097 0.027 0.028 0.048 0.050 0.013
(0.084) (0.076) (0.112) (0.061) (0.031) (0.038) (0.084) (0.019) (0.016) (0.018) (0.023) (0.011)
0.671 0.677 0.549 0.978 0.790 0.354 0.669 1.000 0.741 0.307 0.535 1.000
0.395 0.243 0.308 0.068 0.093 0.061 0.068 0.027 0.032 0.032 0.050 0.014
(0.171) (0.111) (0.130) (0.047) (0.032) (0.027) (0.029) (0.019) (0.028) (0.015) (0.017) (0.010)
Table 3 Mean (standard deviation) of MSE and mean of PB for Case 3 ρ = 0.5 ρ=0 MSE (sd) PB MSE (sd) n=50
n=100
n=200
FL BFL BFNEG Proposed FL BFL BFNEG Proposed FL BFL BFNEG Proposed
0.134 0.168 0.264 0.061 0.077 0.116 0.110 0.028 0.029 0.065 0.051 0.014
(0.109) (0.089) (0.121) (0.041) (0.046) (0.043) (0.048) (0.018) (0.015) (0.023) (0.021) (0.013)
0.812 0.694 0.549 0.988 0.851 0.243 0.602 1.000 0.775 0.158 0.583 1.000
0.675 0.321 0.327 0.065 0.289 0.076 0.111 0.025 0.048 0.051 0.056 0.014
(0.182) (0.141) (0.135) (0.045) (0.051) (0.034) (0.043) (0.017) (0.024) (0.018) (0.019) (0.009)
0.822 0.393 0.238 0.997 0.701 0.763 0.425 0.995 0.736 0.678 0.351 0.999
PB 0.865 0.231 0.212 0.987 0.769 0.607 0.295 0.993 0.895 0.364 0.176 0.994
We also performed additional simulations: cases for ε ∼ Nn (0n , 1.52 I n ). The results are given in [21]. We observe that the conclusion was essentially unchanged.
3.2 Application We applied our proposed method to a smoothing illustration for a real dataset. We used the comparative genomic hybridization (CGH) array data [19]. The data
Variable Fusion for Bayesian Linear Regression …
499
Fig. 1 Applications for the CGH data. Black dots indicate data points. The blue line is the estimates of FL, the green line is BFL, the grey line is BFNEG, and the red line is our proposed method
are available in the cghFLasso package of the software R. The data comprise log ratios of the genome numbers of copies between a normal cell and a cancer cell by genome order. We extracted 150 samples from the genome orders 50 to 200 and set them as y. The design matrix X was set as an identity matrix. Thus, the model used in this section was y = β + ε. We compared our proposed method with FL, BFL, and BFNEG. We determined the values of the hyper-parameters in our proposed method in a similar manner to that in Sect. 3.1. The values of the hyper-parameters in the other methods were selected by the EBIC. Note that we transformed y to y + 1n before analyzing the data, and we transformed y + 1n to y and βˆ to βˆ − 1n , respectively, after analyzing the data. Figure 1 shows the result. The data points from about 55–80, 95–120, and 135– 200 show that the genome copy numbers of cancer cells and normal cells are almost the same. We observe that our proposed method gives stable estimates, while the competing methods seem to overfit. The other data points show that the genome copy numbers of cancer cells are bigger than those of the normal cells. Our proposed method captures clearer differences than the competing methods.
500
S. Wu et al.
4 Conclusions We discussed the Bayesian variable fusion method in terms of linear regression models. We proposed a spike-and-slab prior that induces variable fusion, which is based on the Dirac spike and the g-slab prior. To obtain samples from the posteriors, a Gibbs sampler was designed using the hierarchical Bayesian models from the spikeand-slab prior. Numerical studies showed that our proposed method performs well compared to existing methods from various viewpoints. Our proposed method cannot perform variable selection, unlike previous Bayesian methods [10, 16]. Therefore, it would be interesting to extend our proposed method to handle variable selection. We leave this topic as future research. Acknowledgements The authors thank the two reviewers for their helpful comments and constructive suggestions. S. K. was supported by JSPS KAKENHI Grant Numbers JP19K11854 and JP20H02227.
References 1. Bhattacharya, A., Pati, D., Pillai, N.S., Dunson, D.B.: Dirichlet-laplace priors for optimal shrinkage. J. Am. Stati. Assoc. 110(512), 1479–1490 (2015). https://doi.org/10.1080/ 01621459.2014.960967 2. Bondell, H.D., Reich, B.J.: Simultaneous regression shrinkage, variable selection, and supervised clustering of predictors with Oscar. Biometrics 64(1), 115–123 (2008). https://doi.org/ 10.1111/j.1541-0420.2007.00843.x 3. Carvalho, C.M., Polson, N.G., Scott, J.G.: The horseshoe estimator for sparse signals. Biometrika 97(2), 465–480 (2010). https://doi.org/10.1093/biomet/asq017 4. Chen, J., Chen, Z.: Extended Bayesian information criteria for model selection with large model spaces. Biometrika 95(3), 759–771 (2008). https://doi.org/10.1093/biomet/asn034 5. Fernandez, C., Ley, E., Steel, M.F.: Benchmark priors for Bayesian model averaging. J. Econ. 100(2), 381–427 (2001). https://doi.org/10.1016/S0304-4076(00)00076-2 6. Frühwirth-Schnatter, S., Wagner, H.: Bayesian variable selection for random intercept modeling of gaussian and non-gaussian data. In: Bernardo, J., Bayarri, M., Berger, J., Dawid, A., Heckerman, D., Smith, A., West, M. (eds.) Bayesian Statistics, vol. 9, pp. 165–185. Oxford University Press (2011). https://doi.org/10.1093/acprof:oso/9780199694587.003.0006 7. George, E.I., McCulloch, R.E.: Variable selection via Gibbs sampling. J. Am. Stat. Associ. 88(423), 881–889 (1993). https://doi.org/10.1080/01621459.1993.10476353 8. Griffin, J., Brown, P.: Alternative Prior Distributions for Variable Selection with Very Many More Variables than Observations. University of Warwick, Coventry. (Tech. rep.) (2005) 9. Ishwaran, H., Rao, J.S.: Spike and slab variable selection: frequentist and Bayesian strategies. Ann. Stat. 33(2), 730–773 (2005). https://doi.org/10.1214/009053604000001147 10. Kyung, M., Gill, J., Ghosh, M., Casella, G., et al.: Penalized regression, standard errors, and bayesian lassos. Bayesian Anal. 5(2), 369–411 (2010). https://doi.org/10.1214/10-BA607 11. Land, S., Friedman, J.: Variable fusion: a new method of adaptive signal regression. Department of Statistics, Stanford University, Tech. rep. (1996) 12. Ma, S., Huang, J.: A concave pairwise fusion approach to subgroup analysis. J. Am. Stat. Assoc. 112(517), 410–423 (2017). https://doi.org/10.1080/01621459.2016.1148039 13. Malsiner-Walli, G., Wagner, H.: Comparing spike and slab priors for Bayesian variable selection. Aust. J. Stati. 40(4), 241–264 (2011). https://doi.org/10.17713/ajs.v40i4.215
Variable Fusion for Bayesian Linear Regression …
501
14. Park, T., Casella, G.: The Bayesian lasso. J. Am. Stat. Assoc. 103(482), 681–686 (2008). https:// doi.org/10.1198/016214508000000337 15. She, Y.: Sparse regression with exact clustering. Electron. J. Stat. 4, 1055–1096 (2010). https:// doi.org/10.1214/10-EJS578 16. Shimamura, K., Ueki, M., Kawano, S., Konishi, S.: Bayesian generalized fused lasso modeling via neg distribution. Commun. Stat. Theory Methods 48(16), 4132–4153 (2019). https://doi. org/10.1080/03610926.2018.1489056 17. Tibshirani, R.: Regression shrinkage and selection via the lasso. J. Roy. Stat. Soc. Ser. B (Stat. Methodol.) 58(1), 267–288 (1996). https://doi.org/10.1111/j.2517-6161.1996.tb02080.x 18. Tibshirani, R., Saunders, M., Rosset, S., Zhu, J., Knight, K.: Sparsity and smoothness via the fused lasso. J. Roy. Stat. Soc. Ser. B (Stat. Methodol.) 67(1), 91–108 (2005). https://doi.org/ 10.1111/j.1467-9868.2005.00490.x 19. Tibshirani, R., Wang, P.: Spatial smoothing and hot spot detection for CGH data using the fused lasso. Biostatistics 9(1), 18–29 (2007). https://doi.org/10.1093/biostatistics/kxm013 20. Williams, P.M.: Bayesian regularization and pruning using a Laplace prior. Neural Comput. 7(1), 117–143 (1995). https://doi.org/10.1162/neco.1995.7.1.117 21. Wu, S., Shimamura, K., Yoshikawa, K., Murayama, K., Kawano, S.: Variable fusion for bayesian linear regression via spike-and-slab priors. arXiv:2003.13299 (2020). https://arxiv. org/pdf/2003.13299 22. Zellner, A.: Bayesian estimation and prediction using asymmetric loss functions. J. Am. Stat. Assoc. 81(394), 446–451 (1986). https://doi.org/10.1080/01621459.1986.10478289
Intelligent Diagnosis and Monitoring of Systems: Methods, Tools, and Applications
Diagnosis of Active Systems with Abstract Observability Gianfranco Lamperti, Marina Zanella, and Xiangfu Zhao
Abstract Active systems (ASs) are a special class of (asynchronous) discrete-event systems (DESs). An AS is represented by a network of components, where each component is modeled as a communicating automaton. Diagnosing a DES amounts to finding out possible faults based on the DES model and a sequence of observations gathered while the DES is being operated. This is why the diagnosis engine needs to know what is observable in the behavior of the DES and what is not. The notion of observability serves this purpose. In the literature, defining the observability of a DES boils down to qualifying the state transitions of components either as observable or unobservable, where each observable transition manifests itself as an observation. Still, looking at the way humans observe reality, typically by associating a collection of events with a single, abstract perception, the state-of-the-art notion of DES observability appears somewhat narrow. This paper presents, a generalized notion of observability, where an observation is abstract rather than concrete since it is associated with a DES behavioral scenario rather than a single component transition. To support the online diagnosis engine, knowledge compilation is performed offline. The outcome is a set of data structures, called watchers, which allow for the tracking of abstract observations. Keywords Model-based diagnosis · Abduction · Active systems · Discrete-event systems · Finite automata · Observability · Abstract observations · Uncertainty
G. Lamperti (B) · M. Zanella Department of Information Engineering, University of Brescia, Brescia, Italy e-mail: [email protected] M. Zanella e-mail: [email protected] X. Zhao School of Computer and Control Engineering, Yantai University, Yantai, China e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 I. Czarnowski et al. (eds.), Intelligent Decision Technologies, Smart Innovation, Systems and Technologies 238, https://doi.org/10.1007/978-981-16-2765-1_42
505
506
G. Lamperti et al.
1 Introduction For several decades, automated diagnosis of physical systems has been (and still is) the focus of considerable research for both Artificial Intelligence and Control Theory communities. A popular approach to the task is model-based diagnosis [17], which exploits the model of a system in order to find the causes of its abnormal behavior, based on some observations. Model-based diagnosis can be either consistency-based [36] or abduction-based [32], like in this paper. For diagnosing a dynamical system [39], a discrete-event system (DES) model can be adopted [12], this being either a Petri net [2, 10, 13, 20] or a net of communicating finite automata, one automaton for each component [1, 14, 16, 21, 22, 28, 34], like in this paper. Typically, each state transition is qualified as either normal or faulty, as in the seminal work by Sampath et al. [37, 38], albeit the automaton relevant to a DES component can represent just its nominal behavior, as in [35]. The input of the DES diagnosis task is a temporal observation, namely a sequence of temporally ordered observations relevant to the DES being operated online. The output is a set of candidates, where each candidate is a set of faults, each fault being associated with a faulty transition, although a temporaloriented approach to diagnosis of DESs has been proposed recently [5–7], where the notion of temporal fault is introduced to support explainable diagnosis. Both notions of abnormality and observability of a DES started being separated from the behavioral model of the DES in [25]. Specifically, several DESs may share the same network of components while differing in their abnormality and/or observability. The point stands, the abnormality of a DES does not depend on its component transitions only, but on the context where the DES is used, as well as on the events the diagnostician is interested to track. Likewise, the observability of a DES does not depend on its component transitions only, but on the operating context, as different contexts may cause different sets of observations, as well as on the sensing apparatus, which in turn depends on the events the diagnostician is interested in. Abnormality in DESs was further generalized to a pattern in [18, 26, 30], which can represent specific combinations of faults. Inspired by the way humans observe reality, typically by associating a collection of events with a single, abstract perception, this paper extends in a natural way the notion of DES observability. In the literature, observability is expressed by a mapping from a set of component transitions to a set of observations. In this paper, the set of component transitions in generalized to a set of behavioral scenarios, where each scenario is defined as a regular language on transitions. In a sense, observations become abstract, by representing fragments of the DES behavior. The generalization of DES observability allows for modeling real-world scenarios where, outside the DES, one can figure out the occurrence of a specific evolution of the DES, rather than a single transition. A diagnosis engine is designed, which is based on abstract observability, where so-called watchers (compiled offline) allow for the detection of the abstract observations while the DES is being operated online.
Diagnosis of Active Systems with Abstract Observability
507
2 Active System An active system (AS) is a network of components connected by links, where each component is endowed with input/output pins. A link connects an output pin of a component with an input pin of another component. Only one link is connected with a pin. Each component is modeled as a communicating automaton [9], where a transition is triggered by an event either occurring in the external world or being ready at an input pin. The occurrence of a transition consumes the triggering event and possibly generates new events on some output pins, thereby providing triggering events to other components. A link cannot contain more than one event; hence, a transition cannot occur if it is going to generate an event on a nonempty link. This results in a reaction of the AS, where a series of component transitions move the AS from its initial state to a final state, where all events are consumed (links are empty). Example 1 Outlined on the left of Fig. 1 is an AS P (protection), which includes a sensor s, a breaker b, and a link from s to b. The sensor is designed to detect a short circuit, based on a low-voltage external event, and to command the breaker to open. When the short circuit vanishes (normal voltage), the sensor commands the breaker to close. However, both the sensor and the breaker may exhibit abnormal behavior, as specified in the corresponding automata, specifically transitions s3 and b3 , respectively. The behavior of an AS is constrained by its topology and the models of components. Such constraints confine the behavior within a finite space. Definition 1 (Behavior Space) Let X be an AS. The behavior space of X is a DFA X ∗ = (, X, τ, x0 , X f )
(1)
where is the alphabet, comprising the set of component transitions, X is the set of states (S, E), where S is a tuple of component states and E is a tuple of (possibly empty) events that are ready within the links, x0 = (S0 , E 0 ) is the initial state, where all events in E 0 are empty, X f ⊆ X is the set of final states (Sf , E f ) such that all events in E f are empty, τ : X × → X is the transition function, where τ (x, t) = x when t
t
Description
s1 s detects low voltage and sends b the open event s2 s detects normal voltage and sends b the close event s3 s detects low voltage, yet does not send b any event b1 b2 b3 b4
b reacts to the open event by opening b reacts to the close event by closing b does not react to the open event b reacts to the close event by remaining closed
Fig. 1 Active system P (left), component transitions (center), and behavior space P ∗ (right)
508
G. Lamperti et al.
is triggerable at state x and x is the state reached by the consumption of the (possible) input event and the generation of the output events relevant to t. Definition 2 (Trajectory) Let X ∗ be a behavior space. A sequence T = [t1 , . . . , tq ] of component transitions in the language of X ∗ is a trajectory of X . A prefix of T is a semi-trajectory of X . Let T be a set of component transitions in X . The restriction of T on T is the sequence of component transitions T[T] = [t | t ∈ T, t ∈ T]. Example 2 Shown on the right of Fig. 1 is the behavior space P ∗ , with each state ¯ e), being identified by a triple (¯s , b, ¯ where s¯ is a state of the sensor, b¯ is a state of the breaker, and e¯ is an event in the link ( denotes an empty link). States are renamed 0 · · · 7, where 0 is the initial state, while the final states (with empty link) are 0, 2, and 3. A trajectory of P is T = [s3 , s2 , s3 , b4 , s2 , s3 , b4 ], which ends at the state 2.
3 Observability and Abnormality In order to perform the diagnosis task, the specification of an AS needs to be extended with information indicating which behavior is normal and which is abnormal. In our approach, abnormality is associated with faulty transitions. Definition 3 (Abnormality) Let T be the domain of component transitions of an AS X , and let F be a domain of symbols called faults. The abnormality of X is a set of associations between component transitions and faults, namely Abn(X ) ⊆ T × F. If (t, f ) ∈ Abn(X ), then t is faulty, else t is normal. Based on the abnormality specification, each trajectory is associated with a diagnosis. Definition 4 (Diagnosis) Let T = [t1 , . . . , tq ] be a trajectory of an AS X . The diagnosis δ of T is the set of faults associated with the faulty transitions in T , namely δ(T ) = { f | t ∈ T, (t, f ) ∈ Abn(X )}.
(2)
Since a diagnosis is a set, possible repetitions of the same fault are missing. Example 3 For the protection P (cf. Example 1), we have Abn(P) = {(s3 , s), (b3 , b)}. In general, however, several faults may be relevant to the same component, as several transitions may be faulty for the same component. Let T = [s3 , s2 , s3 , b4 , s2 , s3 , b4 ] (cf. Example 2). We have δ(T ) = {s}. In general, a diagnosis involves several faults. For instance, for T = [s1 , b3 , s2 , s3 , b4 ], we have δ(T ) = {b, s}. In particular, a diagnosis may be empty, for instance, for T = [s1 , b1 , s2 , b2 ], we have δ(T ) = ∅. To complete the information on the AS for diagnosis purposes, we need to specify when and how the behavior of the AS is observable. Specifically, each observation is associated with a regular language that is defined by a regular expression on a set of component transitions.
Diagnosis of Active Systems with Abstract Observability
509
Definition 5 (Observability) Let T be the domain of component transitions of an AS X , let L be a set of regular languages on subsets of T, and let O be a domain of symbols called observations. The observability of X is a relation Obs(X ) ⊆ 2T × L × O.
(3)
Each element in Obs(X ) is a triple (T, L, o), where T is a set of component transitions, L is a regular language on T defined by a regular expression, and o is an observation. Example 4 For the AS P, we define Obs(P) = {(T, Ls , s), (T, Lb , b), (T, La , a)}, where T = {s1 , s2 , s3 , b1 , b2 , b3 , b4 }, Ls = (s1 | s2 ), Lb = (b1 | b2 | b4 ), and La = (s2 b4 | b4 s2 ). Specifically, each normal transition is observable via the same observation (s for the sensor and b for the breaker), while a (which stands for ‘abstract’) is emitted when the transitions s2 and b4 occurs sequentially (in either order). We say that a is an abstract observation, since it is emitted in correspondence of a specific combination of transitions, rather than a single component transition. Given a triple (T, L, o) ∈ Obs(X ) and a trajectory T of X , the observation o occurs when the restriction of T on T includes a subsequence that is a string in L. Since several observations may occur at the same time, in theory, T manifests itself as a sequence of sets of observations. However, we assume that observations in the same set are perceived as sequences, where the temporal ordering of each sequence is unpredictable. In other words, a trajectory T of X is perceived by the observer as a temporal sequence of observations, called a temporal observation of X . Definition 6 (Observation Space) Let O be a set of observations. The space of O is the set of sequences of observations O ∗ = {O | O = [ o | o ∈ O ]}.
(4)
Let O = [O1 , . . . , On ] be a sequence of sets of observations. The space of O is the set of sequences of observations n o | o ∈ Oi∗ . O∗ = O¯ | O¯ =
(5)
i=1
where ’ ’ denotes the concatenation of sequences. Definition 7 (Temporal Observation) Let T = [t1 , . . . , tq ] be a trajectory in X ∗ . The sequence of sets of observations O = [ Oi | i ∈ [1 .. q], Oi = { o | j ∈ [1 .. i], T = [t j , . . . , ti ], (T, L, o) ∈ Obs(X ), T[T] ∈ L } ].
(6)
is the signature of T , denoted Sgn(T ). A sequence Oi ∈ O∗ is a temporal observation of X , and T is said to conform with O.
510
G. Lamperti et al.
Example 5 Let T = [s3 , s2 , s3 , b4 , s2 , s3 , b4 ] (cf. Example 2). The signature of T is Sgn(T ) = [∅, {s}, ∅, {b}, {a, s}, {b}]. Let O = Sgn(T ). The space of O is O∗ = {[s, b, a, s, b], [s, b, s, a, b]}. Thus, each sequence in O∗ is a temporal observation of P. Definition 8 (Candidate Set) Let O be a temporal observation of X . The candidate set of O is a set of diagnoses (O) = δ(T ) | T ∈ X ∗ , O = Sgn(T ), O ∈ O∗ .
(7)
Example 6 Let O = [s, b, s, a, b] be a temporal observation of P (cf. Example 5). Based on the behavior space P ∗ displayed in Fig. 1 and Obs(P) defined in Example 4, the only trajectory T ∈ P ∗ such that O ∈ O∗ , where O = Sgn(T ), is T = [s3 , s2 , s3 , b4 , s2 , s3 , b4 ]. According to Example 3, δ(T ) = {s}; hence, (O) = {{s}}, that is, the candidate set is a singleton. In general, however, several trajectories may fulfill Eq. 7 and, thus, several candidates may be included in (O). Solving a diagnosis problem amounts to finding the candidate set of a temporal observation of an AS being operated online.
4 Offline Preprocessing
The notion of observability of an AS introduced in Definition 5 requires the diagnosis task to match trajectories of X with regular languages specified by regular expressions. Based on Eq. 7, a candidate in Δ(O) is the diagnosis of a trajectory T such that O ∈ Sgn(T)∗. Based on Definition 7, this means that we need to understand when observations occur based on the sequence of component transitions in T. Specifically, for each (T, L, o) ∈ Obs(X), at any point of a prefix Ti of T, namely Ti = [t1, . . . , ti], we need to check if the restriction on T of a suffix of Ti is a string in L. If so, the observation o should be in a proper position in O (otherwise T does not conform with O). The critical point is therefore to keep track of possible strings in L based on sequences of component transitions in T. Since L is regular, it can be recognized by a finite automaton. However, a classical recognizer is not sufficient, as strings of the same language may overlap in T. To cope with possibly overlapping strings in the languages associated with observations, the notion of a watcher is introduced.
Definition 9 (Watcher) Let X be an AS, let T be the set of component transitions in X, and let (T, L, o) ∈ Obs(X). Let Ro be a finite automaton recognizing L, and let R′o be the NFA obtained from Ro by inserting an ε-transition from each non-initial state to the initial state. The watcher of o, namely Wo, is a DFA Wo = (T, W, τ, w0, Wf) that is obtained by the determinization of R′o.
Fig. 2 From left to right: Ra, R′a, and watcher Wa, with final states being 3 and 4
Example 7 With reference to the observability Obs(P) (cf. Example 4), consider the language La = (s2 b4 | b4 s2), which is associated with the abstract observation a. Shown in Fig. 2 are the recognizer Ra, the NFA R′a, and the watcher Wa. Note how the ε-transitions in R′a allow for a continuous matching of (possibly overlapping) strings, which is in general impossible using a recognizer. To clarify, assume the following trajectory of P:
T = [ s3, s2, s3, b4, s2, b4, s1, b1 ].   (8)
T includes two overlapping subtrajectories in La, namely T′ = [b4, s2] and T′′ = [s2, b4], where the last transition s2 of T′ is the first transition of T′′. Hence, the observation a is emitted twice in T, namely at the last transition of T′ and T′′, respectively. Assume now that the emission of a is traced based on the recognizer Ra. When the final state 4 is reached, a is emitted. At this point, since no transition exits the final state 4, the recognizer starts again from the initial state 0 in order to keep matching. It first changes state to 2 in correspondence with b4, and with s1 (mismatch) it returns to 0. The result is that, owing to the overlapping of the subtrajectories T′ and T′′, the second emission of a is undetected. By contrast, consider matching T based on the watcher Wa. After the detection of a at the final state 4, the next transition b4 moves to 3, the other final state, thereby also detecting the emission of the second occurrence of a, as expected. Once generated by offline preprocessing, watchers can be conveniently exploited online by the diagnosis engine for matching trajectories with a temporal observation in order to solve a given diagnosis problem, as clarified in Sect. 5.
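To make the role of the ε-transitions concrete, the following self-contained Java sketch (ours, not from the paper) simulates the ε-augmented recognizer R′a on the fly by tracking the set of active recognizer states, which is exactly what the determinized watcher Wa encodes; the encoding of Ra with state names q0, qA, qB, qF is our own assumption, derived only from La = (s2 b4 | b4 s2).

import java.util.*;

public class WatcherDemo {
    // Recognizer Ra for La = (s2 b4 | b4 s2); state names are ours, qF is the final state.
    static final Map<String, Map<String, String>> RA = Map.of(
        "q0", Map.of("s2", "qA", "b4", "qB"),
        "qA", Map.of("b4", "qF"),
        "qB", Map.of("s2", "qF"),
        "qF", Map.<String, String>of()
    );

    public static void main(String[] args) {
        List<String> trajectory = List.of("s3", "s2", "s3", "b4", "s2", "b4", "s1", "b1"); // Eq. 8
        // Active-state set of R'a: the epsilon-transitions back to q0 mean q0 is always active.
        Set<String> active = new HashSet<>(Set.of("q0"));
        for (int i = 0; i < trajectory.size(); i++) {
            String t = trajectory.get(i);
            Set<String> next = new HashSet<>();
            for (String q : active) {
                String q2 = RA.get(q).get(t);
                if (q2 != null) next.add(q2);
            }
            next.add("q0");          // epsilon-closure: matching can restart at every step
            active = next;
            if (active.contains("qF"))
                System.out.println("observation a emitted at transition " + (i + 1) + " (" + t + ")");
        }
    }
}

The sketch reports a at transitions 5 and 6 of Eq. 8, i.e., both overlapping matches, whereas a plain recognizer that restarts from its initial state after the first match would miss the second one, as described above.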
5 Online Diagnosis
The definition of a candidate set Δ(O) in Eq. 7 is declarative in nature: no operational actions are suggested for the solution of a diagnosis problem in Definition 8. Worse still, the assumption of the availability of the behavior space X∗ is in general
unrealistic, owing to the exponential explosion of the number of system states. Consequently, determining Δ(O) becomes a computational issue where the soundness and completeness of the set of candidates are required under the assumption that X∗ is missing. The idea is to generate (online) the subspace of X∗ comprising exactly the trajectories T of X fulfilling the condition O ∈ Sgn(T)∗ in Eq. 7, called the diagnosis space of X constrained by O, where each state comprises diagnosis information.
Definition 10 (Diagnosis Space) Let O = [o1, . . . , on] be a temporal observation of an AS X, where X∗ = (Σ, X, τ, x0, Xf), let Obs(X) = {(T1, L1, o1), . . . , (Tk, Lk, ok)}, let F be the domain of faults in Abn(X), let Wi = (Ti, Wi, τi, w0i, Wfi) be the watcher of oi, i ∈ [1 .. k], and let W = (W1 × · · · × Wk). The diagnosis space of X constrained by O is a DFA
X∗O = (Σ, D, τd, d0, Df)   (9)
where D ⊆ X × 2^F × W × [0 .. n] is the set of states, d0 = (x0, ∅, w0, 0) is the initial state, with w0 = (w01, . . . , w0k), Df ⊆ D is the set of final states, with (x, δ, w, ℓ) ∈ Df when x ∈ Xf and ℓ = n, and τd : D × Σ → D is the transition function, such that τd((x, δ, w, ℓ), t) = (x′, δ′, w′, ℓ′), with w = (w1, . . . , wk) and w′ = (w′1, . . . , w′k), when (x′, δ′, w′, ℓ′) is connected with a state in Df, τ(x, t) = x′, and:
δ′ = δ ∪ {f}   if (t, f) ∈ Abn(X)
δ′ = δ           otherwise   (10)
∀i ∈ [1 .. k], the new state w′i of the watcher Wi is
w′i = w̄i    if t ∈ Ti and τi(wi, t) = w̄i
w′i = w0i   if t ∈ Ti and τi(wi, t) is undefined
w′i = wi    if t ∉ Ti   (11)
and, letting O = { oi | i ∈ [1 .. k], τi(wi, t) = w′i, w′i ∈ Wfi }, with |O| denoting the cardinality of O, we have ℓ′ = ℓ + |O|, provided ℓ′ ≤ n and O = { oℓ+j | j ∈ [1 .. |O|] }.
One may argue that Definition 10 assumes the availability of the behavior space X∗, thereby contradicting the assumption of its unavailability. In fact, the reference to X∗ is handy for formal reasons only. The construction of X∗O can (and will) be performed without X∗, by reasoning on the model of X and applying the triggerable component transitions for generating the system states, where a state is final when all links are empty. For instance, the condition τ(x, t) = x′ is translated into checking the triggerability of the component transition t at the system state x and, if so, by generating the new state x′ = (S′, E′) as an updated copy of x by setting in S′ the new component state relevant to t, by removing from E′ the input event of t (if any), and by inserting into E′ the output events of t (if any). In this generation, it is essential to capture the occurrences of observations (cf. Eq. 11) and to match them against O by
comparing the set O of observations occurring at t with the next observations in O, irrespective of their temporal ordering (cf. Definition 7). Eventually, only the states that are involved in a trajectory of X are retained (namely, those connected with a final state).
Example 8 Shown in Fig. 3 is the generation of the diagnosis space P∗O, where O = [s, b, s, a, b] (cf. Example 5). The actual states of P∗O are renamed 0 .. 7, where 7 is the only final state. The other states (in gray) are discarded because they are not connected with any final state, thus being not part of any trajectory of P. Based on Definition 10, each state in X∗O is identified by a quadruple (p, δ, wa, ℓ), where p is a state of P (cf. Fig. 1), δ is a set of faults, wa is a state of the watcher Wa (cf. Fig. 2), and ℓ ∈ [0 .. 5] is the index of O, indicating the prefix of O matched already. Note that the states of the watchers of the observations s and b are missing because both languages Ls = {s1, s2} and Lb = {b1, b2, b4} include single component transitions only. This allows for a direct detection of the observations s and b based on the current component transition triggered in the trajectory. In general, for efficiency purposes, the field w within a state (x, δ, w, ℓ) of X∗O comprises only the states of the watchers of the (actual) abstract observations, in our example, the watcher of a. In summary, P∗O includes one trajectory only, namely T = [s3, s2, s3, b4, s2, s3, b4]. Note how the final state 7 involves δ = {s}, which is the diagnosis of T (cf. Example 3). Also, based on Example 6, we have Δ(O) = {{s}}. This is no coincidence, as proven in Theorem 1.
Theorem 1 Let X∗O = (Σ, D, τd, d0, Df) be a diagnosis space. We have
Δ(O) = { δ | (x, δ, w, ℓ) ∈ Df }.   (12)
In other words, the candidate set Δ(O) can be determined by collecting the diagnoses marking the final states of the diagnosis space.
Proof First we prove that the language of X∗O is the sublanguage of X∗ comprising the trajectories that conform with the temporal observation O, namely
Fig. 3 Generation of the diagnosis space P∗O, where the dashed part (states in gray) is discarded
{ T | T ∈ X∗O } = { T | T ∈ X∗, O ∈ Sgn(T)∗ }.   (13)
(Soundness) If T ∈ X∗O, then T ∈ X∗ and O ∈ Sgn(T)∗. Based on Definition 10, τd((x, δ, w, ℓ), t) = (x′, δ′, w′, ℓ′) in X∗O requires τ(x, t) = x′ in X∗; hence, T ∈ X∗. Based on Definition 7, given T = [t1, . . . , tn], Sgn(T) = [O1, . . . , On], where Oi, i ∈ [1 .. n], is a (possibly empty) set of observations o such that, given a suffix T′ of the prefix [t1, . . . , ti] of T, we have T′[T] ∈ L, where (T, L, o) ∈ Obs(X). Any sequence of observations obtained by transforming each set in Sgn(T) into a sequence and by concatenating all these sequences is a temporal observation in Sgn(T)∗. Based on Definition 10, the new states of the watchers in w′, defined in Eq. 11, allow for the computation of a set O of observations that equals a corresponding set in Sgn(T). In fact, O is matched against the next observations in O, namely oℓ+1, . . . , oℓ+|O|, where |O| is the cardinality of O. Eventually, this matching guarantees that O can be generated by concatenating all Ōi, i ∈ [1 .. n], where Ōi is obtained by transforming Oi into a sequence. Hence, based on Definition 6, O ∈ Sgn(T)∗.
(Completeness) If T ∈ X∗ and O ∈ Sgn(T)∗, then T ∈ X∗O. Based on Definition 10, τ(x, t) = x′ in X∗ is a requisite for the definition of τd((x, δ, w, ℓ), t) = (x′, δ′, w′, ℓ′) in X∗O. We have to show that the additional constraint O ∈ Sgn(T)∗ is valid for X∗O also. Based on Definitions 6 and 7, O ∈ Sgn(T)∗ means that O can be generated by concatenating the sequences Ōi, i ∈ [1 .. n], obtained from each set of observations Oi in Sgn(T), where each Oi includes the observations o such that (T, L, o) ∈ Obs(X), T′ is a suffix of [t1, . . . , ti], and T′[T] ∈ L. Since for the tuple w of watcher states the same Oi is generated based on the final states of the watchers, the subsequent conditions on the matching of Oi against the next observations in O, namely oℓ+1, . . . , oℓ+|Oi|, with |Oi| being the cardinality of Oi, are fulfilled for all transitions ti ∈ T. Hence, T ∈ X∗O. Hence, Eq. 13 is true.
To complete the proof of Theorem 1, we have to show that, if d = (x, δ̄, w, ℓ) ∈ Df, then, for each trajectory T ending in d, we have δ(T) = δ̄. According to Definition 4, δ(T) = { f | t ∈ T, (t, f) ∈ Abn(X) }. Based on Eq. 10, the field δ (initially set to ∅) is extended, for each faulty transition t in T, with the fault f such that (t, f) ∈ Abn(X). In other words, δ̄ includes exactly the faults relevant to T. Hence, based on Eq. 2, δ(T) = δ̄.
6 Conclusion Observability is an essential property of an AS (more generally, of a DES), in order to perform the diagnosis task. Still, observability has received little attention from the research community so far. Remarkably, generalized observability may spur the generalization of other DES properties. For instance, the property of DES diagnosability, which has been extensively investigated in the literature [3, 11, 15, 19, 31, 33, 37], can be extended based on a generalization of both observability and abnor-
mality. Also, uncertain abstract observations can be a natural evolution of uncertain temporal observations [24]. Consequently, the notion of diagnosability of DES with uncertain observations [40] can be extended to deal with uncertain abstract observations also. Inspired by the way humans observe reality, where a combination of events may be registered by the mind as a single perception, this paper has proposed a generalized notion of DES observability, along with a diagnosis technique based on abstract observations. What is observed is no longer confined to a single transition, as assumed by the state of the art; instead, it is extended to an entire scenario of the DES behavior. On the other hand, this generalization may become computationally costly in real applications, since the detection of the abstract observations requires the diagnosis engine to store the states of the watchers within the states of the diagnosis space of the DES (cf. Sect. 5). To face this problem, knowledge compilation techniques may be developed for constructing extended dictionaries, like in [4, 8], where the candidate set may be determined efficiently based on the temporal observation. Future research may also include the integration of generalized observability with both generalized abnormality [26, 30] and uncertain temporal observations [24], as well as the adaptation of generalized observability to complex ASs [23, 27] and deep DESs [29]. Acknowledgements This work was supported in part by the National Natural Science Foundation of China (grant number 61972360).
References 1. Baroni, P., Lamperti, G., Pogliano, P., Zanella, M.: Diagnosis of large active systems. Artifi. Intell. 110(1), 135–183 (1999). https://doi.org/10.1016/S0004-3702(99)00019-3 2. Basile, F.: Overview of fault diagnosis methods based on Petri net models. In: Proceedings of the 2014 European Control Conference, ECC 2014, pp. 2636–2642 (2014). https://doi.org/10. 1109/ECC.2014.6862631 3. Basilio, J., Lafortune, S.: Robust codiagnosability of discrete event systems. In: Proceedings of the American Control Conference, pp. 2202–2209. IEEE (2009). https://doi.org/10.1109/ ACC.2009.5160208 4. Bertoglio, N., Lamperti, G., Zanella, M.: Intelligent diagnosis of discrete-event systems with preprocessing of critical scenarios. In: Czarnowski, I., Howlett, R., Jain, L. (eds.) Intelligent Decision Technologies 2019, Smart Innovation, Systems and Technologies, vol. 142, pp. 109– 121. Springer, Singapore (2019). https://doi.org/10.1007/978-981-13-8311-3_10 5. Bertoglio, N., Lamperti, G., Zanella, M., Zhao, X.: Diagnosis of temporal faults in discreteevent systems. In: Giacomo, G.D., Catala, A., Dilkina, B., Milano, M., Barro, S., Bugarín, A., Lang, J. (eds.) 24th European Conference on Artificial Intelligence, Frontiers in Artificial Intelligence and Applications, vol. 325, pp. 632–639. IOS Press, Amsterdam (2020). https:// doi.org/10.3233/FAIA200148 6. Bertoglio, N., Lamperti, G., Zanella, M., Zhao, X.: Explanatory diagnosis of discrete-event systems with temporal information and smart knowledge-compilation. In: Calvanese, D., Erdem, E., Thielsher, M. (eds.) Proceedings of the 17th International Conference on Principles of Knowledge Representation and Reasoning (KR 2020), pp. 130–140. IJCAI Organization (2020). https://doi.org/10.24963/kr.2020/14
7. Bertoglio, N., Lamperti, G., Zanella, M., Zhao, X.: Explanatory monitoring of discrete-event systems. In: Czarnowski, I., Howlett, R., Jain, L. (eds.) Intelligent Decision Technologies 2020, Smart Innovation, Systems and Technologies, vol. 193, pp. 63–77. Springer, Singapore (2020). https://doi.org/10.1007/978-981-15-5925-9_6 8. Bertoglio, N., Lamperti, G., Zanella, M., Zhao, X.: Temporal-fault diagnosis for criticaldecision making in discrete-event systems. In: Cristani, M., Toro, C., Zanni-Merk, C., Howlett, R., Jain, L. (eds.) Knowledge-Based and Intelligent Information & Engineering Systems: Proceedings of the 24th International Conference KES2020, Procedia Computer Science, vol. 176, pp. 521–530. Elsevier (2020). https://doi.org/10.1016/j.procs.2020.08.054 9. Brand, D., Zafiropulo, P.: On communicating finite-state machines. J. ACM 30(2), 323–342 (1983). https://doi.org/10.1145/322374.322380 10. Cabasino, M.P., Giua, A., Seatzu, C.: Fault detection for discrete event systems using Petri nets with unobservable transitions. Automatica 46, 1531–1539 (2010) 11. Carvalho, L., Basilio, J., Moreira, M., Bermeo, L.: Diagnosability of intermittent sensor faults in discrete event systems. In: Proceedings of the American Control Conference, pp. 929–934 (2013). https://doi.org/10.1109/ACC.2013.6579955 12. Cassandras, C., Lafortune, S.: Introduction to Discrete Event Systems, 2nd edn. Springer, New York (2008) 13. Cong, X., Fanti, M., Mangini, A., Li, Z.: Decentralized diagnosis by Petri nets and integer linear programming. IEEE Trans. Syst. Man Cybern. Syst. 48(10), 1689–1700 (2018) 14. Debouk, R., Lafortune, S., Teneketzis, D.: Coordinated decentralized protocols for failure diagnosis of discrete-event systems. J. Discrete Event Dyn. Syst. Theory Appl. 10(1–2), 33–86 (2000) 15. Grastien, A.: Symbolic testing of diagnosability. In: Frisk, E., Nyberg, M., Krysander, M., Aslund, J.(eds.) Proceedings of the 20th International Workshop on Principles of Diagnosis, pp. 131–138 (2009) 16. Grastien, A., Cordier, M., Largouët, C.: Incremental diagnosis of discrete-event systems. In: Nineteenth International Joint Conference on Artificial Intelligence (IJCAI 2005), pp. 1564– 1565. Edinburgh, UK (2005) 17. Hamscher, W., Console, L., de Kleer, J. (eds.): Readings in Model-Based Diagnosis. Morgan Kaufmann, San Mateo, CA (1992) 18. Jéron, T., Marchand, H., Pinchinat, S., Cordier, M.: Supervision patterns in discrete event systems diagnosis. In: Workshop on Discrete Event Systems (WODES 2006), pp. 262–268. IEEE Computer Society, Ann Arbor, MI (2006) 19. Jiang, S., Huang, Z., Chandra, V., Kumar, R.: A polynomial algorithm for testing diagnosability of discrete event systems. IEEE Trans. Autom. Control 46(8), 1318–1321 (2001) 20. Jiroveanu, G., Boel, R., Bordbar, B.: On-line monitoring of large Petri net models under partial observation. J. Discrete Event Dyn. Syst. 18, 323–354 (2008) 21. Kan John, P., Grastien, A.: Local consistency and junction tree for diagnosis of discrete-event systems. In: Eighteenth European Conference on Artificial Intelligence (ECAI 2008), pp. 209– 213. IOS Press, Amsterdam, Patras, Greece (2008) 22. Kwong, R., Yonge-Mallo, D.: Fault diagnosis in discrete-event systems: incomplete models and learning. IEEE Trans. Syst. Man Cybern. Part B: Cybern. 41(1), 118–130 (2011) 23. Lamperti, G., Quarenghi, G.: Intelligent monitoring of complex discrete-event systems. In: Czarnowski, I., Caballero, A., Howlett, R., Jai, L. (eds.) 
Intelligent Decision Technologies 2016, Smart Innovation, Systems and Technologies, vol. 56, pp. 215–229. Springer International Publishing Switzerland (2016). https://doi.org/10.1007/978-3-319-39630-9_18 24. Lamperti, G., Zanella, M.: Diagnosis of discrete-event systems from uncertain temporal observations. Artif. Intell. 137(1–2), 91–163 (2002). https://doi.org/10.1016/S00043702(02)00123-6 25. Lamperti, G., Zanella, M.: Flexible diagnosis of discrete-event systems by similarity-based reasoning techniques. Artif. Intell. 170(3), 232–297 (2006). https://doi.org/10.1016/j.artint. 2005.08.002
26. Lamperti, G., Zanella, M.: Context-sensitive diagnosis of discrete-event systems. In: Walsh, T. (ed.) Twenty-Second International Joint Conference on Artificial Intelligence (IJCAI 2011), vol. 2, pp. 969–975. AAAI Press, Barcelona, Spain (2011) 27. Lamperti, G., Zanella, M., Zhao, X.: Abductive diagnosis of complex active systems with compiled knowledge. In: Thielscher, M., Toni, F., Wolter, F. (eds.) Principles of Knowledge Representation and Reasoning: Proceedings of the Sixteenth International Conference (KR 2018), pp. 464–473. AAAI Press, Tempe, Arizona (2018) 28. Lamperti, G., Zanella, M., Zhao, X.: Introduction to Diagnosis of Active Systems. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-92733-6 29. Lamperti, G., Zanella, M., Zhao, X.: Diagnosis of deep discrete-event systems. J. Artif. Intell. Res. 69, 1473–1532 (2020). https://doi.org/10.1613/jair.1.12171 30. Lamperti, G., Zhao, X.: Diagnosis of active systems by semantic patterns. IEEE Trans. Syst. Man Cybern. Syst. 44(8), 1028–1043 (2014). https://doi.org/10.1109/TSMC.2013.2296277 31. Li, B., Khlif-Bouassida, M., Toguyéni, A.: Reduction rules for diagnosability analysis of complex systems modeled by labeled Petri nets. IEEE Trans. Autom. Sci. Eng. (2019). https://doi. org/10.1109/TASE.2019.2933230 32. McIlraith, S.: Explanatory diagnosis: conjecturing actions to explain observations. In: Sixth International Conference on Principles of Knowledge Representation and Reasoning (KR 1998), pp. 167–177. Morgan Kaufmann, S. Francisco, CA, Trento, I (1998) 33. Pencolé, Y.: Diagnosability analysis of distributed discrete event systems. In: Sixteenth European Conference on Artificial Intelligence (ECAI 2004), pp. 43–47. Valencia, Spain (2004) 34. Pencolé, Y., Cordier, M.: A formal framework for the decentralized diagnosis of large scale discrete event systems and its application to telecommunication networks. Artif. Intell. 164(1– 2), 121–170 (2005) 35. Pencolé, Y., Steinbauer, G., Mühlbacher, C., Travé-Massuyès, L.: Diagnosing discrete event systems using nominal models only. In: Zanella, M., Pill, I., Cimatti, A. (eds.) 28th International Workshop on Principles of Diagnosis (DX’17), vol. 4, pp. 169–183. Kalpa Publications in Computing (2018). https://doi.org/10.29007/1d2x 36. Reiter, R.: A theory of diagnosis from first principles. Artif. Intell. 32(1), 57–95 (1987) 37. Sampath, M., Sengupta, R., Lafortune, S., Sinnamohideen, K., Teneketzis, D.: Diagnosability of discrete-event systems. IEEE Trans. Autom. Control 40(9), 1555–1575 (1995) 38. Sampath, M., Sengupta, R., Lafortune, S., Sinnamohideen, K., Teneketzis, D.: Failure diagnosis using discrete-event models. IEEE Trans. Control Syst. Technol. 4(2), 105–124 (1996) 39. Struss, P.: Fundamentals of model-based diagnosis of dynamic systems. In: Fifteenth International Joint Conference on Artificial Intelligence (IJCAI 1997), pp. 480–485. Nagoya, Japan (1997) 40. Su, X., Zanella, M., Grastien, A.: Diagnosability of discrete-event systems with uncertain observations. In: 25th International Joint Conference on Artificial Intelligence (IJCAI 2016), pp. 1265–1571. New York, NY (2016)
Java2CSP—A Model-Based Diagnosis Tool Not Only for Software Debugging Franz Wotawa and Vlad Andrei Dumitru
Abstract Model-based reasoning has been an active research area for several decades providing foundations for fault detection, localization, and repair not only in the context of system and hardware diagnosis but also in localizing software bugs. Java2CSP allows for mapping Java-like programs into a corresponding constraint representation. The constraint representation makes use of a health state variable indicating whether a certain statement is correct or faulty. A constraint solver can be used for computing diagnoses for a given failing test case. In this paper, we present the tool and also show how more classical diagnosis problems can be directly mapped to their program representation, which allows to use Java2CSP in different settings, including teaching model-based diagnosis. Keywords Model-based diagnosis · Automated software debugging · Debugging tool
1 Introduction
Detecting, localizing, and repairing faults in any system is an important task for assuring that the system correctly operates over time. Automation of this task reduces costs and may also lead to further advances, such as enabling a system to react to and compensate for faults so that it remains operational. Such functionality is particularly important for systems that are hardly accessible for repair, e.g., satellites or other space probes, or that have to keep operating for a longer time until reaching a safe state. The
latter is of particular importance for autonomous driving, where a car may still drive for miles until reaching a parking lot. In this paper, we focus on model-based reasoning, i.e., a methodology for reasoning about a system that makes direct use of the model of the system. Model-based reasoning and in particular model-based diagnosis (MBD) (see [3]) were developed in the early 1980s in order to cope with challenges originating from classical expert systems. MBD has also been called reasoning from first principles [9] because it allows for computing diagnoses directly from system models. Although MBD has proved to be applicable in many areas, ranging from space [8] and automotive [6, 7, 10] to debugging [2, 4, 11], it is not widely used in industrial practice. There are many reasons behind this observation, including challenges related to modeling different types of systems and re-using other models (e.g., simulation models), the limited availability of (industrial-strength) tools, a lack of knowledge due to the limited use in study programs, and insufficient integration into established design processes. In this paper, we focus more on the educational part and discuss how an available tool can be used not only for demonstrating the use of MBD for fault localization in programs but also for providing a basis for carrying out diagnosis experiments. We show how different diagnosis problems, like the diagnosis of Boolean and numeric circuits or of deterministic finite automata, can be represented as a program that can be debugged using the Java2CSP tool, which is available over the Internet (http://modelinho.ist.tugraz.at:8081/). Besides outlining the underlying foundations, we show the interaction capabilities of the tool and provide means for mapping different diagnosis problems into their program representation. We further discuss the results obtained for the different circuits and systems when using the online tool.
2 Foundations
Model-based reasoning [5, 9] was introduced to allow diagnoses to be derived directly from the model of a system. In its original form, it requires a logical model capturing the correct behavior of components, the structure of the system, and a set of observations. The health state of components is formalized using the predicate ab, which represents the abnormal state of a component. The behavior of any component C can be characterized by stating that either the component is faulty, i.e., ab(C) holds, or its expected behavior is assumed, i.e., behav(C) holds. This idea can also be used for debugging programs. Instead of talking about components, we focus on statements. A statement si of the form x = y + 1; can be represented in the model as a constraint ab(si) ∨ x = y + 1. In order to convert the whole program into such a constraint representation, we have to assure that all variables are only defined once, requiring a static single assignment form of the program [1]. In addition, we have to handle recursive calls
and loops, which can be represented using nested conditional statements. For more details regarding the modeling, we refer the interested reader to [11]. It is worth noting that the resulting static single assignment form has the same semantics as the original program, assuming that no input requires more iterations or recursive calls than the number of nested conditional statements used to represent loops and recursions. Taking a constraint representation CΠ of a program Π, the set of statements SΠ, and a constraint representation of observations O defining inputs and expected outputs, we are able to formally define what a diagnosis is.
Definition 1 (Diagnosis) For a given tuple (CΠ, SΠ, O), where CΠ is a constraint representation of program Π, SΠ its statements, and O observations (i.e., a failing test case), a set Δ ⊆ SΠ is a diagnosis if and only if the set of constraints CΠ ∪ O ∪ {ab(s) | s ∈ Δ} ∪ {¬ab(s) | s ∈ SΠ \ Δ} is satisfiable.
Diagnosis as defined above searches for statements that have to be assumed faulty in order not to contradict the given observations. This check for satisfiability can be done using any constraint solver. Note that in practice, diagnosis focuses on finding the smallest number of statements that should be faulty. Such diagnoses are called minimal diagnoses and are defined as sets of statements such that no proper subset itself is a diagnosis. Let us have a look at a simple example program (ignoring data type definitions and other syntax rules that apply to any programming language) for illustrating the definition.
1: y = 2;
2: x = y + 1;
This program can be compiled into the following set of constraints: {(ab(1) ∨ y1 = 2), (ab(2) ∨ x1 = y1 + 1)}. Now assume that we expected x to be 4 after calling the program. In this case, we are interested in a diagnosis for ({(ab(1) ∨ y1 = 2), (ab(2) ∨ x1 = y1 + 1)}, {1, 2}, {x1 = 4}). When assuming all statements to be correct, the constraints are in contradiction. However, when assuming either statement 1 or 2 to be faulty, we obtain satisfiable sets of constraints. Therefore, {1} and {2} are both (minimal) diagnoses. Note that when additionally adding the information that variable y is observed to be 2, only the diagnosis {2} remains. In the next section, we outline the functionality of the Java2CSP tool and show how to use it.
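To make Definition 1 concrete, the following small Java sketch (our own illustration, not part of Java2CSP, which delegates such checks to a constraint solver) enumerates the ab assignments for the two-statement example and checks satisfiability by brute force over a small integer range for y1 and x1; the class and method names are ours.

public class TinyDiagnosis {
    // Constraints of the example: (ab1 or y1 = 2) and (ab2 or x1 = y1 + 1), observation x1 = 4.
    static boolean satisfiable(boolean ab1, boolean ab2) {
        for (int y1 = -10; y1 <= 10; y1++) {
            for (int x1 = -10; x1 <= 10; x1++) {
                boolean c1 = ab1 || y1 == 2;
                boolean c2 = ab2 || x1 == y1 + 1;
                boolean obs = x1 == 4;
                if (c1 && c2 && obs) return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        boolean[] vals = {false, true};
        for (boolean ab1 : vals)
            for (boolean ab2 : vals)
                System.out.println("ab(1)=" + ab1 + ", ab(2)=" + ab2
                        + " -> " + (satisfiable(ab1, ab2) ? "consistent" : "inconsistent"));
        // Only ab(1)=ab(2)=false is inconsistent, so {1} and {2} are the minimal diagnoses.
    }
}

The brute-force search merely stands in for the solver call; it reproduces the two minimal diagnoses {1} and {2} discussed above.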
3 Debugging Using Java2CSP Java2CSP follows a client–server architecture, in which two servers are required in order to provide the full functionality—one dealing with manipulating and transforming the code, and one evaluating the constraint representation. This architecture allows for the development of different types of clients, suited at different use cases.
In its current embodiment, two clients are implemented: a graphical web interface, useful for demonstrations and quick analysis, and a command-line client, useful for automating the actions performed through the graphical interface. The graphical web interface consists of a single-page application that takes the user through the diagnosis process, consisting of four steps (see Fig. 1). At the bottom of every step there is a Help button that displays a tutorial on how to proceed. Each submitted action is sent to the server, and the result is made available to the user —in the case of a successful query, the intermediate result is stored, and the interface advances to the next step, if there is any left. In the less desirable case of an error, its description is shown to the user and no further action is taken. Note in any step, the user can go back one step, change parameter, and execute the step again. The diagnosis process is split into four steps: Submit Code The user inputs the code to be analyzed, in the form of a Java class declaration containing one or more methods of interest. By pressing Next, the code is submitted for analysis.
Fig. 1 Screenshots of the graphical web interface throughout a diagnosis session
Setup Test Scenario Having transformed the given code into an internal representation, this step involves selecting the method under test, and assigning values to its arguments. For each method in the given class declaration, its internal representation is shown (by clicking View model), and the method can be selected for further analysis by expanding the Test method container, which lists the method’s arguments as input boxes, for the user to fill in with the desired values. Each input box is decorated with the name of the argument, as well as its required type. Values are represented as S-expressions, in which both the type and the actual value have to be specified, for example, (int 42) represents the integer number 42, while (float 42) represents the floating point value 42. Along with the numerical types int and float, there is also a Boolean type, whose values are (bool #T) and (bool #F), respectively. The interface provides visual validation of the syntax while the user is typing. Advancing to the next step is done by clicking the Apply button of the selected method under test, which is only made available after all of the arguments have been given. A method with no arguments is considered to have all of its arguments provided, and such methods will not request anything but to proceed to the following step. Additional Constraints This step involves the creation of constraints to be placed on the variables of the method at different points in the execution, in the form of equality clauses that force certain values on the constrained variables. The static single assignment (SSA) form of the method under test can be inspected by expanding the View SSA Form container. The Additional Constraints container is initialized with an empty set of constraints. By clicking on the Add constraint button, a new element is added, being displayed as a row with two input fields and a button that discards it from the set. Note that the constraints are used to specify expected intermediate or output values. The first input field specifies the variable that the constraint is being placed on, and features suggestions while the user is typing, which is useful in listing all of the SSA indices of the variable in question. The second input field specifies the value given to the variable, in the same form as for specifying the input arguments in the second step. After specifying zero or more constraints, clicking on the Next proceeds to create the constraint formulation from all the parameters that have been given so far: method under test, input arguments, and additional constraints. CSP Formulation The final step of the process involves the evaluation of the constraint formulation. The user can inspect said formulation by expanding the View SMT-LIB2 Formulation. The format of this follows the SMT-LIB2 language,2 so as not to depend on a particular solver. At this stage, the user can take the textual representation and feed it into such a solver, which will result in a set of mappings of the abnormal predicates that indicate whether a particular statement is considered faulty or not.
2 See http://smtlib.cs.uiowa.edu.
The formulation can be evaluated through the use of Z3,3 or more specifically an HTTP wrapper over its functionality, named Z3AAS, which is short for Z3-as-a-Service. This can be done by expanding the Submit model to Z3AAS container, and then clicking on the Run button. As soon as the server is done with processing the job, the results are displayed, i.e., a set of statements that are considered faulty. Next to each of the statements, there exists a button labeled Mark as false abnormal, which inserts the selected statement into the set of statements that are not to be considered faulty, displayed underneath the listing of the faulty statements. This allows for a fast cycle of interaction and feedback when analyzing a particular piece of code.
4 Using Java2CSP for Diagnosis
In this section, we discuss the mapping of systems comprising hardware components and also deterministic finite automata into an equivalent program representation. The purpose is to show how Java2CSP can be directly used for diagnosis based on MBD, allowing one to carry out experiments without requiring any software installation. We first start with simple systems comprising interconnected components. In Fig. 2, we depict a digital half-adder on the left and the D74 numeric circuit on the right. Since we have corresponding Java operations (or sets of operations) for all gates used, we can directly map these systems to Java programs. For the digital circuit, we map each NAND gate to an AND gate followed by a NOT. Because of a syntactic limitation of Java2CSP that allows only one expression per statement, we first compute the result using the Java "and" operator (&&), store this intermediate value, and afterward add another statement computing the negation using !, finally leading to the following source code:
Fig. 2 Two simple circuits comprising basic components and their interactions: (a) the half-adder circuit, built from NAND gates N1–N5 with inputs a, b and outputs s (sum) and c (carry); (b) the D74 circuit, built from multipliers M1–M3 and adders A1, A2 with inputs a–e and outputs f, g
3 See https://github.com/Z3Prover/z3.
1.  public class HalfAdder {
2.    public void behavior(boolean a, boolean b) {
3.      boolean n1 = a && b;
4.      n1 = !n1;
5.      boolean n2 = a && n1;
6.      n2 = !n2;
7.      boolean n3 = n1 && b;
8.      n3 = !n3;
9.      boolean s = n2 && n3;
10.     s = !s;
11.     boolean c = n1 && n1;
12.     c = !c; } }
For computing diagnoses for the half-adder, we have to further specify observations. Obviously, the following given values for inputs and outputs would contradict the expected behavior:
a  b  s  c
T  F  F  F
Using Java2CSP together with the observations leads to the computation of two single fault diagnoses: {5. boolean n2 = a && n1;} and {6. n2 = !n2;}, which both map to the component N2 in the half-adder. The D74 circuit can be similarly mapped to a program. Instead of using Boolean values, we make use of integers. The multiplication and adder functionality can be easily mapped to the Java operators * and +, respectively. The resulting source code is as follows:
1.  public class D74Circuit {
2.    public void behavior(int a, int b, int c, int d, int e) {
3.      int s1 = a * c;
4.      int s2 = b * d;
5.      int s3 = c * e;
6.      int f = s1 + s2;
8.      int g = s2 + s3; } }
For the D74 circuit, the following set of observations is used in the literature for diagnosis computation:
a  b  c  d  e  f   g
2  3  3  2  2  10  12
When using these observations, the Java2CSP tool computes (as expected) two single fault diagnoses {3. int s1 = a * c}, {6. int f = s1 + s2}.
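This result can also be reproduced without any solver: under the fault model of Definition 1, a statement assumed abnormal simply leaves its output unconstrained. The following Java sketch (our own illustration, independent of Java2CSP) enumerates the single-fault candidates of the D74 circuit by brute-forcing a free value for the suspected statement and checking whether the observed outputs f = 10 and g = 12 can be reproduced.

public class D74SingleFaults {
    // Observed inputs and outputs from the example above.
    static final int A = 2, B = 3, C = 3, D = 2, E = 2, F_OBS = 10, G_OBS = 12;

    // Checks consistency when the statement computing the given signal is assumed faulty
    // ("none" means all statements are assumed correct). The faulty signal takes a free value v.
    static boolean consistent(String faulty) {
        for (int v = -50; v <= 50; v++) {
            int s1 = faulty.equals("s1") ? v : A * C;
            int s2 = faulty.equals("s2") ? v : B * D;
            int s3 = faulty.equals("s3") ? v : C * E;
            int f  = faulty.equals("f")  ? v : s1 + s2;
            int g  = faulty.equals("g")  ? v : s2 + s3;
            if (f == F_OBS && g == G_OBS) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        for (String cand : new String[] {"none", "s1", "s2", "s3", "f", "g"})
            System.out.println(cand + " -> " + (consistent(cand) ? "consistent" : "inconsistent"));
        // Only s1 (statement 3) and f (statement 6) are consistent single-fault candidates.
    }
}

Only the statements computing s1 (line 3) and f (line 6) turn out to be consistent single-fault candidates, matching the diagnoses returned by Java2CSP.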
Fig. 3 A roller shutter and its requirements: move down after the down button is pressed; stop and wait for a button press for going up or down; move up after the up button is pressed; do not move down or up when the window is open, even if the up or down button is pressed
As a last example, we discuss the representation of deterministic finite automata as programs. Let us consider the roller shutter system depicted in Fig. 3. In order to represent the behavior of such a system, we require a deterministic finite automaton comprising vertices representing states, and edges specifying state transitions. For every edge from state s to s′ in the automaton, we assume that we have an event e. If e is activated, then we can pass from s to s′. For the roller shutter, we assume that after starting the system, the state is up. When pressing the down button, the shutter will go down until reaching the bottom. Afterward, we can go up again using the up button. Note that we can stop the operation at any time. Moreover, we assume a sensor indicating the status of a window, which has to be closed for operating the roller shutter when going down to prevent damage. In Fig. 4, we depict the whole deterministic finite automaton for such a system. Deterministic finite automata can be easily converted into a program when only considering the transitions between states. In particular, we come up with a function transition that takes the old state of the automaton and the events as input, and computes the new state. We capture the transitions using a nested conditional statement. Note that in case the automaton delivers an output, we map this to a variable in the program.
Fig. 4 A deterministic finite automaton describing the behavior of a roller shutter
1.  public class RollerShutter {
2.    public void transition(int state, boolean up, boolean down, boolean closed, boolean top_sensor, boolean bottom_sensor) {
3.      boolean n_closed = !closed;
4.      int newState = -1;
5.      if (state == 0) {
6.        newState = 1;
7.      } else { if (state == 1) {
8.        if (down && closed) {
9.          newState = 2;
10.       }
11.     } else { if (state == 2) {
12.       if (up || n_closed) {
13.         newState = 3;
14.       } else { if (bottom_sensor) {
15.         newState = 4;
16.       } }
17.     } else { if (state == 3) {
18.       if (down && closed) {
19.         newState = 2;
20.       } else { if (up) {
21.         newState = 5;
22.       } }
23.     } else { if (state == 4) {
24.       if (up) {
25.         newState = 5; }
26.     } else { if (state == 5) {
27.       if (top_sensor) {
28.         newState = 1;
29.       } else { if (down) { newState = 3; } } } } } } } } } }
We may use the program comprising the finite automaton to detect a fault in the state transitions. For this purpose, we assume that Line 21 is newState = 2;. When calling transition with state = 3, down = false, closed = true, bottom_sensor = false, top_sensor = false, and up = true, we expect newState = 5 but receive newState = 2 instead. When using the faulty program with the introduced bug, Java2CSP returns only one single-fault diagnosis, i.e.: {21. newState = 2}. This result exactly pinpoints the introduced fault.
5 Conclusions
In this paper, we described the Java2CSP tool that maps Java-like programs into their equivalent constraint form. We discuss the underlying foundations, provide references to previous work, show how the tool operates, and introduce use cases where Java2CSP was used to diagnose circuits and the transition function of finite state automata. The use cases provide information on how to use Java2CSP in a more ordinary model-based diagnosis setup where we are interested in finding faults in hardware. The Java2CSP tool has several limitations, including syntactical restrictions. However, it was developed primarily as an implementation for educational use. The tool is available on the web without requiring the installation of a local copy. Future research includes improving the tool and also making use of it for other purposes like test case generation.
Acknowledgements The financial support by the Austrian Federal Ministry for Digital and Economic Affairs, the National Foundation for Research, Technology and Development and the Christian Doppler Research Association is gratefully acknowledged.
References 1. Brandis, M.M., Mössenböck, H.: Single-pass generation of static assignment form for structured languages. ACM TOPLAS 16(6), 1684–1698 (1994) 2. Console, L., Friedrich, G., Theseider Dupré, D.: Model-based diagnosis meets error diagnosis in logic programs. In: Proceedings 13th International Joint Conference on Artificial Intelligence. pp. 1494–1499. Chambery (1993) 3. Davis, R., Shrobe, H., Hamscher, W., Wieckert, K., Shirley, M., Polit, S.: Diagnosis based on structure and function. In: Proceedings AAAI. pp. 137–142. Pittsburgh (1982) 4. Friedrich, G., Stumptner, M., Wotawa, F.: Model-based diagnosis of hardware designs. Artif. Intell. 111(2), 3–39 (1999) 5. de Kleer, J., Williams, B.C.: Diagnosing multiple faults. Artif. Intell. 32(1), 97–130 (1987) 6. Malik, A., Struss, P., Sachenbacher, M.: Case studies in model-based diagnosis and fault analysis of car-subsystems. In: Proceedings of the European Conference on Artificial Intelligence (ECAI) (1996) 7. Milde, H., Guckenbiehl, T., Malik, A., Neumann, B., Struss, P.: Integrating model-based diagnosis techniques into current work processes—three case studies from the INDIA project. AI Commun. 13 (2000) (special Issue on Industrial Applications of Model-Based Reasoning)
8. Rajan, K., Bernard, D., Dorais, G., Gamble, E., Kanefsky, B., Kurien, J., Millar, W., Muscettola, N., Nayak, P., Rouquette, N., Smith, B., Taylor, W., Tung, Y.: Remote agent: an autonomous control system for the new millennium. In: Proceedings of the 14th European Conference on Artificial Intelligence (ECAI). Berlin, Germany (2000) 9. Reiter, R.: A theory of diagnosis from first principles. Artif. Intell. 32(1), 57–95 (1987) 10. Sachenbacher, M., Struss, P., Carlén, C.M.: A prototype for model-based on-board diagnosis of automotive systems. AI Commun. 13 (2000). (Special Issue on Industrial Applications of Model-Based Reasoning) 11. Wotawa, F., Nica, M., Moraru, I.: Automated debugging based on a constraint model of the program and a test case. J. Log. Algebraic Methods Program. 81(4), 390–407 (2012). https:// doi.org/10.1016/j.jlap.2012.03.002
Meta-diagnosis via Preference Relaxation for State Trackability Xavier Pucel, Stéphanie Roussel, Louise Travé-Massuyès, and Valentin Bouziat
Abstract In autonomous systems, planning and decision making rely on the estimation of the system state across time, i.e., state tracking. In this work, a preference model is used to provide non-ambiguous estimates at each time point. However, this strategy can lead to dead ends. Our goal is to anticipate dead ends at design time and to blame root cause preferences, so that these preferences can be revised. To do so, we present the preference-based state estimation approach and we apply a consistency-based meta-diagnosis strategy based on preference relaxation. We evaluate our approach on a robotic functional architecture benchmark.
1 Introduction
For autonomous systems, state tracking is a critical task because it strongly influences decision making, which is essential to the life of the system. It provides the means to diagnose faults and to react to the various hazards that can affect the system. In this paper, we focus on discrete event systems [17] in which states are Boolean variable assignments and transitions are propositional logic formulae. When the system state is partially observable, the number of candidate state estimates may quickly become too large to be usable. In embedded or distributed systems, memory
and communication limitations may also become a problem, even with symbolic representation techniques such as in [15]. These limitations lead us to propose a single-state estimation strategy that retains only one state out of the set of candidate estimates at each time step, as in [2]. Although this can be seen as an extreme strategy, it is efficient to feed the decision system with a clear input, and it is consistent with mainstream works that select a limited number of best candidates according to some preference criterion, for example probabilities [9, 16]. However, the more we limit the number of estimates, the more we may be confronted with the problem of dead ends, i.e., observations proving that previous estimates were wrong and evidencing no continuation of the estimated trajectory. Because backtracking [9] is not a viable solution with regard to real-time constraints, some approaches [4, 14] build single-state estimators that are guaranteed to avoid dead ends, making the system single-state trackable. However, this is not always possible. In this case, there are two options: to refer to multi-estimator strategies as proposed in [5], or to detect and diagnose dead ends at design time and to identify which part(s) of the estimator should be modified to circumvent the dead end. In this work, we focus on the second option, which can be qualified as a meta-diagnosis strategy. Following the approach of [2, 11, 12], a single-state estimator is defined by a totally ordered conditional preference model [1], which allows one to describe conditions for estimating the truth values of state variables. Diagnosing a dead end means identifying a set of preferences that, if modified appropriately, circumvent the dead end. The approach implements consistency-based diagnosis based on preference relaxation. This paper extends the work of a short two-page paper presented as a poster in [3]. It provides the details of the meta-diagnosis approach and the results of a set of experiments used in the validation phase. The paper is structured as follows. In Sect. 2, we formally define the process of incremental single-state estimation and dead ends. In Sect. 3, we define the semantics for relaxing conditional preferences and describe a meta-diagnosis strategy to circumvent dead ends based on relaxing preferences. In Sect. 4, we present a robotic functional architecture used to test our approach, and we provide the results in Sect. 5. We then conclude and discuss future work in Sect. 6.
2 Incremental Single State Estimation
Preliminary notations. For a variable set V, an assignment over V is a function that assigns a truth value (true, false) to each variable of V. An assignment over V is classically extended to assign a value to formulae whose scope is in V. For a variable set V and a variable v ∈ V, v (resp. ¬v) denotes the assignment in which v is assigned true (resp. false). For two assignments x and y on the variable sets X and Y, x.y is the assignment on X ∪ Y such that ∀v ∈ X, x.y(v) = x(v) and ∀v ∈ Y, x.y(v) = y(v). For two sets of variables X and Y such that Y ⊆ X, and an assignment x on X, the projection of x on Y is denoted x↓Y and is such that ∀v ∈ Y, x↓Y(v) = x(v).
2.1 System Definition
In the following, we adapt definitions from previous works [2, 12]. We suppose that the system is governed by discrete dynamics where each time step lasts the same duration.
Definition 1 (System) A system M is a tuple (O, E, S, s0, Δ) where:
• O and E are two finite sets of propositional symbols that represent, respectively, observable and estimated features of the system;
• S is a subset of assignments on O ∪ E and represents the states of the system;
• s0 ∈ S is the initial state of the system;
• Δ ⊆ S × S is the set of transitions of the system.
Without loss of generality, we consider that all the states of the system are reachable from the initial state s0. In order to avoid an explicit representation of transitions, Δ is represented by a set Δp of propositional logic formulae that can relate to both the variables of O ∪ E, here denoted P, and the same variables but referring to the previous time step, denoted Ppre. Formally, we define a bijection pre : P → Ppre such that for all p ∈ P, pre(p) refers to the variable p of the state at the previous time step. For a state s ∈ S, spre is the assignment on Ppre such that ∀p ∈ P, spre(pre(p)) = s(p). The set of transitions is the set {(s, t) ∈ S² | spre.t(Δp) = true}. An observation is a projection of a state on the observable symbols O. The set of observations of a system is denoted O. We define the function cands : S × O → 2^S such that for all states s in S, for all o in O, cands(s, o) represents the set of successors of s that have observation o. Formally, ∀s ∈ S, ∀o ∈ O, cands(s, o) = {t ∈ S | (s, t) ∈ Δ and t↓O = o}. We also define nextObs as the function S → 2^O such that nextObs(s) is the set of observations that can be observed just after the system is in state s. Formally, ∀s ∈ S, nextObs(s) = {o ∈ O | cands(s, o) ≠ ∅}. A state sequence seq is a list (s0, s1, . . . , sn−1) where each si is a state in S; |seq| = n is the length of the sequence and seq[i] = si is the ith state in the sequence; last(seq) designates the last state of seq; if s is a state, seq · s is the sequence of length |seq| + 1 that begins with seq and ends with s.
Definition 2 (Language, observation language) The language associated with a system M = (O, E, S, s0, Δ) is the set of state sequences accepted by the system and starting with s0. Formally, L(M) = {seq ∈ S⁺ | seq[0] = s0 and ∀i ∈ [1, |seq| − 1], (seq[i − 1], seq[i]) ∈ Δ}. The observation language is the language accepted by the system projected on the observations. Formally, Lobs(M) = {seq↓O | seq ∈ L(M)}.
We illustrate this modeling approach on a small model that is a simplified version of the experimental benchmark of Sect. 4.
Example 1 (System) We consider a simple robot functional architecture with three functions: movement, communication, and power supply. The health status of
each function is represented by the three respective variables hmv, hcom, and hpow. Two alarms almv and alcom can be raised when movement and communication fail, respectively. We estimate the value of the health variables from the sequence of alarms, i.e., O = {almv, alcom} and E = {hmv, hcom, hpow}. The initial state is s0 = ¬almv.¬alcom.hmv.hcom.hpow. Δp is the conjunction of the following formulae:
¬pre(hpow) → ¬hpow   (δ1)
¬hpow → (almv ∧ alcom)   (δ2)
¬hmv → almv   (δ3)
¬hcom → alcom   (δ4)
It expresses that the fault in the power supply is permanent (δ1) and causes both alarms to be raised (δ2), movement faults cause movement alarms (δ3), and communication faults cause communication alarms (δ4). Note that alarms can also occur without any fault being present (false positives, external perturbations, etc.), and that movement and communication faults are intermittent, i.e., they are independent of their previous values. At time step 1, let us assume we receive the observation o1 = almv.¬alcom. We have cands(s0, o1) = {almv.¬alcom.hmv.hcom.hpow, almv.¬alcom.¬hmv.hcom.hpow}. In the first candidate state, the almv alarm is explained as a false alarm or caused by some unknown event, while it is explained by a fault in the move module in the second candidate.
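The candidate computation in Example 1 can be reproduced by brute force over the assignments of the estimated variables, since the observed alarms are fixed by o1. The following Java sketch (ours, with the formulae δ1–δ4 hard-coded and all names chosen for this illustration) prints the two candidate successors of s0 for the observation o1 = almv.¬alcom.

public class RobotCands {
    static boolean implies(boolean p, boolean q) { return !p || q; }

    // Transition formulae d1..d4 over the previous state (pre) and the current state.
    static boolean delta(boolean preHpow, boolean almv, boolean alcom,
                         boolean hmv, boolean hcom, boolean hpow) {
        return implies(!preHpow, !hpow)            // d1: the power-supply fault is permanent
            && implies(!hpow, almv && alcom)       // d2: a power fault raises both alarms
            && implies(!hmv, almv)                 // d3: a movement fault raises almv
            && implies(!hcom, alcom);              // d4: a communication fault raises alcom
    }

    public static void main(String[] args) {
        boolean preHpow = true;                    // in s0 the power supply is healthy
        boolean obsAlmv = true, obsAlcom = false;  // observation o1 = almv.¬alcom
        for (int code = 0; code < 8; code++) {     // enumerate hmv, hcom, hpow
            boolean hmv = (code & 1) != 0, hcom = (code & 2) != 0, hpow = (code & 4) != 0;
            if (delta(preHpow, obsAlmv, obsAlcom, hmv, hcom, hpow))
                System.out.println("candidate: almv=" + obsAlmv + " alcom=" + obsAlcom
                        + " hmv=" + hmv + " hcom=" + hcom + " hpow=" + hpow);
        }
        // Prints exactly two candidates, differing only in hmv, as in Example 1.
    }
}

As expected, δ2 forces hpow to stay true (alcom is off) and δ4 forces hcom to stay true, so only the value of hmv remains open.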
2.2 Preference-Based Estimation Strategy
We now focus on the estimation part for systems defined above. Since we consider non-deterministic, partially observable systems, there may be several state sequences that explain a given observation sequence. We adopt an incremental approach to select a unique explanation, called an estimation strategy [14].
Definition 3 (Estimation strategy) An incremental single-state estimation strategy for a system M is a function estim : S × O → S such that for all s in S, for all o in nextObs(s), estim(s, o) represents the estimated state of the system at time step k if it was estimated in state s at time step k − 1 and if o is observed at time step k. We impose the estimation strategy to be consistent both across time (i.e., estim is a function) and with the system behavior (i.e., estim(s, o) belongs to cands(s, o)).
Following [2, 12], the estimation strategy is based on a sequence of conditional preferences.
Definition 4 (Conditional preference) Let M be a system. A conditional preference γ on a variable e of E is defined by cond : ¬e ≺ e, where cond is a propositional formula on P ∪ Ppre. cond is called γ's condition and e is γ's target.
Informally, cond : ¬e ≺ e expresses the fact that we prefer to estimate e as true if and only if cond is true. Note that it is equivalent to ¬cond : e ≺ ¬e. From now on, we consider that the set of estimated variables E contains n variables and that these variables are ordered, from e1 to en. This order can be seen as the order used to estimate the state at every time step.
Definition 5 (Conditional preference model) A conditional preference model Γ for a system M is an ordered sequence of preferences (γ1, γ2, . . . , γn) such that γi = condi : ¬ei ≺ ei is a conditional preference on ei and that its condition condi only uses variables from Ppre ∪ O ∪ {ej | 1 ≤ j < i}.
Informally, if γi appears before γj in Γ, then the condition of γj can depend on the outcome of γi, but the reverse is forbidden.
Algorithm 1: PreferredEstimation(M, sk−1, ok)
Returns a singleton containing the preferred state estim(sk−1, ok)
1  preferredCands ← cands(sk−1, ok); estTargets ← true;
2  for i ← 1 : n do
3      val ← (sk−1)pre.ok.estTargets(condi);
4      if ∃t ∈ preferredCands such that t(ei) = val then
5          estTargets ← estTargets.[ei ← val];
6          preferredCands ← preferredCands − {t | t(ei) ≠ val}
7      else estTargets ← estTargets.[ei ← ¬val]
8  return preferredCands;
A conditional preference model allows to define an estimation strategy estim as presented in Algorithm 1. Let s ∈ S be the state of the system at time step k − 1 and o ∈ nextObs(s) the observation received at time step k. We initialize the sets of preferred candidates for estim(sk−1 , ok ) with cands(sk−1 , ok ) (line 1), and the general idea is to remove non-preferred candidates from this set until it is a singleton. To do so, for each preference γi , we compute the value val of condi with the assignment sk−1 pre .ok .estTargets where estTargets is the assignment containing truth values for variables ej with j < i (line 3). If there exists a state t in preferredCands such that t (ei ) = val, we apply the preference and consider that estim(sk−1 , ok) = val (line 5) and we remove from preferredCands all states that do not have value val for ei (line 6). If such a t does not exist, it means there is no choice for this preference, i.e., only the negation of val is possible for ei in preferred states (line 7). Note that such an algorithm corresponds to the best transition with respect to a partial order on transitions formally defined in [2]. Example 2 (Preference model) We consider the system of Example 1 and now define the conditional preference model:
536
X. Pucel et al.
¬pre(almv ) ∧ almv ∧ ¬pre(alcom ) ∧ alcom : hpow ≺ hpow (γ1 ) almv ∧ hpow : hmv ≺ hmv
(γ2 )
pre(alcom ) ∧ alcom ∧ hpow : hcom ≺ hcom (γ3 ) If both alarms are raised simultaneously, we blame their common cause, i.e., the power supply (γ1 ). Otherwise we blame the respective functions (γ2 and γ3 ). Moreover, for the communication function, we dismiss the first alarm as noise and only diagnose a communication fault when the alarm persists during several time step (γ3 ). Thus, we have estim(s0 , o1 ) = almv .alcom .hmv .hcom .hpow . Definition 6 (Estimated sequence) Let M be a system, seqobs be an observation sequence in Lobs (M) and estim an estimation strategy for M based on a preference ∈ L(M) such model . The estimated sequence for seqobs is the state sequence seq − 1], seq[i] = estim(seq[i − 1], seqobs [i]). that seq[0] = s0 and for all i in [1, |seq|
2.3 Dead End At some point, the estimator may choose a state sequence different from the one actually taken by the system. This may be an issue when the following conditions happen: a) the system is in state s, the estimator estimates that it is in state sˆ with sˆ = s; b) the system moves to state t and produces observation t ↓O ; c) the set of candidates cands(ˆs , t ↓O ) is empty. Definition 7 (Dead end) Let M be a system and estim an estimation strategy based on a preference model . A sequence of observations seqobs · o in Lobs (M) is a dead o) = ∅. end if there exists an estimated sequence seq for seqobs and cands(last(seq), Example 3 (Dead end) In the system and the estimation strategy presented in Examples 1 and 2, the sequence of observations (almv .alcom , almv .alcom , almv .alcom ) is a dead end. In fact, at time step 1, two faults occur simultaneously in the movement and communication functions in the system, causing both alarms to activate. The estimator explains it with a fault in the battery, due to preference (γ1 ), which constitutes a divergence between the real system and the diagnosis. At time step 2, the faults disappear in the real system, causing both alarms to stop and the estimator cannot explain this observation as the fault in the battery is permanent.
3 Meta-Diagnosis via Consistency-Based Diagnosis of Preferences While dead ends can be eliminated by modifying the system, we aim at eliminating them by modifying the estimation strategy defined by the the preference model.
Meta-diagnosis via Preference Relaxation for State Trackability
537
Indeed, some dead ends can be circumvented by modifying the preference conditions in . We address this problem with a consistency-based diagnosis approach [7]. More precisely, given a dead end and a preference model, we want to know whether the observation sequence associated with the dead end could be accepted by the estimator if some preferences that we aim to identify were “relaxed.” When this is the case, we can indicate to the designer that the dead end can be avoided by modifying the conditions for the identified subset of preferences. Correcting the preference conditions is left for future work.
3.1 Relaxed Preference Model In this subsection, we generalize the notion of preference by introducing relaxed preferences. Then, we define the notion of (general) preference, allowing us to define a relaxed preference model. A relaxed preference targeting the variable e in E declares that the valuations of e are incomparable, i.e., there is no preference on the valuations of e in any context. Definition 8 (Relaxed preference model) A relaxed preference for e in E has the form e e . A (general) preference for e ∈ E, denoted ϕ, is either a conditional preference cond : e ≺ e or a relaxed preference e e . Given a system M, a conditional preference model = (γ1 , γ2 , . . . , γn ) and a subset of preferences ⊆ , a relaxed preference model is a sequence of preferences (ϕ1 , ϕ2 , . . . , ϕn ) such / , and ϕi = ei ei otherwise. that ϕi = γi if γi ∈ In comparison with the conditional preference model presented previously, a relaxed preference model may result in more than one preferred state in the set of candidates at each time step. Preferred states only differ with respect to variables that are target of relaxed preferences. In Algorithm 1, two lines should be modified to compute preferred states in that case. First, at line 2, the condition of the loop should be “for all i ∈ [1, n] such that φi is a conditional preference.” This implies that candidates states can only be removed by conditional preferences. Then, as variables ej that are target of relaxed preferences are not assigned a truth value in estTargets, line 3 should be replaced by “val ← isSAT(sk−1 pre .ok .estTargets, condi ),” where for an assignment x over variables X, and form a formula whose scope is F, isSAT(x, form) returns true if and only there exists an assignment y over variables X ∪ F such that y ↓X = x and y(form) is true. Therefore, val is true if and only if it is possible to assign targets of relaxed preferences to make condition condi true. Finally, the returned preferred candidates may be several as some steps are skipped in the loop. Note that, following [2], it would be possible to formally define the new order on transitions defined by a relaxed preference model, but we do not present it here for conciseness purposes.
538
X. Pucel et al.
3.2 Relaxed Estimation Process We now generalize the notions of estimation strategy, estimation sequence, and dead end. Informally, a relaxed estimation strategy returns a set of preferred candidates instead of a unique preferred candidate. A relaxed estimated sequence is a sequence of states in which any successive states s and t in the sequence are such that t belongs to the set of preferred states of the relaxed estimation strategy for state s. A relaxed dead end is a sequence of observations for which the last observation cannot be explained by any relaxed estimation sequence. Definition 9 (Relaxed estimation strategy) A relaxed estimation strategy for a sysS tem M and a relaxed preference model is a function estim : S × O → 2 such that for all s in S, for all o in nextObs(s), estim (s, o) represents the set of preferred estimated states of the system at time step k if it was estimated in state s at time step k − 1 and if o is observed at time step k. Definition 10 (Relaxed estimation sequence) Let M be a system, seqobs be an observation sequence in Lobs (M) and estim a relaxed estimation strategy for M. A relaxed ∈ L(M) such that seq[0] = s0 estimation sequence for seqobs is a state sequence seq and for all i in [1, |seq| − 1], seq[i] ∈ estim ( seq[i − 1], seq [i]). obs Definition 11 (Relaxed dead end) Let M be a system and estim a relaxed estimation strategy. A relaxed dead-end is a sequence of observations seqobs · o in Lobs (M) such o) = ∅. that for all relaxed estimated sequences seq for seqobs , cands(last(seq), Intuitively, relaxing preferences allow to increase the set of estimated sequences and therefore might reduce the number of relaxed dead ends. This is expressed through the following proposition. 1 Proposition 1 (Relaxed dead-end inclusion) Let M be a system and estim and 2 estim two relaxed estimation strategies such that 1 ⊆ 2 . If seqobs is a relaxed 1 2 dead end for estim then it is also a relaxed dead end for estim .
Proof We first show that for a state s in S and an observation o in nextObs(s), 2 1 estim (s, o) ⊆ estim (s, o), i.e., preferred states are still preferred after relaxing preferences. Then, we can show that a relaxed estimation sequence for seqobs in 2 1 estim is also a relaxed estimation sequence for seqobs in estim . It follows that if 2 all relaxed estimation sequences for seqobs in estim cannot be followed by o, so it 1 is for all relaxed estimation sequences for seqobs in estim . Example 4 Let us consider the dead end from Example 3: seqobs = (almv .alcom , almv .alcom , almv .alcom ), and let us relax two preferences = {γ1 , γ2 }. The sequence of states (s0 , sˆ 1, sˆ 2) with sˆ 1 = almv .alcom .hpow .hmv .hcom and sˆ 2 = almv .alcom .hpow .hmv . hcom is a relaxed estimation sequence for seqobs , which means that seqobs is not a relaxed dead end for estim .
Meta-diagnosis via Preference Relaxation for State Trackability
539
3.3 Consistency-Based Preference Diagnosis Checking whether a given observation sequence is a relaxed, dead end can be considered as a form of consistency check (due to Proposition 1). Then, searching for the smallest set(s) of preferences that circumvent a dead end can be done with a classical consistency-based diagnosis algorithm [13]. Definition 12 (Preference Meta-diagnosis) Let M be a system, estim an estimation strategy based on a conditional preference model , and seqobs a dead end for this estimation strategy. A set of preferences ⊆ is a preference meta-diagnosis if seqobs is not a relaxed dead end for estim . A meta-diagnosis is a minimal meta-diagnosis if and only if there is no metadiagnosis ⊆ such that ⊂ . A meta-diagnosis is interpreted as follows: it is possible to modify the conditions for the preferences in so that the diagnoser does not dead end on the associated observation sequence. It does not guarantee anything with respect to other potential dead ends. Proposition 1 ensures that if is a meta-diagnosis, then all supersets of are meta-diagnoses as well. Example 5 (Minimal meta-diagnosis) In example 4, we have seen that seqobs was not a relaxed dead end for estim . This means that = {γ1 , γ2 } is a meta-diagnosis for this dead end. The minimal meta-diagnosis for seqobs is in fact 2 = {γ1 }. Thus, modifying the condition of γ1 , can eliminate dead end seqobs , for example by replacing γ1 with γ1 = : hpow ≺ hpow . However it implements a different fault management strategy, that must be validated against the robotic mission requirements. To check if a sequence of observations seqobs is a relaxed dead end for a given set of relaxed preferences, it is possible to compute at each time step the set of preferred candidates. To do so, we follow the modification of Algorithm 1 described previously in this section. Then, starting from the initial state, it is possible to compute all relaxed estimation sequences and therefore compute whether seqobs is a relaxed dead end. Note that this approach is combinatorial in both the number of relaxed preferences and the length of seqobs . By testing the meta-diagnosis candidates, one can find all the minimal preferences meta-diagnoses. Approaches such as the Fastdiag algorithm [6] can be used to efficiently browse the meta-diagnosis candidate space.
4 Experiments We have experimented our approach on a functional robotic architecture along with a complex dynamic and complex preference model. We consider a system with three functions: movement, communication, and power supply. It can raise two alarms almv and alcom (observations of the system) if they are performing poorly. We model
540
X. Pucel et al.
the trust we have in each function with variables tmv , tcom and tpow , in the sense that as we receive alarms, we lose trust in the system’s operational capacity. Variables fmv and fcom model external perturbations that impede movement and communication (obstacles, slippery terrain, distance to antenna, etc.). fpow represents a loss of voltage in the power supply. The estimator receives alarms as input, and estimates if each function can be trusted for autonomous operation. Figure 1 details the model and illustrates this architecture. represents that we trust the movement and communication functions only if we trust the power supply function as well (δ1 ). Movement (resp. communication) perturbations or low voltage cause a movement (resp. communication) alarm (δ2 , δ3 ). Once we have lost trust in the power supply, we never trust it again (δ4 ). For communication, the alarm is a perfect indicator of our trust in the function (δ5 ). In the preferences associated with this model, we use the temporal logic operators from PtLTL to express formula compactly. We rely on [8] to efficiently translate PtLTL formulae into propositional logic formulae. The formula up(f) is true when the formula f was false at the previous time step and is now true. The formula Hκ (f) (with κ > 0) is true when the formula f has been true for the last κ time steps, including now. The formula O(f) is true when f has been true at least once in the experiment, including now. is the ordered sequence (γ1 , γ2 , γ3 , γ4 , γ5 , γ6 ) and implements the following strategy. We blame low voltage if and only if both alarms fire simultaneously (γ1 ), or if low voltage was already blamed at the previous time step. After a continuous period of size κ with low voltage, we lose trust in the power supply1 (γ2 ). When a movement alarm not explained by low voltage is on for two time steps, we blame the associated environmental perturbations (γ3 ). After four time steps without alarm, we trust the (tmv ∨ tcom ) → tpow
(δ1 )
pre(fpow ) ∨ (up(almv ) ∧ up(alcom )) : fpow ≺ fpow
(γ1 )
O(Hκ (fpow )) : tpow ≺ tpow
(γ2 )
(fmv ∨ fpow ) → almv
(δ2 )
(fcom ∨ fpow ) → alcom
(δ3 )
¬pre(tpow ) → ¬tpow
H2 (almv ) ∧ ¬fpow : fmv ≺ fmv
(γ3 )
(δ4 )
H4 (¬almv ) : tmv ≺ tmv
(γ4 )
tcom ↔ ¬alcom
(δ5 )
alcom ∧ ¬fpow : fcom ≺ fcom (γ5 ) : tcom ≺ tcom (γ6 )
−fmv −tmv +almv
Move
−fpow −tpow Power
Com
−fcom −tcom +alcom
Fig. 1 , and schema of the simple architecture model. Variables labeled with + are observable, − estimated. Arrows represent functional dependency constraint (δ4 ), preference (γ2 ) could equivalently be written as Hκ (fpow )) : tpow ≺ tpow . However, we claim that should represent the estimation strategy independently from the system model, for modularity purposes.
1 Given
Meta-diagnosis via Preference Relaxation for State Trackability
541
Table 1 Meta-diagnosis computation time (in seconds) against dead-end length κ Dead-end len. Comp. time (s) κ Dead-end len. Comp. time (s) 3 4 5 6 7
5 6 7 8 9
0.56 0.84 1.52 3.03 6.07
8 9 10 11 12
10 11 12 13 14
11.57 23.48 47.72 94.26 186.93
movement function (γ4 ). Communication alarms not explained by low voltage are blamed on environmental perturbations (γ5 ). In doubt, we trust the communication function (γ6 ). This model has a dead end when both movement and communication alarms are raised simultaneously, stay on for κ time steps, then alcom turns off. We can control the minimal length of the dead ends with the κ parameter in (γ2 ), as the shortest dead end has κ + 2 time steps. There are two minimal meta-diagnoses for this dead end: {γ1 } and {γ2 }. If we replace γ1 by γ1 = : fpow ≺ fpow or γ2 by γ2 = : tpow ≺ tpow , the dead end is circumvented, although it changes the estimation strategy.
5 Results We use Sat4j [10] as a SAT solver to directly compute if there exists a candidate with a particular variable value (see Algorithm 1 line 5), and to implement the isSAT function described in Sect. 3.1. Experiments have been conducted on an Intel(R) Xeon(R) CPU E5-2660 v3 @ 2.60GHz processor with 62GiB of RAM, although only around 2Gib were used. The results depicted in Table 1 show that the computation is very fast for short dead ends, but grows exponentially with dead-end length. This is due to the fact that at each time step, for each relaxed preference, there may be two outcomes, which means that in the worst case we need to explore an exponential number of paths to check if a path is a relaxed dead end. We consider the performance satisfactory for two reasons. First, memory usage is not a limiting factor, and time is not a constraint during design. Second, long dead ends are difficult to find, but also difficult to interpret and debug by the designer. When an estimation model becomes too large for maintenance, architectural responses may help dividing it in smaller decentralized models.
542
X. Pucel et al.
6 Conclusion In this paper, we present an approach for blaming a dead end on a set of preferences at design time. It follows a consistency-based meta-diagnosis strategy based on relaxing conditional preferences. We have defined and implemented algorithms with satisfactory performances. For large benchmarks, several approaches can be explored to improve the associated computation time. For instance, we could parallelize the algorithms by dividing the space search among several computation cores. We could also define an intelligent heuristic for finding relevant scenarios faster. During our experiments we noted that many dead ends reproduce the same pattern. A perspective is to identify dead-end patterns to represent them more compactly. Another perspective is to find relaxations that circumvent several dead ends at once. A final perspective is the correction of conditions of preferences belonging to a meta-diagnosis to avoid a dead end or a set of dead ends. Acknowledgements This project has been supported by ANITI, the “Artificial and Natural Intelligence Toulouse Institute,” through the French “Investing for the Future—PIA3” program under the Grant agreement ANR- 19-PI3A-0004.
References 1. Boutilier, C., Brafman, R.I., Domshlak, C., Hoos, H.H., Poole, D.: Preference-based constrained optimization with CP-nets. In: Computational Intelligence, pp. 137–157 (2004) 2. Bouziat, V., Pucel, X., Roussel, S., Travé-Massuyès, L.: Preferential discrete model-based diagnosis for intermittent and permanent faults. In: Proceedings of the 29th International Workshop on Principles of Diagnosis (DX’18) (2018) 3. Bouziat, V., Pucel, X., Roussel, S., Travé-Massuyès, L.: Preference-based fault estimation in autonomous robots: Incompleteness and meta-diagnosis. In: Elkind, E., Veloso, M., Agmon, N., Taylor, M.E. (eds) Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems, AAMAS ’19, Montreal, QC, Canada, May 13–17, 2019, pp. 1841– 1843. International Foundation for Autonomous Agents and Multiagent Systems (2019). http:// dl.acm.org/citation.cfm?id=3331937 4. Bouziat, V., Pucel, X., Roussel, S., Travé-Massuyès, L.: Single state trackability of discrete event systems. In: Proceedings of the 30th International Workshop on Principles of Diagnosis (DX’19) (2019) 5. Coquand, C., Pucel, X., Roussel, S., Travé-Massuyès, L.: Dead-end free single state multiestimators for DES -the 2-estimator case. In: 31st International Workshop on Principles of Diagnosis (DX-2020). Nashville, Tennessee, United States (2020). https://hal.laas.fr/hal-03089427 6. Felfernig, A., Schubert, M., Zehentner, C.: An efficient diagnosis algorithm for inconsistent constraint sets. Artif. Intell. Eng. Des. Anal. Manuf.: AI EDAM 26(1), 53 (2012) 7. Hamscher, W., et al.: Readings in model-based diagnosis (1992) 8. Havelund, K., Rosu, G.: Synthesizing monitors for safety properties. In: Proceedings of the 8th International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS-02), pp. 342–356 (2002) 9. Kurien, J., Nayak, P.P.: Back to the future for consistency-based trajectory tracking. In: AAAI/IAAI, pp. 370–377 (2000) 10. Le Berre, D., Parrain, A.: The Sat4j library, release 2.2. JSAT 7(2-3), 59–6 (2010). https:// satassociation.org/jsat/index.php/jsat/article/view/82
Meta-diagnosis via Preference Relaxation for State Trackability
543
11. Pralet, C., Pucel, X., Roussel, S.: Diagnosis of intermittent faults with conditional preferences. In: Proceedings of the 27th International Workshop on Principles of Diagnosis (DX’16) (2016) 12. Pucel, X., Roussel, S.: Intermittent fault diagnosis as discrete signal estimation: trackability analysis. In: 28th International Workshop on Principles of Diagnosis (DX’17) (2017) 13. Reiter, R.: A theory of diagnosis from first principles. Artif. Intell. 32(1), 57–95 (1987) 14. Roussel, S., Pucel, X., Bouziat, V., Travé-Massuyès, L.: Model-based synthesis of incremental and correct estimators for discrete event systems. In: Bessiere, C. (ed.) Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI 2020, pp. 1884– 1890. ijcai.org (2020). https://doi.org/10.24963/ijcai.2020/261 15. Torta, G., Torasso, P.: An on-line approach to the computation and presentation of preferred diagnoses for dynamic systems. AI Commun. 20(2), 93–116 (2007) 16. Williams, B.C., Nayak, P.P.: A model-based approach to reactive self-configuring systems. In: Proceedings of the 13th AAAI Conference on Artificial Intelligence, pp. 971–978 (1996) 17. Zaytoon, J., Lafortune, S.: Overview of fault diagnosis methods for discrete event systems. Annu. Rev. Control. 37(2), 308–320 (2013)
Model-Based Diagnosis of Time Shift Failures in Discrete Event Systems: A (Max,+) Observer-Based Approach Claire Paya, Euriell Le Corronc, Yannick Pencolé, and Philippe Vialletelle
Abstract This paper addresses the problem of diagnosing the occurrence of time shift failures in systems like automated production lines. The model of the system is represented as a timed event graph (TEG) that is characterized as a (max,+)-linear system. The proposed method aims at detecting and localizing the source of time shift failures by the design of a set of indicators. These indicators rely on the residuation theory on (max,+)-linear systems and a (max,+) observer that estimates the internal state of the observed system. Keywords Model-based diagnosis · (Max,+)-linear system
1 Introduction Discrete event systems (DES) can be used to model and solve fault diagnosis problems in automated production lines. In systems like production lines, failures can be not only caused by complete equipment breakdowns but also by the occurrence of time shifts so that the production line can dramatically slow down and not be able to comply with the specified production objectives. This paper addresses the problem of how to automatically detect and localize the source of such time shifts based on C. Paya (B) · P. Vialletelle STMicroelectronics, Crolles, France e-mail: [email protected] P. Vialletelle e-mail: [email protected] C. Paya · E. Le Corronc · Y. Pencolé LAAS-CNRS, University Toulouse, CNRS, Toulouse, France e-mail: [email protected] Y. Pencolé e-mail: [email protected] C. Paya · E. Le Corronc UniversitÃl’ Toulouse 3, Paul Sabatier, Toulouse, France © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 I. Czarnowski et al. (eds.), Intelligent Decision Technologies, Smart Innovation, Systems and Technologies 238, https://doi.org/10.1007/978-981-16-2765-1_45
545
546
C. Paya et al.
a subclass of time Petri Nets, called timed event graph (TEG). In TEGs, places are associated with a punctual duration and they can be modeled by (max,+) algebra as introduced in [1, 6]. The history of DES with the use of (max,+) algebra is presented in [5]. For example, [4] uses (max,+) algebra to control wafer delays in cluster tools for semiconductor production. The problem of failure diagnosis by the use of (max,+) algebra has been introduced in [7] where the proposed detection method relies on the residuation theory and compares observable outputs with expected ones to detect output time shifts (Sect. 3). Failure localisation is then performed by an ad hoc structural analysis of the underlying TEG that does not use (max,+) algebra. The objective of this paper is to design a new set of time shift failure indicators (Sect. 5) that are not based on the observable outputs of the system only but on the estimation of the internal state of the system (Sect. 4) so that the failure localisation problem is also solved in an algebraic way. To do so, the proposed failure indicator will rely on an observer that is proposed in [2] and aims at rebuilding system states based on the observations.
2 Motivation The problem that we address is motivated by a real production line that is at STMicroelectronics Crolles300 plant. Semiconductor manufacturing is complex, and one of its most important challenges is to succeed in detecting production drifts before they have real impact on production plan. STMicroelectronics has complex production lines of wafer batches with many pieces of equipment running in parallel. One of the objectives is to detect as soon as possible that an equipment is late to ensure that products (wafer batches) are delivered on time or at least with minimal delays. Figure 1 presents a fault-free behavioral model of such a production line defined by a p3 Eq1
p12
0
p1 u1
1
y2
p4 x1
3 p6
p9
Eq3
p7
x2
7
0
6
p10
2 Eq2
p11
0 x5
p8 p2
1
x6
0
y1
4 3
p5 2
1 u2
4
u1 = u 2
5
x3
4
x4
p13
1 γ y3
1
2
3
4
5
6
Fig. 1 Fault free model (left) and a graphical representation of the inputs u1 (=u2 ) in the proposed scenario (right)
Model-Based Diagnosis of Time Shift Failures in Discrete …
547
TEG. This production line corresponds to three pieces of equipment (namely Eq. 1, Eq. 2, Eq. 3). Equation 1 is modeled with a couple of places p3 , p4 : it is available (i.e., no current processing) if a token is in place p3 while it is processing its input if a token is in place p4 . The process of Eq. 1 is carried out in 3 h. Similarly Eq. 2 and Eq. 3 are respectively modeled by the couple of places p6 , p5 (processing time: 4 h) and p9 , p10 (processing time 4 h). Places p7 , p8 model wafer batch transportation between Eq. 1, Eq. 2 and Eq. 3. For i = {1, 2}, a trigger of an input transition ui represents the occurrence of an event from sensors on the production line that indicates the arrival of unprocessed wafer batches in front of Eqi. The output to the production line is a stream of fully processed wafer batches modeled by firing transition y1 . Outputs y2 and y3 provide observable information about the end of the process of Eq. 1 and Eq. 2. Suppose a scenario where a stream of 7 wafer batches arrive at Eq. 1 (input u1 ) respectively at time t ∈ {1, 2, 3, 4, 5, 6, 7} and suppose it is similar for u2 . As detailed later, this sequence of events can graphically be represented as a set of points (event of index γ = 0 arrives at time δ = 1...), see Fig. 1. Then, suppose that processed wafer batches are successively available at time {12, 17, 22, 27, 32, 37, 42} (output y1 ), and process information is available at time {5, 8, 11, 14, 17, 20, 23} for output y2 and at time {7, 12, 17, 22, 27, 32, 37} for output y3 , then the question is: based on the fault-free model of Fig. 1, can we detect and localize a time shift failure in the underlying production line? This paper aims at designing a model-based (max,+)-algebraic decision method for the detection of such time shifts relying on the use of the dioid Max in [[γ , δ]].
3 Mathematical Background About dioid theory The dioid theory is used to describe the inputs and the behavior of a system. Definition 1 A dioid D is a set composed of two internal operations ⊕ and ⊗.1 The addition ⊕ is associative, commutative, idempotent (i.e., ∀a ∈ D, a ⊕ a = a) and has a neutral element ε. The multiplication ⊗ is associative, distributive on the right and the left over the addition ⊕ and has a neutral element e. Element ε is absorbing by ⊗ (i.e., ∀a ∈ D, a ⊗ ε = ε = ε ⊗ a). For instance, let Zmax be the set (Z ∪ {−∞}) associated with the max operation denoted ⊕ (2 ⊕ 3 = 3...) with neutral element ε = −∞ and the integer addition denoted ⊗ (2 ⊗ 3 = 5...) with neutral element e = 0, then Zmax is a dioid. Definition 2 A dioid is complete if it is closed for infinite sums and if ⊗ is distributive over infinite sums. Zmax is not complete as +∞ does not belong to Zmax . Let Zmax = Zmax ∪ {+∞}, Zmax defines a complete dioid where (−∞) ⊗ (+∞) = (−∞). By construction, a dioid D is partially ordered with respect to with: ∀a, b ∈ D, a b ⇔ 1 When
there is no ambiguity, the symbol ⊗ is omitted.
548
C. Paya et al.
a ⊕ b = b. The exponential term ai , a ∈ D, i ∈ N is defined as follows: a0 = e i+1 i and ∀i > 0, a i = a ⊗ a . Finally, in a complete dioid, the Kleene star operator ∗ is a = i≥0 a . From this, it follows a fundamental result (Theorem 1): Theorem 1 Let D be a complete dioid, x = a∗ b is the solution of x = ax ⊕ b. AboutDioid Max in [[γ , δ]] The complete dioid B[[γ , δ]] is the set of formal series sJ = (n,t)∈J γ n δ t with J ⊆ Z2 where γ n δ t is a monomial composed of two com mutative variables γ and δ. Neutral elements are ε = (n,t)∈∅ γ n δ t and e = γ 0 δ 0 . Graphically, the series sJ of B[[γ , δ]] represents any collection J of point of coordinates (n, t) in Z2 with γ as horizontal axis and δ as vertical axis (see Fig. 1). The dioid Max in [[γ , δ]] is defined as the quotient of the dioid B[[γ , δ]] by the modulo γ ∗ (δ −1 )∗ . The internal operations are the same as in B[[γ , δ]] and neutral elements ε and e are identical to those of B[[γ , δ]] and Max in [[γ , δ]] is also complete. By [[γ , δ]] represents a non-decreasing function over γ construction, any series of Max in K nk tk and its canonical form is s = k=0 γ δ with K ∈ N ∪ {+∞} with n0 < n1 < . . . , t0 < t1 < . . . . Throughout this paper, we will use series of Max in [[γ , δ]] to represent the occurrence of an event type over time. For instance, Fig. 1 shows such a series u1 = u2 = γ 0 δ 1 ⊕ γ 1 δ 2 ⊕ γ 2 δ 3 ⊕ γ 3 δ 4 ⊕ γ 4 δ 5 ⊕ γ 5 δ 6 ⊕ γ 6 δ 7 ⊕ γ 7 δ +∞ as described in the scenario of Sect. 2. Definition 3 Let s ∈ Max in [[γ , δ]] be a series, the dater function of s is the nondecreasing function Ds (n) from Z → Z ∪ {+∞} such that s = n∈Z γ n δ Ds (n) . Series u1 (similarly for u2 ) has for dater function Du1 (0) = 1, Du1 (1) = 2, Du1 (2) = 3, Du1 (3) = 4, Du1 (4) = 5, Du1 (5) = 6, Du1 (6) = 7 and Du1 (7) = +∞. This dater function lists all the dates of the event occurrences. About time comparison in series: residuation theory Let D and C denote two complete dioids. Id C and Id D are the identity mappings on C and D. Definition 4 Let : D → C be an isotone mapping,2 is residuated if for all c ∈ C there exists a greatest solution of (x) = c. Moreover this solution is (c) where : C → D is the unique isotone mapping such that ◦ Id C and ◦ Id D . is called the residual of . Consider the isotone mapping Ly : D → D : x → y ⊗ x, it is residuated and its residual is Ly (z) also denoted yo\z. That is \yz is the greatest solution on x of y ⊗ x = z, ∀z ∈ D. Similarly, for the residuated isotone mapping Ry : D → D : x → x ⊗ y, ◦ Intuitively speaking, \ ◦ provide a way to formally compare z Ry (z) = z/y. yz and z/y and y in a complete dioid. \A = (Ao \A)∗ . Theorem 2 Let A ∈ Dn×m be a matrix of series. Then, Ao Time comparison between series of Max in [[γ , δ]] can be defined with residuals. that dioids are ordered sets. Let : S → S be an application defined on ordered sets, is isotone if ∀x, x ∈ S , x x ⇒ (x) (x ).
2 Recall
Model-Based Diagnosis of Time Shift Failures in Discrete …
549
Definition 5 Let a, b ∈ Max in [[γ , δ]] and their respective dater functions Da and Db . The time shift function representing the time shift between a and b for each n ∈ Z is defined by Ta,b (n) = Da − Db . Theorem 3 Let a, b ∈ Max in [[γ , δ]], the([6]) time shift function Ta,b (n) can be bounded by: ∀n ∈ Z, Db/a ◦ (0) ≤ Ta,b (n) ≤ −Da/b ◦ (0), 0 Db/◦a (0) ◦ and Da/b of series b/a where Db/a ◦ (0) is obtained from monomial γ δ ◦ (0) is 0 Da/◦b (0) ◦ of series a/b. obtained from γ δ
Definition 6 Let a, b ∈ Max in [[γ , δ]], the time shift between series a and b is (a, b) = [Db/a ◦ (0); −Da/b ◦ (0)],
(1)
D (0) ◦ and γ 0 δ a/◦b ◦ ∈ a/b. In this interval, the series from which where γ 0 δ Db/◦a (0) ∈ b/a the time offset is measured is the series a. It is called the reference series of the interval.
From this definition, if the time shift interval needs to be defined with series b as the reference series, the interval will be (b, a) = [Da/b ◦ (0); −Db/a ◦ (0)]. Example 1 Let us consider series a = γ 0 δ 12 ⊕ γ 1 δ 15 ⊕ γ 2 δ 18 ⊕ γ 3 δ 21 ⊕ γ 4 δ +∞ and b = γ 0 δ 12 ⊕ γ 1 δ 15 ⊕ γ 2 δ 19 ⊕ γ 3 δ 23 ⊕ γ 4 δ +∞ . The minimal time shift between 0 0 ◦ = γ 0 δ 0 ⊕ γ 1 δ 3 ⊕ γ 2 δ 7 ⊕ γ 3 δ 11 ⊕ in b/a a and b is Db/a ◦ (0) = 0 (found in γ δ 4 +∞ 0 −2 ◦ = from a/b γ δ ). The maximal time shift is −Da/b ◦ (0) = 2 (found in γ δ 0 −2 1 2 2 6 3 9 4 +∞ γ δ ⊕ γ δ ⊕ γ δ ⊕ γ δ ⊕ γ δ ). Therefore (a, b) = [0, 2]: series b is later than a with a minimum of 0 and a maximum of 2 h. Models of (max,+)-linear systems The elements of the TEG are represented by equations in Max in [[γ , δ]]. The equations can be grouped into a set of matrices A, B and C that contain information about the structure of the TEG. The state representation defines relations between any set of input event flows u and the state x, and the p×1 relations between the state x and the output event flows y. Let u ∈ Max in [[γ , δ]] ax n×1 be the input vector of size p, x ∈ Min [[γ , δ]] be the state vector of size n and q×1 [[γ , δ]] be the output vector of size q. The state representation is: y ∈ Max in
x = Ax ⊕ Bu, y = Cx,
n×n n×p q×n , B ∈ Max and C ∈ Max . Equality x = where A ∈ Max in [[γ , δ]] in [[γ , δ]] in [[γ , δ]] ∗ Ax ⊕ Bu can be transformed to x = A Bu thanks to Theorem 1 so we have
y = CA∗ Bu.
550
C. Paya et al.
Matrix H = CA∗ B represents the transfer function of the TEG, that is the dynamic of the system between the inputs and the outputs. For the system of Fig. 1 the matrices 6×6 6×2 3×6 , B ∈ Max and C ∈ Max of the state repreA ∈ Max in [[γ , δ]] in [[γ , δ]] in [[γ , δ]] sentation are: ⎛ A:
⎛
⎞
. γ 1 δ0 . . . . 0 δ3 γ . . . . . ⎟ ⎜ ⎜ . . . γ 1 δ0 . . ⎟ ⎜ . ⎟B 0 δ4 . γ . . . ⎝ ⎠ . γ 0 δ2 . γ 0 δ1 . γ 1 δ0 . . . . γ 0 δ4 .
:
⎞
γ 0 δ1 . ⎜ . 0. 1 ⎟ ⎜ . γ δ⎟ ⎜ . ⎟C . ⎠ ⎝ . . . .
:
. . . . . γ 0 δ0 . γ 0 δ0 . . . . . . . γ 0 δ0 . .
.
The exponent n of γ represents the backward event shift between transitions (the n + 1th firing of x1 depends on the nth firing of x2 ) and the exponent of δ represents the backward time shift between transition (the firing date of x2 depends on the firing date of x1 and time between 2 and 5).
4 How Can a (Max,+) Observer Be Sensitive to Time Shift Failures? The objective of the paper is to propose a method that detects time shift failures as proposed in Sect. 2 and that uses an observer as introduced in [2] and [3]. As later detailed in Sect. 4.2, this observer aims at computing a reconstructed state from the observation of the inputs and outputs of the system that is sensitive to a specific type of disturbance. These disturbances are characterized as new inputs w that slow down the system. The system will then be assumed to behave with respect to the following state representation: x = Ax ⊕ Bu ⊕ Rw, y = Cx. (2)
4.1 Time Shift Failures as Input Disturbances Throughout this paper, we consider that time shift failures are permanent phenomena that can occur at any step of the production. Formally speaking, a time shift failure is characterized by an unexpected and unknown delay d > 0 that is added to the normal duration time t of a place p. As shown in Fig. 2, this place is characterized by a transition upstream xi−1 , a duration t, a number of tokens o and a transition downstream xi . Let xi−1 = Kn=0 γ sn δ hn , where sn is the transition firing number, hn is the firing date and K the number of firing events. The normal downstream transition is xi = Kn=0 γ sn +o δ hn +t . When a time shift failure d > 0 holds in a place, the downstream transition then becomes: xi = Kn=0 γ sn +o δ hn +t+d . To characterize the same time shift failure over the place p by a disturbance, we will first modify the
Model-Based Diagnosis of Time Shift Failures in Discrete …
551
p
Fig. 2 Representation of a place
o xi−1
t
xi
Fig. 3 Representation of a place with disturbance
wi p o xi−1
t
xi
TEG. We add to the downstream transition xi an input wi , as shown in Fig. 3, which slows down this transition. This new input wi is not observed because it is related to a failure in an equipment. To get the same effect of an offset d > 0 in the downstream transition, input wi has to be defined as wi =
k γ sn +o δ hn +t+d .
(3)
n=0
Back to Fig. 1, to characterize a time shift failure of d = 1 on place p5 , a disturbance w4 is added to transition x4 as in Fig. 3. Suppose that x3 = γ 0 δ 2 ⊕ γ 1 δ 6 ⊕ γ 2 δ 10 ⊕ γ 3 δ 14 ⊕ γ 4 δ 18 ⊕ γ 5 δ 22 ⊕ γ 6 δ 26 ⊕ γ 7 δ +∞ . Since an offset of 1 time unit is present on p5 , x4 = γ 0 δ 2+4+1 ⊕ γ 1 δ 6+4+1 ⊕ γ 2 δ 10+4+1 ⊕ γ 3 δ 14+4+1 ⊕ γ 4 δ 18+4+1 ⊕ γ 5 δ 22+4+1 ⊕ γ 6 δ 26+4+1 ⊕ γ 7 δ +∞ . By setting the disturbance w4 = x4 = γ 0 δ 7 ⊕ γ 1 δ 12 ⊕ γ 2 δ 17 ⊕ γ 3 δ 22 ⊕ γ 4 δ 27 ⊕ γ 5 δ 32 ⊕ γ 6 δ 37 ⊕ γ 7 δ +∞ , the firing of transition x4 is slowed down. Based on this characterization, the faulty system that we consider will behave l×1 based on Eq. (2) and input disturbances as defined by Eq. (3). Let w ∈ Max in [[γ , δ]] be the input vector of disturbances of size l. The input w corresponds to the transition n×l is filled with γ 0 δ 0 monomials that that will be disturbed. Matrix R ∈ Max in [[γ , δ]] represent the connections between disturbances and internal disturbed transitions. All the other entries are set to ε. Equality x = Ax ⊕ Bu ⊕ Rw can be transformed to x = A∗ Bu ⊕ A∗ Rw thanks to Theorem 1 so we have y = CA∗ Bu ⊕ CA∗ Rw. In the example of Sect. 2, all the internal transitions in Fig. 1 will be disturbed 6×6 with ∀i, j{1, . . . , 6}i = j, R(i, j) = ε, R(i, i) = so R is the matrix R ∈ Max in [[γ , δ]] 0 0 e=γ δ .
552
C. Paya et al.
4.2 Observer Synthesis In this paper, we use the definition of an observer from the articles [2, 3]. Figure 4 shows the system with disturbances w and from which we can observe the outputs yo . The observer is a new model obtained from the fault-free model and that will estimate the states of the system xr in the presence of such disturbances. From articles [2, 3], we get the following observer’s equations: xr = Axr ⊕ Bu ⊕ L(yr ⊕ yo ) = (A ⊕ LC)∗ Bu ⊕ (A ⊕ LC)∗ LCA∗ Rw, yr = Cxr .
(4)
To obtain the estimated vector xr as close as possible to real state x, the observer n×q such that: relies on the largest matrix L ∈ Max in [[γ , δ]] xr xo (A ⊕ LC)∗ Bu ⊕ (A ⊕ LC)∗ LCA∗ Rw A∗ Bu ⊕ A∗ Rw ∗ ∗ ◦ ◦ B) ∧ (A∗ R/CA R). where L = (A∗ B/CA The observer matrix L of the TEG of Fig. 1 is
⎛
. ⎜ . ⎜ ⎜ . L=⎜ ⎜ ⎜ 1 0 .1 4 ∗ ⎝γ δ (γ δ ) γ 0 δ 0 (γ 1 δ 4 )∗
γ 1 δ 0 (γ 1 δ 3 )∗ γ 0 δ 0 (γ 1 δ 3 )∗ . . γ 0 δ 2 (γ 1 δ 4 )∗ γ 0 δ 6 (γ 1 δ 4 )∗
⎞ . ⎟ . ⎟ 1 0 1 4 ∗⎟ γ δ (γ δ ) ⎟ γ 0 δ 0 (γ 1 δ 4 )∗ ⎟ ⎟ γ 0 δ 1 (γ 1 δ 4 )∗ ⎠ γ 0 δ 5 (γ 1 δ 4 )∗
Fig. 4 Observer structure with disturbance (on the left) and the global architecture of the detection method (on the right)
Model-Based Diagnosis of Time Shift Failures in Discrete …
553
Suppose the system behaves with respect to the inputs u1 and u2 defined in Sect. 3 but transition x4 is disturbed with w4 = γ 0 δ 7 ⊕ γ 1 δ 12 ⊕ γ 2 δ 17 ⊕ γ 3 δ 22 ⊕ γ 4 δ 27 ⊕ γ 5 δ 32 ⊕ γ 6 δ 37 ⊕ γ 7 δ +∞ then the reconstructed state is xr = (A ⊕ LC)∗ Bu ⊕ (A ⊕ LC)∗ LCA∗ Rw which is the vector xr = [xr1 , . . . , xr6 ]T = ⎤ γ 0 δ 2 ⊕ γ 1 δ 5 ⊕ γ 2 δ 8 ⊕ γ 3 δ 11 ⊕ γ 4 δ 14 ⊕ γ 5 δ 17 ⊕ γ 6 δ 20 ⊕ γ 7 δ +∞ ⎢ γ 0 δ 5 ⊕ γ 1 δ 8 ⊕ γ 2 δ 11 ⊕ γ 3 δ 14 ⊕ γ 4 δ 17 ⊕ γ 5 δ 20 ⊕ γ 6 δ 23 ⊕ γ 7 δ +∞ ⎥ ⎥ ⎢ 0 2 ⎢ γ δ ⊕ γ 1 δ 7 ⊕ γ 2 δ 12 ⊕ γ 3 δ 17 ⊕ γ 4 δ 22 ⊕ γ 5 δ 27 ⊕ γ 6 δ 32 ⊕ γ 7 δ +∞ ⎥ ⎥ ⎢ 0 7 ⎢ γ δ ⊕ γ 1 δ 12 ⊕ γ 2 δ 17 ⊕ γ 3 δ 22 ⊕ γ 4 δ 27 ⊕ γ 5 δ 32 ⊕ γ 6 δ 37 ⊕ γ 7 δ +∞ ⎥ ⎥ ⎢ 0 8 ⎣ γ δ ⊕ γ 1 δ 13 ⊕ γ 2 δ 18 ⊕ γ 3 δ 23 ⊕ γ 4 δ 28 ⊕ γ 5 δ 33 ⊕ γ 6 δ 38 ⊕ γ 7 δ +∞ ⎦ γ 0 δ 12 ⊕ γ 1 δ 17 ⊕ γ 2 δ 22 ⊕ γ 3 δ 27 ⊕ γ 4 δ 32 ⊕ γ 5 δ 37 ⊕ γ 6 δ 42 ⊕ γ 7 δ +∞ ⎡
The reconstructed state xr takes into account the disturbance w4 . If w4 were not present, the monomial γ 0 δ 7 in xr4 would be γ 0 δ 6 (no time shift).
5 Time Shift Failure Detection in (Max,+)-Linear Systems Figure 4 on the right shows how the proposed set of indicators is designed: the system is ruled by the observable inputs u, the unobservable disturbances w and produces the observable outputs yo ; the observer estimates the states xr based on the observation of u and yo . States xs result from the simulation of the fault-free model (as in Fig. 1) based on u, the proposed indicator then relies on a series comparison denoted (xri , xsi ) (see Definition 6) for every transition xi . Definition 7 The indicator for state xi is Ixi (u, yo ) defined as the Boolean function that returns true iff (xri , xsi ) = [0, 0] with xs = [xs1 . . . xsn ]T = A∗ Bu, xr = [xr1 . . . xrn ]T = Axr ⊕ Bu ⊕ LCxr ⊕ Lyo , and (xri , xsi ) = [Dxri/x ◦ si (0), −Dxsi/x ◦ ri (0)]. Theorem 4 The indicator Ixi (u, yo ) returns true only if a time shift failure involving xi with (xri , xsi ) = [0, 0] has occurred in the system. A time shift failure involves a transition xi if the time shift failure occurs in a place of the TEG that is in the upstream3 of transition xi . Proof We show that, if the system has no failure in the upstream of xi , Ixi (u, yo ) necessarily returns false. Suppose the system does not have such a time shift failure, by definition of the observer, the reconstructed state xri is the same as the fault-free model state xsi as no place in the upstream of xi is disturbed. If xsi = xri , then we ◦ ri = xri/x ◦ si = xri/x ◦ ri but xri/x ◦ ri = (xri/x ◦ ri )∗ according to Theorem 2 and with have xsi/x ∗ ◦ ri ) = e ⊕ · · · = γ 0 δ 0 ⊕ . . . . So if xri = xsi , one Definition 1 of the Kleene star: (xri/x has Dxri/x ◦ si (0) = −Dxsi/x ◦ ri (0) = 0. In the example of Sect. 2, based on the previous observer, suppose that the system behaves with respect to the inputs u1 and u2 defined in Sect. 3. Suppose that in reality 3A
place p is in the upstream of a transition x in a TEG if there is a path of arcs from p to x.
554
C. Paya et al.
Δ(xr1 , xs1 ) = [Dx / , −Dx / (0)] = [0, 0], r1 ◦ xs1 (0) s1 ◦ xr1 , −D Δ(xr2 , xs2 ) = [Dx / ◦ x (0) ◦ x (0)] = [0, 0], x / r2
s2
s2
r2
r4
s4
s4
r4
r6
s6
s6
r6
Δ(xr3 , xs3 ) = [Dx / , −Dx / (0)] = [0, 6], r3 ◦ xs3 (0) s3 ◦ xr3 , −D Δ(xr4 , xs4 ) = [Dx / ◦ x (0) ◦ x (0)] = [1, 7], x / Δ(xr5 , xs5 ) = [Dx / , −Dx / (0)] = [1, 7], r5 ◦ xs5 (0) s5 ◦ xr5 Δ(xr6 , xs6 ) = [Dx / , −D ◦ x (0) ◦ x (0)] = [1, 7]. x /
32 30 28 26 24 22 20 18 16 14 12 10 8 6 4 2
xr3
xs3
γ 1
2
3
4
5
6
Fig. 5 Computed intervals for the scenario and representation of xr3 and xs3
there was an incident on Equipment 2: the operation lasts longer with a processing time of 5 h in p5 instead of 4 h (see Fig. 1). The real system is then characterized by Eq. (2) with the disturbance w4 that is defined in Sect. 4.1. The estimated state is the same as given at the end of Sect. 4.2. In particular, xr3 is represented with plain line in Fig. 5 (as well as series xs3 with dotted line). The expected state xs is the vector: ⎡ ⎤ ⎡ 0 2 ⎤ γ δ ⊕ γ 1 δ 5 ⊕ γ 2 δ 8 ⊕ γ 3 δ 11 ⊕ γ 4 δ 14 ⊕ γ 5 δ 17 ⊕ γ 6 δ 20 ⊕ γ 7 δ +∞ xs1 ⎢xs2 ⎥ ⎢ γ 0 δ 5 ⊕ γ 1 δ 8 ⊕ γ 2 δ 11 ⊕ γ 3 δ 14 ⊕ γ 4 δ 17 ⊕ γ 5 δ 20 ⊕ γ 6 δ 23 ⊕ γ 7 δ +∞ ⎥ ⎢ ⎥ ⎢ 0 2 ⎥ ⎢xs3 ⎥ ⎢ γ δ ⊕ γ 1 δ 6 ⊕ γ 2 δ 10 ⊕ γ 3 δ 14 ⊕ γ 4 δ 18 ⊕ γ 5 δ 22 ⊕ γ 6 δ 26 ⊕ γ 7 δ +∞ ⎥ ⎢ ⎥=⎢ 0 6 ⎥ ⎢xs4 ⎥ ⎢ γ δ ⊕ γ 1 δ 10 ⊕ γ 2 δ 14 ⊕ γ 3 δ 18 ⊕ γ 4 δ 22 ⊕ γ 5 δ 26 ⊕ γ 6 δ 30 ⊕ γ 7 δ +∞ ⎥ ⎢ ⎥ ⎢ 0 7 ⎥ ⎣xs5 ⎦ ⎣ γ δ ⊕ γ 1 δ 11 ⊕ γ 2 δ 15 ⊕ γ 3 δ 19 ⊕ γ 4 δ 23 ⊕ γ 5 δ 27 ⊕ γ 6 δ 31 ⊕ γ 7 δ +∞ ⎦ xs6 γ 0 δ 11 ⊕ γ 1 δ 15 ⊕ γ 2 δ 19 ⊕ γ 3 δ 23 ⊕ γ 4 δ 27 ⊕ γ 5 δ 31 ⊕ γ 6 δ 35 ⊕ γ 7 δ +∞ and the computed intervals are in Fig. 5. Indicators that return true are associated with transitions x3 , x4 , x5 , x6 . Indicators for transitions x1 , x2 return false. Now, if we assume that there is only one type of time shift failure in the system, Proposition 4 ensures that the time shift failure occurs in a place that is in the upstream of every transition x3 , x4 , x5 , x6 . The time shift failure occurs in Eq. 2, either in place p2 (transportation delay before the arrival in front of Eq. 2), or in place p6 (processing start of Eq. 2 is delayed), or in place p5 (process of Eq. 2 longer than expected).
6 Conclusion This paper defines a method for detecting time shift failures in systems modeled as timed event graphs using an observer that estimates the real states of the system. The method defines a formal (max,+) algebraic indicator based on the residuation theory. The proposed indicator is able to detect the presence of time shift failures as soon as it returns true and provides first localisation results. As a perspective, we aim at improving the accuracy of this indicator to better exploit the quantitative
Model-Based Diagnosis of Time Shift Failures in Discrete …
555
information contained in the interval (xri , xsi ). We expect that a further analysis about the bounds of the intervals may actually provide more information about failure localization and identification.
References 1. Baccelli, F., Cohen, G., Olsder, G.J., Quadrat, J.-P.: Synchronization and Linearity: An Algebra for Discrete Event Systems. Wiley, New York (1992) 2. Hardouin, L., Maia, C.A., Cottenceau, B., Lhommeau, M.: Observer design for (max,+) linear systems. IEEE Trans. Autom. Control 55(2), 538–543 (2010) 3. Hardouin, L., Maia, C.A., Cottenceau, B., Santos-Mendes, R.: Max-plus linear observer: application to manufacturing systems. In: 10th International Workshop on Discrete Event Systems, WODES’10, pp. 161–166 (2010) 4. Kim, C., Lee, T.E.: Feedback control of cluster tools for regulating wafer delays. IEEE Trans. Autom. Sci. Eng. 13(2), 1189–1199 (2015) 5. Komenda, J., Lahaye, S., Boimond, J.-L., van den Boom, T.: Max-plus algebra in the history of discrete event systems. Annu. Rev. Control. 45, 240–249 (2018) 6. MaxPlus.: Second order theory of min-linear systems and its application to discrete event systems. In: Proceedings of the 30th IEEE Conference on Decision and Control. CDC’91, pp. 1511–1516 (1991) 7. Sahuguède, A., Le Corronc, E., Pencolé, Y.: Design of indicators for the detection of time shift failures in (max, +)-linear systems. In: 20th World Congress of the International Federation of Automatic Control, pp. 6813–6818 (2017)
Innovative Technologies and Applications in Computer Intelligence
Adaptive Structural Deep Learning to Recognize Kinship Using Families in Wild Multimedia Takumi Ichimura and Shin Kamada
Abstract Deep learning has a hierarchical network architecture to represent the complicated feature of input patterns. We have developed the adaptive structure learning method of deep belief network (adaptive DBN) that can discover an optimal number of hidden neurons for given input data in a restricted Boltzmann machine (RBM) by neuron generation–annihilation algorithm and can obtain appropriate number of hidden layers in DBN. In this paper, our model is applied to Families in Wild Multimedia (FIW): A multi-modal database for recognizing kinship. The kinship verification is a problem whether two facial images have the blood relatives or not. In this paper, the two facial images are composed into one image to recognize kinship. The classification accuracy for the developed system became higher than the traditional method. Keywords Deep belief network · Restricted Boltzmann machine · Adaptive structure learning · Facial recognition · FIW
1 Introduction Recently, the research of deep learning has caused an unforeseen influence on not only theoretical research of artificial intelligence, but also practical application to realize innovations in our lives. The state-of-the-art convolutional neural network in deep learning models such as AlexNet [1], GoogLeNet [2], VGG [3], and ResNet [4] has produced greatly exceeds human recognition ability from the image recognition competition called ILSVRC. Such models are a type of convolutional neural networks (CNNs). In the models, several features including in the input data are trained at T. Ichimura (B) · S. Kamada Advanced Artificial Intelligence Project Research Center, Research Organization of Regional Oriented Studies, Prefectural University of Hiroshima, Hiroshima 734-8558, Japan e-mail: [email protected] S. Kamada e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 I. Czarnowski et al. (eds.), Intelligent Decision Technologies, Smart Innovation, Systems and Technologies 238, https://doi.org/10.1007/978-981-16-2765-1_46
559
560
T. Ichimura and S. Kamada
multiple layers, and the features of the entire data can be represented with high accuracy by piling up many layers hierarchically. Another deep learning model is a deep belief network (DBN) [5] that has high classification performance by pre-training and piling up the restricted Boltzmann machine (RBM) [6], which is a generative stochastic neural network. Although DBN model itself does not reach the high classification performance of CNN models, the automatically determined method to the number of hidden neurons in RBM and hidden layers in DBN according to the input data space during learning has been developed [7]. The method is called the adaptive structural learning method of DBN (adaptive DBN). The neuron generation/annihilation and the layer generation algorithm work to find the optimal network structure [8]. The outstanding function can show higher classification accuracy for the image benchmark dataset [9] and the real-world problems than several existing CNN models [10]. In this paper, we applied the adaptive DBN to the database for kinship recognition, Family in the Wild (FIW) [11]. The hybrid model by Rachmadi et al. [12] was the fusion CNN configuration using the residual network. However, the model does not achieve better performance than the transition learning using ResNet [4]. Our model showed 95.95% classification accuracy of training data. The remainder of this paper is organized as follows. In Sect. 2, the adaptive structural learning of DBN is briefly explained. Section 3 explains the FIW database. In Sect. 4, the effectiveness of our proposed method is verified on FIW. In Sect. 5, we give some discussions to conclude this paper.
2 Adaptive Structural Learning Method of Deep Belief Network The basic idea of our proposed adaptive DBN is described in this section to understand the basic behavior of self-organized structure briefly.
2.1 RBM and DBN A RBM [6] is a stochastic unsupervised learning model. The network structure of RBM consists of two kinds of binary layer as shown in Fig. 1: a visible layer v ∈ {0, 1} I for input patterns and a hidden layer h ∈ {0, 1} J . The parameters b ∈ R I and c ∈ R J can work to make an adjustment of behaviors for a visible neuron and a hidden neuron, respectively. The weight Wi j is the connection between a visible neuron vi and a hidden neuron h j . A RBM tries to minimize the energy function E(v, h) in training as follows.
Adaptive Structural Deep Learning to Recognize … Fig. 1 Network structure of RBM
561
hidden neurons h0
...
h1
hJ
Wij
v0
...
v2
v1
vI
visible neurons
E(v, h) = −
bi vi −
i
cjh j −
j
i
vi Wi j h j ,
1 exp(−E(v, h)), Z Z= exp(−E(v, h)),
p(v, h) =
v
(1)
j
(2) (3)
h
Equation (2) shows the probability of exp(−E(v, h)). Z is calculated by summing energy for all possible pairs of visible and hidden vectors in Eq. (3). The parameters θ = {b, c, W } are optimized for given input data by partial derivative of p(v). A standard method of estimation for the parameters in a statistical model p(v) is maximum likelihood. Deep belief network (DBN) [5] is a stacking box for stochastic unsupervised learning by hierarchically building several pre-trained RBMs. The conditional probability of a hidden neuron j at the l-th RBM is defined by Eq. (4). p(h lj = 1|hl−1 ) = sigmoid(clj +
Wilj h l−1 i ),
(4)
i
where clj and Wilj are the parameters for a hidden neuron j and the weight at the l-th RBM, respectively. h0 = v means the given input data. When the DBN is a supervised learning for a classification task, the last layer is added to the final output layer to calculate the output probability yk for a category k by softmax as shown in Eq. (5). exp(z k ) , (5) yk = M j exp(z j )
562
T. Ichimura and S. Kamada
where z j is an output pattern of a hidden neuron j at the output layer. M is the number of output neurons. The difference between the output yk and the teacher signal is minimized.
2.2 Neuron Generation and Annihilation Algorithm of RBM Recently, deep learning models have higher classification capability for given large amount of data; however, the size of its network structure or the number of its parameters that a network designer determines must become larger. For the problem, we have developed the adaptive structural learning method in RBM model, called adaptive RBM [10]. RBM is an unsupervised graphical and energy based model on two kinds of layer: visible layer for input and hidden layer for feature vector, respectively. The neuron generation algorithm of adaptive RBM is able to generate an optimal number of hidden neurons for given input space during its training situation. Walking distance (WD) is defined as the difference between the past variance and the current variance for learning parameters [10]. If the network does not have enough neurons to classify them sufficiently, then the WD will tend to fluctuate large after the long training process. The situation shows that some hidden neurons may not represent an ambiguous pattern due to the lack of the number of hidden neurons. In order to represent ambiguous patterns into two neurons, a new neuron is inserted to inherit the attributes of the parent hidden neuron as shown in Fig. 2a. However, we may meet a situation that some unnecessary or redundant neurons were generated due to the neuron generation process. The neuron annihilation algorithm was applied to kill the corresponding neuron after neuron generation process. Figure 2b shows that the corresponding neuron is annihilated.
2.3 Layer Generation Algorithm of DBN DBN is a hierarchical model of stacking the several pre-trained RBMs. For building hierarchical process, output (activation of hidden neurons) of l-th RBM can be seen as the next input of l + 1-th RBM. Generally, DBN with many RBMs has higher power of data representation than one RBM. Such hierarchical model can represent the various features from an abstract concept to concrete representation at each layer in the direction of input layer to output layer. However, the optimal number of RBMs depends on the target data space. We developed adaptive DBN which can automatically adjust an optimal network structure to add the RBM one by one based on the idea of WD described in Sect. 2.2. When both WD and the energy function do not become small values, a new RBM will be generated to make the network structure suitable for the data set, since the RBM has lacked data representation capability to figure out an image of input patterns. Therefore, the condition for layer generation is defined by using the total WD and
Adaptive Structural Deep Learning to Recognize …
563 hidden neurons
hidden neurons h0
h0
h1
hnew
h1
generation
v0
v1
v3
v2
v0
v1
v3
v2
visible neurons
visible neurons
(a) Neuron generation hidden neurons h0
h1
hidden neurons h0
h2
h1
h2
annihilation
v0
v1
v0
v3
v2
visible neurons
v1
v2
v3
visible neurons
(b) Neuron annihilation Fig. 2 Adaptive RBM Suitable number of hidden neurons and layers is automatically generated. pre-training between 3rd and 4th layers pre-training between 2nd and 3rd layers pre-training between 1st and 2nd layers
Generation
Generation
Input
Fig. 3 An overview of adaptive DBN
Annihilation
pre-training between 4th and 5th layers, and fine-tuning for supervised learning
564
T. Ichimura and S. Kamada
the energy function in Eqs. (6)–(7). Figure 3 shows the overview of layer generation in adaptive DBN. k W Dl > θ L1 , (6) l=1 k
E l > θ L2
(7)
l=1
3 FIW (Family in the Wild) FIW [11] is the largest and most comprehensive available database for kinship recognition. The database has 656,954 image pairs split between the 11 relationship as shown in Fig. 4. The goal of kinship verification is to determine whether a pair of faces are blood relatives or not. The pair-wised data are categorized into 11 groups: father– daughter (FD), father–son (FS), mother–daughter (MD), mother–son (MS), siblings pairs (SIBS), brother–brother (BB), sister–sister (SS), grand father–grand daughter (GFGD), grand father–grand son (GFGS), grand mother–grand daughter (GMGD), and grand mother–grand son (GMGS). The size of each image is 224 × 224. FIW has two types of tasks, which are family classification and kinship verification. The family classification is a task to classify a given facial image into the 1000 families. In the kinship verification, given two images are verified so that whether they have 11 kinship relationships or not. In this paper, we used the kinship verification dataset as shown in Table 1 to develop our deep learning network by adaptive DBN.
4 Experimental Result The composition of two facial images was trained by adaptive DBN. Table 2 shows the classification results of three types of deep learning models. The model in [12] is a hybrid of EFNet (Fig. 5a) and LFNet (Fig. 5b), as shown in Fig. 5, and its score was the better one among EFNet and LFNet. Our adaptive DBN showed the highest classification accuracy for both training and test data among the compared CNN models. Table 3 shows the classification accuracy of the test cases for each category; '-' means that no value is reported in [12]. The trained DBN automatically formed a network of 301,056, 713, 620, 447, 423, 325, and 123 neurons from the input to the output layer. The average rate for the test cases remains at 75.23% because the images include ambiguous impressions that even humans cannot judge.
Fig. 4 Sample facial images for kinship in FIW: (a) FD, (b) FS, (c) MD, (d) MS, (e) SIBS, (f) BB, (g) SS, (h) GFGD, (i) GFGS, (j) GMGD, (k) GMGS

Table 1 FIW dataset
Category   Training   Test     Total
None       34,898     15,007   49,905
FD         18,514     7,843    26,357
FS         20,131     8,733    28,864
MD         17,354     7,378    24,732
MS         17,844     7,635    25,479
SIBS       15,210     6,559    21,769
BB         10,651     4,560    15,211
SS         8,288      3,533    11,821
GFGD       2,468      1,048    3,516
GFGS       3,024      1,300    4,324
GMGD       3,274      1,388    4,662
GMGS       3,762      1,624    5,386
Total      155,418    66,608   222,026

Table 2 Classification accuracy
Model                        Training data (%)   Test data (%)
CNN [12]                     –                   64.22
VGGFace [13]                 –                   68.80
ResNet [14]                  –                   74.90
ResNet (transfer learning)   72.46               62.47
Adaptive DBN                 95.95               75.23
We will develop a further improved model that investigates the inner signal flow at each layer by using the fine-tuning method in [15, 16] and builds an ensemble by using the distillation learning model [17].
5 Conclusive Discussion We have developed adaptive DBN, which can find an optimal structure by generating and annihilating neurons and layers during learning. In this paper, the proposed model was applied to FIW. Although the accuracy for the training cases was higher than that of the other models and the accuracy for the test cases was better than that of the conventional method, there were still erroneous judgments. Further investigation will be required in future work.
Fig. 5 Hybrid model in [12]: (a) EFNet, (b) LFNet, each built from conv1 and res2_x–res5_x blocks applied to the two input faces (face1, face2)

Table 3 Classification accuracy of test cases by adaptive DBN
Category   Accuracy (%)
None       76.68
FD         74.50
FS         77.10
MD         72.89
MS         73.80
SIBS       77.13
BB         78.07
SS         71.70
GFGD       71.37
GFGS       76.92
GMGD       67.58
GMGS       72.29
Average    75.23
Acknowledgements This work was supported by JSPS KAKENHI Grant Numbers 19K12142 and 19K24365, and by commissioned research from the National Institute of Information and Communications Technology (NICT, 21405), Japan.
References
1. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Proceedings of Advances in Neural Information Processing Systems 25 (NIPS 2012) (2012)
2. Szegedy, C., Liu, W., et al.: Going deeper with convolutions. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1–9 (2015)
3. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Proceedings of International Conference on Learning Representations (ICLR 2015) (2015)
4. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
5. Hinton, G.E., Osindero, S., Teh, Y.: A fast learning algorithm for deep belief nets. Neural Comput. 18(7), 1527–1554 (2006)
6. Hinton, G.E.: A practical guide to training restricted Boltzmann machines. In: Neural Networks: Tricks of the Trade, Lecture Notes in Computer Science (LNCS, vol. 7700), pp. 599–619 (2012)
7. Kamada, S., Ichimura, T.: An adaptive learning method of restricted Boltzmann machine by neuron generation and annihilation algorithm. In: Proceedings of 2016 IEEE International Conference on Systems, Man, and Cybernetics (IEEE SMC 2016), pp. 1273–1278 (2016)
8. Kamada, S., Ichimura, T.: An adaptive learning method of deep belief network by layer generation algorithm. In: Proceedings of IEEE TENCON 2016, pp. 2971–2974 (2016)
9. Krizhevsky, A.: Learning multiple layers of features from tiny images. Master's thesis, University of Toronto (2009)
10. Kamada, S., Ichimura, T., Hara, A., Mackin, K.J.: Adaptive structure learning method of deep belief network using neuron generation–annihilation and layer generation. Neural Computing and Applications, pp. 1–15 (2018)
11. Wang, S., Robinson, J.P., Fu, Y.: Kinship verification on families in the wild with marginalized denoising metric learning. In: 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017), pp. 216–221 (2017)
12. Rachmadi, R.F., Purnama, I.K.E., Nugroho, S.M.S., Suprapto, Y.K.: Image-based kinship verification using fusion convolutional neural network. In: Proceedings of 2019 IEEE 11th International Workshop on Computational Intelligence and Applications (IWCIA), pp. 59–65 (2019)
13. Robinson, J.P., Shao, M., Wu, Y., Fu, Y.: Families in the wild (FIW): large-scale kinship image database and benchmarks. In: Proceedings of the 24th ACM International Conference on Multimedia, pp. 242–246 (2016)
14. Li, Y., Zeng, J., Zhang, J., Dai, A., Kan, M., Shan, S., Chen, X.: KinNet: fine-to-coarse deep metric learning for kinship verification. In: Proceedings of the 2017 Workshop on Recognizing Families in the Wild, pp. 13–20 (2017)
15. Kamada, S., Ichimura, T.: Fine tuning of adaptive learning of deep belief network for misclassification and its knowledge acquisition. Int. J. Comput. Intell. Stud. 6(4), 333–348 (2017)
16. Kamada, S., Ichimura, T., Harada, T.: Knowledge extraction of adaptive structural learning of deep belief network for medical examination data. Int. J. Semant. Comput. 13(1), 67–86 (2019)
17. Ichimura, T., Kamada, S.: A distillation learning model of adaptive structural deep belief network for AffectNet: facial expression image database. In: Proceedings of the 9th International Congress on Advanced Applied Informatics (IIAI AAI 2020), pp. 454–459 (2020)
Detecting Adversarial Examples for Time Series Classification and Its Performance Evaluation Jun Teraoka and Keiichi Tamura
Abstract As deep learning techniques have become increasingly used in real-world applications, their vulnerabilities have received significant attention from deep learning researchers and practitioners. In particular, adversarial examples on deep neural networks and protection methods against them have been well-studied in recent years because such networks have serious vulnerabilities that threaten safety in the real world. This paper proposes detection methods against adversarial examples for time series classification. Time series classification is the task of predicting the class label that an unlabeled time series belongs to. To protect time series classification from attacks using adversarial examples, we propose three types of methods for detecting adversarial examples: the 2n-class-based (2NCB), 2-class-based (2CB), and feature vector-based (FVB) detection methods. Moreover, we propose an ensemble method, which detects adversarial examples by using a majority vote of the three aforementioned methods. Experimental results show that the proposed methods are superior to the conventional method. Keywords Adversarial examples · Time series classification · Deep learning
J. Teraoka (B) Faculty of Information Sciences, Hiroshima City University, 3-4-1, Ozuka-Higashi, Asa-Minami-Ku, Hiroshima 731-3194, Japan
K. Tamura Graduate School of Information Sciences, Hiroshima City University, 3-4-1, Ozuka-Higashi, Asa-Minami-Ku, Hiroshima 731-3194, Japan e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 I. Czarnowski et al. (eds.), Intelligent Decision Technologies, Smart Innovation, Systems and Technologies 238, https://doi.org/10.1007/978-981-16-2765-1_47
1 Introduction Deep learning [1] has opened a new era of machine learning techniques due to its high performance for classification and prediction. With the increasing use of deep learning techniques in real-world applications, security issues related to their vulnerabilities have received significant attention from researchers and practitioners. The techniques for deceiving deep neural networks are clever and complex.
In particular, adversarial examples targeting deep neural networks and the protection methods against them have been well-studied in recent years because they pose serious issues that threaten safety in the real world [2–4]. Adversarial examples are input data that deceive deep neural networks: instances with small, intentional feature perturbations that cause the networks to generate a false result. Thus, they are inputs to deep neural networks that yield incorrect outputs, even though the networks work well with normal input data. For example, although an adversarial example may look the same as the original image at a glance, the deep neural network produces a false output. Suppose that people can recognize an image as a stop sign for driving. An adversarial example of that image can deceive a deep neural network into producing a malicious result such that it is interpreted as another sign. Adversarial examples are a different type of threat because, on the surface, the data look normal. They deceive deep learning models, which can potentially endanger lives.

This paper proposes detection methods for adversarial examples for time series classification. Time series classification is the task of predicting the class label that an unlabeled time series belongs to. In this study, we focus only on univariate time series data and its classification models using deep neural networks. Adversarial examples for time series deceive deep neural networks into yielding false outputs [5]. There are several approaches for adding perturbations to create adversarial examples; the main target of this study is the fast gradient sign method (FGSM) [6]. Defense against adversarial examples for time series classification poses different challenges from defense against adversarial examples for image classification. Even if an image of a sign is detected as another sign, we can recognize that the image is an adversarial example when we see it. However, if a time series is classified into a class, we cannot recognize whether the time series is an adversarial example.

To protect time series classification from attacks using adversarial examples, we propose three types of methods to detect adversarial examples for time series classification: the 2n-class-based (2NCB), 2-class-based (2CB), and feature vector-based (FVB) detection methods. Moreover, we propose an ensemble method that detects adversarial examples by using a majority vote of the three aforementioned methods. In the 2NCB detection, a detection model is created by using N-class normal time series data and N-class adversarial examples created from the normal data, so that the detection model can classify 2N classes. In the 2CB detection, a detection model is created by using both normal time series and adversarial examples generated from them; the detection model classifies an input time series as normal or adversarial. In the FVB detection, we extract the last layer of the prediction model as feature vectors, and a support vector machine (SVM) is used to classify the extracted feature vectors. To evaluate the proposed methods, we created adversarial examples using FGSM from actual time series data in the UCR Time Series Archive [7], which contains over 100 time series datasets; from this, we use only the 85 datasets of the older archive. In the experiments, we evaluated the proposed methods in terms of both the false negative and false positive rates.
Experimental results show that the false negative rates of the proposed method are superior to those of the conventional method.
The remainder of the paper is organized as follows. In Sects. 2 and 3, related work and the definition of adversarial examples for time series classification are described, respectively. In Sect. 4, we propose new detection methods for protecting against adversarial examples. In Sect. 5, the proposed detection methods are evaluated using actual time series adversarial examples. In Sect. 6, we conclude the paper.
2 Related Work Security for machine learning has been attracting significant attention from many researchers and practitioners because real-world applications involving machine learning techniques are ubiquitous [8, 9]. Machine learning models are usually vulnerable to adversarial manipulation of their inputs intended to cause incorrect classification [10]. Over the past few years, deep learning has given rise to a new era of machine learning techniques; accordingly, adversarial examples have been attracting attention. The main topic has been creating and detecting adversarial examples for images; recently, adversarial examples for other types of data, such as audio and time series, have also been reported [11]. There are several types of defense methods, including removing perturbations, detecting adversarial examples, and creating models that are tolerant to adversarial examples. The methods proposed in this study are based on detecting adversarial examples. In [12], a detection method was proposed based on the hypothesis that the outputs of adversarial examples change more easily than those of normal data when the model is mutated; the method can detect adversarial examples for MNIST and CIFAR10. In [13], an image processing method that reduces the effects of perturbations was proposed, and detection was performed by comparing the prediction results before and after processing. Tamura et al. [14] proposed a detection method for audio adversarial examples. In [5], adversarial examples of time series were reported, and they pose a threat to time series classification. Abdu-Aguye et al. [15] proposed a detection method for adversarial examples of time series for time series classification. Preprocessing and feature extraction were performed on the input data based on sample entropy, which is an index for measuring signal complexity, and detrended fluctuation analysis (DFA), which is an index for measuring the statistical self-affinity of signals. Adversarial examples were then detected using an outlier detector based on a one-class SVM.
3 Adversarial Examples In this section, adversarial examples for time series classification and the main target of this study are explained.
3.1 Definition of Adversarial Example Using Perturbation An adversarial example x̃ is an input data point x with perturbations such that we cannot distinguish between the adversarial example and the original input. The adversarial example looks identical to the original input data at a glance; however, their outputs from the deep neural network are totally different.

y = f(x); \quad \tilde{y} = f(\tilde{x}); \quad \tilde{x} = x + p \qquad (1)
where y is the output of x, ỹ is the output of x̃, p is the perturbation, and f is a deep neural network. Attackers attempt to add perturbations to x to create an adversarial example x̃ that minimizes the difference between x and x̃ while making f output ỹ instead of y. Even though x and x̃ seem to be the same data, f outputs different results.

\arg\min_{\tilde{x}} \|\tilde{x} - x\|_2 \quad \text{s.t.} \quad f(x) = y,\; f(\tilde{x}) = \tilde{y},\; y \neq \tilde{y} \qquad (2)
3.2 Target Model In this study, the target model we use is a fully convolutional network (FCN) model [16]. The FCN model is a typical deep learning-based time series classification model. In the basic FCN model, there are three convolutional layers that can extract both local and global features of a time series. Even though the target model is only the FCN model, adversarial examples have transferability [17]. Moreover, our proposed methods are model-free: whichever type of deep neural model is used for time series classification, our proposed methods can be used to detect adversarial examples.
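For reference, a minimal Keras sketch of an FCN-style classifier is shown below; the filter counts and kernel sizes (128/256/128 and 8/5/3) follow the commonly used baseline configuration of [16] and are assumptions here rather than settings stated in this paper.

```python
import tensorflow as tf

def build_fcn(series_length, n_classes):
    """FCN-style time series classifier: three Conv1D blocks + global average pooling."""
    inputs = tf.keras.Input(shape=(series_length, 1))
    x = inputs
    for filters, kernel in [(128, 8), (256, 5), (128, 3)]:   # assumed block sizes
        x = tf.keras.layers.Conv1D(filters, kernel, padding="same")(x)
        x = tf.keras.layers.BatchNormalization()(x)
        x = tf.keras.layers.Activation("relu")(x)
    x = tf.keras.layers.GlobalAveragePooling1D()(x)
    outputs = tf.keras.layers.Dense(n_classes, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)
```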
3.3 Types of Attacks There are two types of adversarial examples depending on the environment, i.e., white-box and black-box attacks [18]. In white-box attacks, attackers know the parameters, loss function, algorithm, and structure of the target deep neural network in advance. In black-box attacks, attackers do not possess this information. In the early stages of work on adversarial examples, perturbations could be created only if all information on the target deep neural network was publicly available. Several recent works have shown that some deep neural networks can be simulated from their outputs, and the perturbations to create adversarial examples can be generated using these simulated deep neural networks [19]. There are two types of adversarial
examples depending on their purpose, i.e., targeted and non-targeted attacks. Targeted attacks misguide deep neural networks to output the desired results. Non-targeted attacks misguide deep neural networks to output random results. In this study, we propose detecting methods for white-box and non-targeted attacks.
3.4 Fast Gradient Sign Method (FGSM) Adversarial examples are typically generated by adding noise-like perturbations to inputs that are correctly classified by a targeted deep neural network, where the added perturbations are imperceptible to humans. Recent works showed that a small specific pattern in a restricted region of the samples can also deceive deep learning models. The FGSM, proposed by Goodfellow et al., is a well-known method for creating adversarial examples:

\tilde{x} = x + \epsilon \cdot \mathrm{sign}(\nabla_x J(\theta, x, y)) \qquad (3)

where x is the input data, ε is a parameter adjusting the magnitude of the perturbations, θ denotes the model parameters, y is the label of x, and J(·) is the loss function. FGSM is a white-box and non-targeted attack that was proposed for attacking image classification models, but it can also be applied to time series classification models. For image data, the perturbations are applied to pixels, whereas for time series data, they are applied to the series values.
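A minimal sketch of FGSM for a batch of time series, assuming a Keras classifier with softmax outputs and integer labels, is given below; it implements Eq. (3) directly.

```python
import tensorflow as tf

def fgsm_attack(model, x, y_true, epsilon):
    """FGSM adversarial examples for a batch of time series (Eq. (3))."""
    x = tf.convert_to_tensor(x, dtype=tf.float32)
    with tf.GradientTape() as tape:
        tape.watch(x)
        loss = tf.keras.losses.sparse_categorical_crossentropy(y_true, model(x))
    grad = tape.gradient(loss, x)                  # gradient of J(theta, x, y) w.r.t. x
    return (x + epsilon * tf.sign(grad)).numpy()   # x_adv = x + eps * sign(grad)
```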
4 Proposed Method In this section, we propose three detection methods for adversarial examples for the time series data classification.
4.1 2NCB Detection Method The 2NCB detection method utilizes a 2n-class classification model (Fig. 1a) trained on n-class normal samples and n-class adversarial examples. The detection model is thus a classification model that determines whether the input data is an adversarial example: if the classification result of the detection model is one of the adversarial classes, the input data is judged to be an adversarial example.
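A minimal sketch of how the 2N-class training set and the detection rule could be assembled is shown below; the array layout is an assumption for illustration.

```python
import numpy as np

def make_2ncb_dataset(x_normal, y_normal, x_adv, y_adv, n_classes):
    """Build the 2N-class training set: labels 0..N-1 are the normal classes,
    labels N..2N-1 are the corresponding adversarial classes."""
    x = np.concatenate([x_normal, x_adv], axis=0)
    y = np.concatenate([np.asarray(y_normal), np.asarray(y_adv) + n_classes], axis=0)
    return x, y

def is_adversarial_2ncb(predicted_class, n_classes):
    """Flag an input as adversarial when it falls into one of the N adversarial classes."""
    return predicted_class >= n_classes
```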
Fig. 1 (a) 2NCB detection method, (b) 2CB detection method
4.2 2CB Detection Method The 2CB detection method utilizes a 2-class classification model (Fig. 1b) that learns only whether the sample is adversarial or not. In the 2-class detection model, if the classification result of the detection model is the adversarial class, it is judged to be an adversarial example.
4.3 FVB Detection Method The FVB detection method uses feature vectors extracted from the final layer of the prediction model. First, the input data is fed to the prediction model and the feature vectors are extracted from the final layer of the model. Then, to improve the detection accuracy, principal component analysis (PCA) is performed on the feature vectors. PCA is a multivariate analysis method that transforms the coordinate system by defining axes that maximize the variance of the data, and it is mainly used for dimensionality reduction. An SVM is then used to determine whether a sample is adversarial. SVM is a classification algorithm mainly used for 2-class classification problems; the decision boundary is set such that the distance (margin) between the boundary and the nearest data of each class (the support vectors) is maximized. SVM can be applied to nonlinearly separable data by mapping the data into a high-dimensional space using the kernel method. In this study, we train the SVM on a two-class classification problem to determine whether an input is an adversarial example. An overview of the detection method using feature vectors is shown in Fig. 2.
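A minimal sketch of the FVB pipeline using scikit-learn is shown below; the number of PCA components and the RBF kernel are assumptions, since they are not stated here, and feat_normal/feat_adv stand for the final-layer feature vectors extracted from the prediction model.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

def train_fvb_detector(feat_normal, feat_adv, n_components=10):
    """FVB detector: PCA on final-layer feature vectors, then a 2-class SVM
    separating normal (label 0) from adversarial (label 1) samples."""
    x = np.concatenate([feat_normal, feat_adv], axis=0)
    y = np.concatenate([np.zeros(len(feat_normal)), np.ones(len(feat_adv))])
    detector = make_pipeline(PCA(n_components=n_components), SVC(kernel="rbf"))
    detector.fit(x, y)
    return detector   # detector.predict(features) == 1 flags adversarial inputs
```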
4.4 Ensemble Method In the ensemble method, the 2NCB, 2CB, and FVB detection methods are used for detection. The ensemble method makes a judgment by using a majority vote of the three judgment results. This method requires a lot of computation time and memory for detection because it utilizes three detection methods.
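A minimal sketch of the majority vote is given below, where each input argument is a boolean array of per-sample judgments from one of the three detectors.

```python
import numpy as np

def ensemble_detect(flags_2ncb, flags_2cb, flags_fvb):
    """Majority vote: a sample is judged adversarial when at least two of the
    three detectors flag it."""
    votes = (np.asarray(flags_2ncb, dtype=int)
             + np.asarray(flags_2cb, dtype=int)
             + np.asarray(flags_fvb, dtype=int))
    return votes >= 2
```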
Fig. 2 FVB detecting method
5 Experiments To evaluate the proposed methods, experiments were conducted using the UCR Time Series Archive. In this section, we report the results of the experiments.
5.1 Dataset In the experiments, we use the UCR Time Series Archive (2018), which is a benchmark dataset of time series data classification problems. The UCR Time Series Archive consists of 128 labeled time series datasets with different characteristics, and each dataset is divided into training data and test data. All the data are already z-normalized. The UCR Time Series Archive is divided into the old archive, which consists of 85 datasets, and the new archive, which consists of 43 datasets. The old archive consists of only single-channel fixed-length series data in each class, and the number of data in training and test data is set to be easy to handle. In our experiments, we use all 85 datasets of the old archive.
5.2 Generating Adversarial Examples In the experiments, we train the models using the adversarial examples generated from the training data in the UCR Time Series Archive and evaluate them using the adversarial examples generated from the test data. The adversarial examples are generated by FGSM. However, when we generated the adversarial examples from the pre-defined training data, their number for the SonyAIBORobotSurface2 dataset was not sufficient, so we changed the allocation of training and test data. The parameter that adjusts the size of the perturbation of the adversarial examples, ε, is a uniform random number such that 0 < ε ≤ 0.2, and a different value of ε is used for each sample. The reason for this is that ε > 0.2 is not applicable to the problem of this study, since the shapes of the samples change significantly from the original data and many samples can be easily distinguished by humans.
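A minimal sketch of this generation procedure, re-using the fgsm_attack sketch from Sect. 3.4 and drawing a separate ε per sample, might look as follows; the random seed handling is an assumption.

```python
import numpy as np

def generate_adversarial_set(model, x, y, eps_max=0.2, seed=0):
    """Generate one FGSM adversarial example per sample, drawing a separate
    epsilon uniformly from (0, eps_max] for each sample."""
    rng = np.random.default_rng(seed)
    x_adv = np.empty_like(x, dtype=np.float32)
    for i in range(len(x)):
        eps = rng.uniform(0.0, eps_max)   # a draw of exactly zero has probability zero
        x_adv[i] = fgsm_attack(model, x[i:i + 1], y[i:i + 1], eps)[0]
    return x_adv
```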
5.3 Performance Evaluation of the Proposed Methods We evaluate the detection performance of the four proposed methods on the adversarial examples of each dataset in the UCR Time Series Archive and compare it with that of the conventional method in [15]. We use the false negative rate (FN) as an evaluation metric of detection performance. The FN is the percentage of adversarial examples incorrectly judged as not being adversarial examples. In [15], the detection rate is reported as an experimental result, so we used it to calculate the FN of the conventional method; the detection rate is the percentage of correctly identified adversarial examples. In an actual system, it is important not only to avoid missing adversarial examples, but also to avoid misidentifying normal samples. Thus, in addition to the FN, we evaluate the false positive rate (FP), the percentage of normal samples that are incorrectly judged as adversarial examples. The FN and FP of the four proposed methods and the conventional method are shown in Tables 1 and 2, and the number of best detections and the average rank based on the macro-average of FN and FP are shown in Table 3, which summarizes the results in Tables 1 and 2. We can see that the 2CB detection method has the best results in terms of the average, the number of wins, and the average rank over all datasets. In a real system, it is sufficient for at least one method to have high detection performance on each dataset, because a detection method can then be selected according to the target data. On the other hand, there are some datasets on which detection is difficult for all methods, where the macro-average exceeds 20%.
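For clarity, the two metrics can be computed as follows from boolean detector outputs and ground-truth flags; this is a generic sketch, not code from the paper.

```python
import numpy as np

def fn_fp_rates(detected_as_adv, is_truly_adv):
    """FN: percentage of adversarial examples that were not detected.
    FP: percentage of normal samples that were wrongly flagged as adversarial."""
    pred = np.asarray(detected_as_adv, dtype=bool)
    truth = np.asarray(is_truly_adv, dtype=bool)
    fn = 100.0 * np.mean(~pred[truth])   # missed adversarial examples
    fp = 100.0 * np.mean(pred[~truth])   # misjudged normal samples
    return fn, fp
```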
5.4 Comparison with Conventional Method The results of the comparison of the FNs of the four proposed methods and that of the conventional method are shown in Fig. 3a–d. Figure 3a–d plots, for each dataset, the FN of the conventional method on the horizontal axis and that of the proposed method on the vertical axis. The straight lines in the graphs are the trajectories of the points where the FNs of the proposed methods and that of the conventional method are equal. Since most of the points in all four graphs are below the line, we can see that the FN of the proposed methods is lower than that of the conventional method for many datasets. Specifically, out of 73 datasets, 49 datasets for the 2NCB detection method, 60 datasets for the 2CB detection method, and 38 datasets for the FVB detection method have lower FNs than the conventional method. In all the graphs, the samples below the lines are often far from them, whereas the samples above the line are relatively close to it in many cases. From these results, it can be said that the proposed methods are superior to the conventional method with respect to the FN. On the other hand, since the FPs are not evaluated in [15], we cannot compare them. The appropriate method should be selected depending on whether more emphasis is placed on the FN or the FP.
Table 1 Performance of the proposed methods and the conventional method (part 1). For each dataset, the values give FP and FN (%) of the 2NCB, 2CB, FVB, and Ensemble methods, followed by the FN (%) of the conventional method ('–' where not reported)
Adiac
2.05
4.16
2.30
4.43
2.05
5.77
1.02
4.03
ArrowHead
4.00
15.06
0.00
22.78
8.00
26.25
1.14
20.46
8.31 14.29
Beef
0.00
7.55
30.00
1.89
3.33
16.98
3.33
3.77
10.00
BeetleFly
0.00
5.71
0.00
20.00
35.00
11.43
0.00
14.29
5.00
BirdChicken
20.00
6.90
10.00
6.90
35.00
10.34
20.00
3.45
25.00
CBF
12.78
46.76
1.56
87.45
27.89
6.17
6.22
47.77
50.00
Car
0.00
12.38
3.33
3.81
0.00
20.00
0.00
12.38
3.33
ChlorineConcentration
27.32
7.68
0.18
8.10
0.36
14.20
0.47
9.25
38.53
CinCECGTorso
8.33
12.86
6.09
11.11
16.52
19.27
6.74
12.20
15.91
Coffee
0.00
0.00
0.00
0.00
35.71
21.43
0.00
0.00
8.93
Computers
1.20
18.79
2.00
16.31
37.20
23.76
2.00
17.38
55.80
CricketX
4.87
23.52
7.69
19.94
12.31
25.55
5.64
20.87
32.95
CricketY
3.85
24.05
4.87
21.03
9.23
20.27
4.36
21.48
30.90
CricketZ
8.97
19.27
7.95
14.68
16.67
20.95
7.95
17.58
34.49
DiatomSizeReduction
9.15
13.58
2.94
16.98
11.44
23.58
9.15
15.47
9.48
DistalPhalanx OutlineAgeGroup
0.72
12.56
2.88
10.31
4.32
11.66
0.72
10.31
6.50
DistalPhalanx OutlineCorrect
3.26
13.93
2.54
13.93
5.43
16.10
1.09
13.62
–
DistalPhalanxTW
4.32
13.33
1.44
11.56
5.76
11.11
1.44
10.67
9.12
ECG200
6.00
21.32
6.00
27.94
34.00
25.00
6.00
24.26
49.00
ECG5000
1.11
14.81
1.02
12.84
4.80
13.50
1.04
12.53
45.30
ECGFiveDays
0.00
11.23
0.12
13.98
34.61
25.81
0.12
10.89
7.14
Earthquakes
0.00
10.76
0.00
9.42
27.34
8.97
0.00
10.31
47.67
ElectricDevices
2.10
43.98
1.75
43.24
6.69
43.77
1.88
44.08
51.26
FaceAll
0.18
30.31
0.24
25.99
6.39
39.52
0.24
28.66
50.74
FaceFour
0.00
39.84
0.00
46.34
26.14
25.20
0.00
36.59
49.43
FacesUCR
0.29
40.06
0.20
34.65
20.20
14.86
0.34
33.28
49.63
FiftyWords
0.66
11.63
0.00
4.03
2.42
15.90
0.22
8.90
4.84
Fish
0.00
1.25
3.43
0.63
2.86
10.03
0.00
0.94
6.86
FordA
0.23
3.98
0.23
2.76
1.14
14.94
0.00
3.80
4.53
FordB
0.00
3.18
0.00
2.67
0.49
15.91
0.00
3.27
8.59
GunPoint
0.00
2.39
0.67
1.44
1.33
9.57
0.00
2.39
6.33
Ham
3.81
8.51
0.00
5.32
9.52
19.68
0.00
7.98
13.33
HandOutlines
0.00
18.80
100.00 0.00
0.00
20.00
0.00
14.53
8.35
Haptics
0.97
12.82
93.51
1.28
4.87
14.29
3.57
9.34
3.03
Herring
0.00
7.84
25.00
0.00
0.00
12.75
0.00
5.88
4.69
InlineSkate
0.73
10.03
0.18
8.67
1.27
11.68
0.00
9.64
12.18
InsectWingbeatSound
0.15
18.20
0.61
9.92
1.77
15.75
0.25
15.03
11.34
ItalyPowerDemand
0.87
33.42
3.11
36.08
14.09
14.52
1.94
31.53
–
LargeKitchen Appliances
3.47
15.90
1.87
12.29
7.20
26.51
1.33
16.14
49.73
Lightning2
1.64
41.94
16.39
13.98
44.26
25.81
11.48
21.51
52.46
Lightning7
0.00
52.29
1.37
26.61
21.92
49.54
1.37
38.53
47.26
Mallat
0.00
8.53
0.43
1.56
0.00
15.83
0.00
7.19
4.84
Meat
0.00
6.84
1.67
0.00
0.00
10.26
0.00
5.98
3.33
Table 2 Performance of the proposed methods and the conventional method (part 2). For each dataset, the values give FP and FN (%) of the 2NCB, 2CB, FVB, and Ensemble methods, followed by the FN (%) of the conventional method ('–' where not reported)
MedicalImages
0.79
23.79
0.00
5.23
6.32
24.19
0.26
18.46
37.96
MiddlePhalanx OutlineAgeGroup
0.00
23.05
2.60
12.20
0.00
13.90
0.00
13.56
–
MiddlePhalanx OutlineCorrect
1.03
7.10
1.37
7.32
6.19
10.20
0.69
7.98
–
MiddlePhalanxTW
0.00
30.07
1.30
14.13
1.30
17.39
0.00
17.39
–
MoteStrain
5.43
41.55
3.19
64.68
25.56
21.42
5.27
44.91
46.77
NonInvasive FetalECGThorax1
0.15
6.66
0.15
0.91
1.12
8.59
0.10
5.57
5.73
NonInvasive FetalECGThorax2
0.10
3.55
0.05
0.62
1.07
5.37
0.00
2.96
5.29
OSULeaf
0.41
12.86
0.83
5.10
0.83
18.69
0.00
10.44
5.79
OliveOil
0.00
6.90
0.00
15.52
0.00
10.34
0.00
10.34
46.67
Phalanges OutlinesCorrect
0.47
14.61
3.15
10.04
2.56
13.65
0.47
11.81
Phoneme
10.50
48.55
8.54
37.85
16.67
47.13
9.60
43.82
49.08
Plane
0.00
6.82
0.00
1.52
1.90
0.00
0.00
1.52
10.48
ProximalPhalanx OutlineAgeGroup
0.00
5.45
1.46
3.64
0.00
8.79
0.00
4.85
–
ProximalPhalanx OutlineCorrect
0.00
6.34
2.06
2.48
0.34
7.99
0.00
5.79
–
ProximalPhalanxTW
0.98
9.64
0.49
7.23
0.98
10.84
0.00
8.73
–
Refrigeration Devices
1.87
29.38
4.00
21.08
7.73
31.38
1.07
25.85
51.33
ScreenType
1.87
27.46
1.07
13.02
38.40
25.87
2.13
21.59
53.20
ShapeletSim
0.00
80.73
0.00
60.36
88.33
48.36
0.00
71.27
55.56
ShapesAll
0.67
8.92
0.67
4.70
2.33
8.05
1.00
7.00
23.58
SmallKitchen Appliances
0.00
24.72
0.00
10.15
4.00
23.18
0.00
20.09
37.07
SonyAIBORobot Surface1
0.33
74.14
9.65
64.59
31.45
11.20
3.16
64.73
–
SonyAIBORobot Surface2
39.77
74.33
92.44
6.21
31.69
4.76
55.82
6.73
–
StarLightCurves
0.00
3.05
0.00
0.09
5.14
7.59
0.00
1.06
5.99
Strawberry
0.00
4.33
0.00
1.18
1.08
9.45
0.00
3.94
5.71
SwedishLeaf
0.00
16.32
0.96
8.16
3.52
9.25
0.80
10.23
34.48
Symbols
0.30
16.72
0.10
11.51
1.31
28.15
0.00
16.49
36.28
SyntheticControl
3.00
44.83
0.33
90.91
12.67
11.29
1.67
45.77
–
ToeSegmentation1
5.70
26.95
8.77
37.94
42.98
12.77
8.77
23.40
42.32
ToeSegmentation2
1.54
38.10
0.77
39.29
25.38
28.57
0.00
36.90
50.77
Trace
0.00
0.00
0.00
0.00
0.00
9.47
0.00
0.00
55.50
TwoLeadECG
18.61
11.25
0.35
29.44
2.28
45.36
0.35
16.80
43.37
TwoPatterns
0.00
7.49
0.00
4.92
1.70
12.43
0.00
5.94
23.12
UWaveGesture LibraryAll
0.08
15.77
0.03
2.90
2.54
11.83
0.08
9.32
4.73
UWaveGesture LibraryX
0.08
16.95
0.08
4.76
2.32
12.66
0.11
11.44
4.47
UWaveGesture LibraryY
0.45
14.31
0.31
4.52
2.68
12.21
0.11
10.04
5.15
UWaveGesture LibraryZ
0.36
12.97
0.20
3.53
1.90
10.24
0.22
8.94
–
Wafer
0.00
0.66
0.00
0.81
7.56
2.56
0.00
0.75
4.81
Wine
1.85
2.78
0.00
0.93
0.00
9.26
0.00
3.70
12.96
WordSynonyms
0.16
18.97
0.00
7.46
2.51
19.64
0.00
15.58
2.98
Worms
1.30
33.09
6.49
12.23
7.79
16.55
1.30
18.71
38.12
WormsTwoClass
1.30
17.97
14.29
14.06
7.79
14.06
2.60
14.06
37.29
Yoga
1.30
6.54
10.07
1.67
4.27
10.15
1.30
5.14
5.02
Average
2.75
18.87
6.11
15.03
11.10
17.28
2.33
15.79
24.89
Table 3 Comparison of best detection and average rank including ensemble
Method                   Best detection   Average rank
2NCB detection method    9                2.78
2CB detection method     50               1.82
FVB detection method     6                3.55
Ensemble                 23               1.73
6 Conclusion This paper proposes detection methods for adversarial examples in time series classification. To protect time series classification from attacks using adversarial examples, we propose three types of methods: the 2NCB, 2CB, and FVB detection methods. Moreover, we propose the ensemble method that detects adversarial examples by using a majority vote of the three aforementioned methods. Experimental results show that the proposed methods are superior to the conventional method in terms of FN. In the future, we intend to achieve better performance in terms of both FP and FN and to speed up the computation time of detection with the ensemble method.
Fig. 3 a Comparison of the FN of 2NCB detecting method and that of the conventional method. b Comparison of the FN of 2CB detecting method and that of the conventional method. c Comparison of the FN of FVB detecting method and that of the conventional method. d Comparison of the FN of ensemble method and that of the conventional method
Acknowledgements This work was supported by JSPS KAKENHI Grant Number JP18K11320.
References
1. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. The MIT Press (2016)
2. Kurakin, A., Goodfellow, I.J., Bengio, S.: Adversarial examples in the physical world. http://arxiv.org/abs/1607.02533 (2016)
3. Chakraborty, A., Alam, M., Dey, V., Chattopadhyay, A., Mukhopadhyay, D.: Adversarial attacks and defences: a survey. http://arxiv.org/abs/1810.00069 (2018)
4. Sharif, M., Bhagavatula, S., Bauer, L., Reiter, M.K.: A general framework for adversarial examples with objectives. ACM Trans. Priv. Secur. 22(3), 16:1–16:30 (2019). http://doi.acm.org/10.1145/3317611
5. Ismail Fawaz, H., Forestier, G., Weber, J., Idoumghar, L., Muller, P.: Adversarial attacks on deep neural networks for time series classification. In: 2019 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2019)
6. Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples (2015)
7. Dau, H.A., Keogh, E., Kamgar, K., Yeh, C.C.M., Zhu, Y., Gharghabi, S., Ratanamahatana, C.A., Yanping, Hu, B., Begum, N., Bagnall, A., Mueen, A., Batista, G., Hexagon-ML: The UCR time series classification archive. https://www.cs.ucr.edu/~eamonn/time_series_data_2018/ (2018)
8. Barreno, M., Nelson, B., Joseph, A.D., Tygar, J.D.: The security of machine learning. Mach. Learn. 81(2), 121–148 (2010). https://doi.org/10.1007/s10994-010-5188-5
9. Biggio, B., Fumera, G., Roli, F.: Security evaluation of pattern classifiers under attack. IEEE Trans. Knowl. Data Eng. 26, 984–996 (2014)
10. Dalvi, N., Domingos, P., Mausam, Sanghai, S., Verma, D.: Adversarial classification. In: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '04, pp. 99–108. Association for Computing Machinery, New York, NY, USA (2004). https://doi.org/10.1145/1014052.1014066
11. Carlini, N., Wagner, D.: Audio adversarial examples: targeted attacks on speech-to-text. In: 2018 IEEE Security and Privacy Workshops (SPW), pp. 1–7 (2018)
12. Wang, J., Dong, G., Sun, J., Wang, X., Zhang, P.: Adversarial sample detection for deep neural network through model mutation testing. In: Proceedings of the 41st International Conference on Software Engineering, ICSE '19, pp. 1245–1256. IEEE Press (2019). https://doi.org/10.1109/ICSE.2019.00126
13. Liang, B., Li, H., Su, M., Li, X., Shi, W., Wang, X.: Detecting adversarial image examples in deep neural networks with adaptive noise reduction. IEEE Trans. Depend. Secure Comput. 1–1 (2018)
14. Tamura, K., Omagari, A., Hashida, S.: Novel defense method against audio adversarial example for speech-to-text transcription neural networks. In: 11th IEEE International Workshop on Computational Intelligence and Applications, IWCIA 2019, Hiroshima, Japan, 9–10 Nov 2019, pp. 115–120. IEEE (2019). https://doi.org/10.1109/IWCIA47330.2019.8955062
15. Abdu-Aguye, M.G., Gomaa, W., Makihara, Y., Yagi, Y.: Detecting adversarial attacks in time-series data. In: ICASSP 2020—2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3092–3096 (2020)
16. Wang, Z., Yan, W., Oates, T.: Time series classification from scratch with deep neural networks: a strong baseline. In: 2017 International Joint Conference on Neural Networks (IJCNN), pp. 1578–1585 (2017)
17. Karim, F., Majumdar, S., Darabi, H.: Adversarial attacks on time series. IEEE Trans. Pattern Anal. Mach. Intell. 01, 1–1 (5555)
18. Kwon, H., Kim, Y., Park, K., Yoon, H., Choi, D.: Multi-targeted adversarial example in evasion attack on deep neural network. IEEE Access 6, 46084–46096 (2018)
19. Papernot, N., McDaniel, P., Goodfellow, I., Jha, S., Celik, Z.B., Swami, A.: Practical black-box attacks against machine learning. In: Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security, ASIA CCS '17, pp. 506–519. Association for Computing Machinery, New York, NY, USA (2017). https://doi.org/10.1145/3052973.3053009
Efficient Data Presentation Method for Building User Preference Model Using Interactive Evolutionary Computation Akira Hara, Jun-ichi Kushida, Ryohei Yasuda, and Tetsuyuki Takahama
Abstract In this paper, we propose data presentation methods for building user preference models efficiently in recommendation systems by interactive genetic algorithm. The user preference model of the recommender agent is represented by a three-layer neural network (NN). In order to generate training data for this NN, it is necessary to present some sample items to the user and obtain the user’s evaluation values. Based on the idea that the evaluation distribution of the presented data should not be biased, we propose two types of data presentation methods. One is the selection of three types of data such as like, dislike and neutral. The other is the Inverse Proportional Selection of sampling points based on the frequencies of already presented data. As the results of experiments using four types of pseudo-users, we found that the Inverse Proportional Selection was the most efficient method in learning the preferences of users. Keywords Interactive evolutionary computation · User preference model
A. Hara (B) · J. Kushida · T. Takahama Graduate School of Information Sciences, Hiroshima City University, 3-4-1, Ozuka-higashi, Asaminami-ku, Hiroshima 731-3194, Japan e-mail: [email protected] URL: http://www.ints.info.hiroshima-cu.ac.jp J. Kushida e-mail: [email protected] URL: http://www.ints.info.hiroshima-cu.ac.jp T. Takahama e-mail: [email protected] URL: http://www.ints.info.hiroshima-cu.ac.jp R. Yasuda Faculty of Information Sciences, Hiroshima City University,3-4-1, Ozuka-higashi, Asaminami-ku, Hiroshima 731-3194, Japan © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 I. Czarnowski et al. (eds.), Intelligent Decision Technologies, Smart Innovation, Systems and Technologies 238, https://doi.org/10.1007/978-981-16-2765-1_48
1 Introduction Interactive evolutionary computation (IEC) is an evolutionary optimization method based on the user's subjective evaluations. In IEC, the evaluations reflect the user's preferences; therefore, it can be used to solve problems related to human sensibility, such as graphic design and music generation. In ordinary approaches based on interactive genetic algorithms (IGA), the individuals presented to the user are solutions of the problem. IGA is also used in the field of recommendation systems, which present suitable products and information to users without requiring them to input keywords about their interests. In IGA-based modeling of user preferences for recommendation systems, the user does not evaluate the user preference model directly but the data selected from the database by the recommender agent. Based on the user's evaluations of the data, the user preference model is optimized by evolution. The user preference model has many parameters, and by adjusting these parameters, it can reproduce the user's preferences. Note that we do not optimize a string of exact user evaluation values for all items by GA; we optimize the user preference model based on evaluation values for a small number of sampled items.

In IGA-based approaches for recommender agents [1–4], data matching the user's preferences (i.e., "Like" data to which the recommender agent gives high ratings) are presented to the user. In [1], random selection from the database is also adopted for expanding the search space. However, efficient data selection methods for learning user preferences have not been sufficiently examined. Therefore, in this paper, we propose efficient data presentation methods for acquiring user preference models from a small number of user evaluations. The proposed method is based on the idea that an unbiased evaluation distribution of the presented data leads to an efficient search for the user's preference model. Such an efficient search also contributes to reducing the fatigue of users.

This paper is organized as follows. Section 2 describes the optimization of recommender agents using IEC. Section 3 explains the proposed data presentation methods. Section 4 shows the experimental results and discussions. Section 5 describes conclusions and future work.
2 Optimization of User Preference Models by IEC 2.1 User Preference Model The recommender agent has a user preference model, and the model is represented by a number of parameters. The model can be used to search the database for data that matches the user’s preferences. In previous research, polynomial models and neural network models have been used to design user preference models [3, 4]. A set of parameters of the user preference model are used as genes of individuals, and the values are optimized by IGA.
The fitness of the user preference model is calculated by the sum of differences between the recommender agent’s evaluations and the user’s evaluations for already presented data.
2.2 Mutual Evaluation Model The recommender agent evolves based on the user’s evaluation of the presented data. If respective agents present data independently to the user and the data are exclusively used for calculating their fitness, the agents have to present a lot of data to the user for collecting training data. It is a heavy burden for the user. To solve the problem, a mutual evaluation model has been proposed. The mutual evaluation model is shown in Fig. 1. In the model, the recommender agent uses not only the evaluation information of the data presented by itself but also the evaluation information of the data presented by other recommender agents when calculating the fitness of its own user preference model. All the recommender agents also evaluate all the data presented to the user. The recommender agent whose evaluations are the closest to the user has the best user preference model.
Fig. 1 Mutual evaluation model and hidden recommender agents
2.3 Hidden Recommender Agent Model To improve the performance of IGA, the number of individuals should be increased. However, as the number of individuals increases, the number of recommender agents that present data to the user also increases, which increases the user's evaluation burden. Therefore, a hidden recommender agent model has been proposed to increase the number of recommender agents without increasing the user's evaluation burden. A hidden recommender agent is an agent that does not present data to the user; however, it performs mutual evaluation of the data presented by the other recommender agents, so its fitness can still be determined. If a hidden recommender agent becomes superior to a normal recommender agent, the hidden recommender agent takes the place of the inferior agent and begins to present data to the user.
2.4 Optimization Flow of Recommender Agents The basic optimization flow of recommender agents is shown in Fig. 2. The details of each process are as follows:
1. In "Creation of Initial Individuals," user preference models for the specified number of individuals are randomly generated.
2. In "Data Presentation," one or more elite recommender agents present the specified number of highly rated data. At the first presentation step in a trial, the data is randomly selected from the database.
3. In "Data Evaluation by User," the user gives an evaluation value to the presented data.
4. In "Data Evaluation by Recommender Agents," all recommender agents evaluate the data presented to the user and the historical data. The historical data means the data that has already been presented in past Data Presentation steps.
5. In "Evaluation of Recommender Agents," the fitness of each recommender agent is determined from the user's and the agent's evaluation values.
6. In "GA Operations," genetic operations are applied to the individuals, each of which represents the parameters of the user preference model of a recommender agent.
7. In "Max Generations?", it is judged whether the number of GA generations has reached the pre-determined number. If the condition is satisfied, go to step 8; otherwise, go to step 4.
8. In "Max number of Data Presentation?", it is judged whether "Data Presentation" has already been performed the specified number of times. If the condition is satisfied, output the best individual; otherwise, go back to step 2.
Fig. 2 Optimization flow of recommender agents
3 Proposed Method 3.1 Data Presentation Method In this paper, we propose data presentation methods for acquiring useful information on user preferences from a small number of user evaluations. We propose the following two data presentation methods (a sketch of the Inverse Proportional Selection procedure is given after this list).
1. Like/Neutral/Dislike Selection: The user's favorite data (the data with the highest rating), neutral data (the data with the rating closest to 0.5), and disliked data (the data with the lowest rating) are selected from the database.
2. Inverse Proportional Selection: First, a histogram with five intervals is generated from the already evaluated data. Next, one interval is probabilistically selected from the five intervals; the selection probability is inversely proportional to the frequency of already presented data in each interval. Once the interval is determined, the data whose predicted evaluation value is closest to the center of the interval is selected. Figure 3 illustrates the Inverse Proportional Selection.
To examine the effectiveness of the proposed methods, we also prepared the following presentation methods for comparison.
3. Random Selection: Data is randomly selected from the database.
4. Like Selection: The recommender agent selects the data with the highest rating from the database. This presentation method is often used in conventional methods [1–4].
Fig. 3 Inverse proportional selection
5. Dislike Selection: The recommender agent selects the data with the lowest rating from the database.
6. Like + Random Selection: In addition to the Like Selection, some data are randomly selected from the database. Following [1], the ratio of Like data to randomly selected data is set to 2:3.
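The sketch below, referred to in item 2 above, shows one way the Inverse Proportional Selection could be implemented; the +1 smoothing that avoids division by zero for empty intervals is an assumption.

```python
import numpy as np

def inverse_proportional_selection(predicted_ratings, presented_ratings, rng=None):
    """Pick the next item to present: build a 5-bin histogram of the ratings of
    already presented data, choose a bin with probability inversely proportional
    to its frequency, and return the index of the item whose predicted rating is
    closest to the centre of the chosen bin."""
    rng = rng or np.random.default_rng()
    edges = np.linspace(0.0, 1.0, 6)                       # five intervals over [0, 1]
    counts, _ = np.histogram(presented_ratings, bins=edges)
    weights = 1.0 / (counts + 1.0)                         # +1 avoids division by zero
    probs = weights / weights.sum()
    bin_idx = rng.choice(5, p=probs)
    centre = 0.5 * (edges[bin_idx] + edges[bin_idx + 1])
    return int(np.argmin(np.abs(np.asarray(predicted_ratings) - centre)))
```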
3.2 User Preference Model by Neural Network In this research, a three-layer neural network (NN) is used as the user preference model for the recommender agent. A sigmoid function with gain 5 is used as the activation function. The NN consists of 10 neurons in the input layer, 5 neurons in the middle layer, and 1 neuron in the output layer. Therefore, the number of parameters of the user preference model is 61, corresponding to the connection weights between neurons and the threshold values of neurons in the middle and output layers. The parameters are used as the genes of the individual, and they are optimized by IGA. Each parameter is represented by a real value and is initialized with a uniform random number in the range of [−1, 1]. We created artificial item data for recommendation. The dataset contains 10,000 data, each of which has a real-valued ten-dimensional feature vector. The feature value for each dimension is generated by a uniform random number in the range of [0, 1]. When the features of the data are input to the user preference model (NN), the output is the evaluation value of the data. The evaluation value takes a real number
in the range [0, 1]. The recommender agent considers that the data with high rating is suitable for the user’s preference.
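A minimal NumPy sketch of this 10–5–1 preference model is shown below; the sign convention for the thresholds and the gene ordering are assumptions, while the parameter count (61), the gain-5 sigmoid, and the [−1, 1] initialization follow the text.

```python
import numpy as np

class PreferenceNN:
    """Three-layer user preference model: 10 inputs, 5 hidden neurons, 1 output.
    The 61 parameters (weights and thresholds) form the chromosome optimized by IGA."""

    def __init__(self, rng=None):
        rng = rng or np.random.default_rng()
        self.genes = rng.uniform(-1.0, 1.0, size=61)   # 10*5 + 5 + 5 + 1 parameters

    def _sigmoid(self, x, gain=5.0):
        return 1.0 / (1.0 + np.exp(-gain * x))

    def evaluate(self, features):
        """Return the predicted evaluation value in [0, 1] for one 10-D item."""
        w1 = self.genes[:50].reshape(10, 5)    # input-to-hidden weights
        b1 = self.genes[50:55]                 # hidden thresholds
        w2 = self.genes[55:60]                 # hidden-to-output weights
        b2 = self.genes[60]                    # output threshold
        h = self._sigmoid(np.asarray(features) @ w1 - b1)
        return float(self._sigmoid(h @ w2 - b2))
```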
3.3 Fitness of User Preference Model The fitness of a user preference model is based on the difference between the evaluations of the recommender agent and those of the actual user. As shown in Eq. (1), the fitness f_i of the i-th user preference model is the sum of absolute errors between the user's evaluation values and the evaluation values of recommender agent i for the current and previously presented data.

f_i = \sum_{j=1}^{N_d} |E_j^u - E_j^i|   (1)

where N_d is the sum of the number of presented data and the historical data, E_j^u is the user's evaluation value for data j, and E_j^i is the evaluation value of user preference model i for data j. Human sensibility can change from time to time; however, in this paper, we suppose that the past evaluation values for already presented data do not change during the optimization process.
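Given the PreferenceNN sketch above, the fitness of Eq. (1) can be computed as follows (lower values indicate a better model, since the agent whose evaluations are closest to the user's is the best).

```python
import numpy as np

def fitness(model, presented_items, user_ratings):
    """Fitness of one user preference model (Eq. (1)): sum of absolute errors
    between the user's ratings and the model's ratings over all presented data."""
    agent_ratings = np.array([model.evaluate(item) for item in presented_items])
    return float(np.sum(np.abs(np.asarray(user_ratings) - agent_ratings)))
```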
4 Experiments and Discussion 4.1 Pseudo-Users To verify the effectiveness of the proposed methods, we perform simulations using pseudo-users. Each pseudo-user has a different preference model represented by the NN architecture described in Sect. 3, and evaluates the presented data based on that model. A pseudo-user is not affected by fatigue or presentation order and always gives consistent evaluation values to the presented data. In this study, we use four types of pseudo-users with different evaluation distributions for the dataset, as follows:
User A: tends to rate items highly.
User B: the rate of Dislike is extremely high.
User C: the rates of Like and Dislike are both extremely high.
User D: has no bias in the distribution of evaluation values.
The histograms of the rating distributions of the pseudo-users are shown in Fig. 4.
Fig. 4 Rating distribution histograms for pseudo-users A, B, C, and D
These pseudo-users were generated as follows: By using the NN architecture described in Sect. 3, we generated a number of NNs whose connection weights and thresholds were determined by random numbers. The responses of these NNs to all the data were observed, and four NNs were selected such that the response types were different from one another as shown in Fig. 4. The goal of our experiments is to construct different user preference models (i.e., different NNs) for respective pseudo-users.
4.2 Experimental Settings Table 1 shows the parameter settings of IGA. The data presentation steps are allowed 10 times. For each data presentation step, the recommender agent presents 10 data selected from the database to the user. The data already selected in the past data presentation steps are not selected again. After each data presentation step, the user
Table 1 Settings of GA parameters
Max iterations of data presentation step:        10
Number of presented data per presentation step:  10
Max generations per data presentation step:      100
Number of recommender agents:                    1 elite agent and 99 hidden agents
Selection method:                                Tournament selection + Elitism
Tournament size:                                 7
Crossover method:                                BLX-α, α = 0.5
Mutation method:                                 Gaussian mutation, μ = 0, σ² = 0.01
Mutation rate:                                   5%
preference model evolves for 100 generations by real-coded GA. Therefore, 100 data are presented to the user and the recommender agents evolve for 1000 generations throughout the entire optimization process. All the data presented to the user in the past data presentation steps are used for calculating the fitness of the recommender agent. 100 recommender agents are used in our experiments. Only the elite recommender agent can present data to the user. The number of hidden recommender agents is 99. As genetic operators in IGA, we use tournament selection and elitism. Since the chromosomes of individuals are represented by real values, BLX-α is used for crossover. Gaussian mutation is also used. The performance of each presentation method is measured by the evaluation error between the elite recommender agent in the last generation and the pseudo-user for all data in the database. The performance averaged over 100 trials is used for comparison.
4.3 Results and Discussion The simulation results are shown in Table 2. Figure 5 shows error curves for all the data of several presentation methods for the pseudo-user B. For the comparison, we picked up Like Selection which is often used in the conventional studies, and Random Selection which shows relatively good performance. As shown in the table, the proposed Inverse Proportional Selection and Like/ Neutral/Dislike Selection showed better performance than Like Selection which is often used in the conventional studies. This result indicates that the user’s preference can be efficiently learned if the evaluation distribution of the presented data is not biased, rather than presenting only the data that matches the user’s preference. Especially, the Inverse Proportional Selection is superior to the Like/Neutral/Dislike Selection. For the pseudo-user C, similar error curves to Fig. 5 were acquired.
Table 2 Evaluation errors and rankings for pseudo-users
Presentation method                  User A         User B         User C         User D
                                     Error (Rank)   Error (Rank)   Error (Rank)   Error (Rank)
(a) Like/Neutral/Dislike selection   0.0823 (4)     0.0883 (4)     0.1238 (4)     0.1403 (4)
(b) Inverse Proportional Selection   0.0692 (2)     0.0675 (1)     0.0804 (1)     0.1180 (2)
(c) Random selection                 0.0655 (1)     0.0776 (2)     0.0897 (2)     0.1141 (1)
(d) Like selection                   0.1556 (6)     0.1117 (5)     0.2066 (6)     0.1796 (5)
(e) Dislike selection                0.1241 (5)     0.1601 (6)     0.2065 (5)     0.1854 (6)
(f) Like + Random selection          0.0728 (3)     0.0847 (3)     0.1037 (3)     0.1275 (3)
In addition, for the pseudo-user B and C, the Inverse Proportional Selection shows better results than Random Selection. The evaluation distribution of the data selected by Random Selection follows the user’s preference distribution. Therefore, if the user’s preference distribution is biased heavily toward like or dislike, the presented data becomes also biased even if Random Selection is used. This is the reason why Random Selection is not suitable for pseudo-user B and C. For the pseudo-user A and D, the overall error curves showed the same decreasing tendency as shown in Fig. 5. However, Random Selection showed slightly better performance than Inverse Proportional Selection. Inverse Proportional Selection selects data closest to the center of each interval. So, the evaluation distribution of the presented data is biased toward the centers of the five intervals. The subtle bias in the Inverse Proportional Selection may lead to the deterioration of the performance. However, the difference in performance is slight. As a whole, the Inverse Proportional Selection is the best method among six methods.
5 Conclusion In this paper, we proposed data presentation methods for collecting user preference data to build the user preference model efficiently by IGA. Based on the idea that the rating distribution of the presented data should not be biased, we proposed two methods of selecting data, Like/Neutral/Dislike Selection and Inverse Proportional Selection. To examine the effectiveness of the proposed methods, we performed simulations using four types of pseudo-users. The simulation results showed that the Inverse Proportional Selection is the best among the proposed and conventional six methods.
Fig. 5 Error curves for pseudo-user B
From now on, we have to verify the effectiveness of the proposed method with actual users and datasets. In addition, the frequent presentations of dislike samples may cause mental fatigue to users. We would like to also take the user’s stress in evaluating the presented data into consideration.
References
1. Hakamata, J., Tokumi, Y., Tokumaru, M.: A framework of recommender system using interactive evolutionary computation. J. Jpn. Soc. Kansei Eng. 11(2), 281–288 (2012)
2. Takenouchi, H., Tokumaru, M.: Membership function optimization of Kansei retrieval agents with fuzzy reasoning. In: 31st Fuzzy System Symposium, pp. 693–698 (2015)
3. Tokumaru, M., Muranaka, N.: Optimization of Kansei retrieval agents using evolutionary computation. J. Jpn. Soc. Kansei Eng. 8(3), 885–892 (2009)
4. Okunaka, D., Tokumaru, M.: Kansei retrieval model using a neural network. J. Jpn. Soc. Kansei Eng. 11(2), 331–338 (2012)
Image-Based Early Detection of Alzheimer’s Disease by Using Adaptive Structural Deep Learning Shin Kamada, Takumi Ichimura, and Toshihide Harada
Abstract Deep learning has been a successful model which can effectively represent several features of the input space and remarkably improve image recognition performance with deep architectures. In our research, an adaptive structural learning method of restricted Boltzmann machine (adaptive RBM) and deep belief network (adaptive DBN) has been developed as a deep learning model. The models have a self-organizing function which can discover an optimal number of hidden neurons for given input data in an RBM by a neuron generation–annihilation algorithm, and can obtain an appropriate number of RBMs as hidden layers. In this paper, the proposed model was applied to MRI and PET image datasets in the ADNI digital archive for the early detection of mild cognitive impairment (MCI) and Alzheimer's disease (AD). Two kinds of deep learning models were constructed to classify the MRI and PET images. For the training sets, our model showed 99.6 and 99.4% classification accuracy for MRI and PET images. For the test sets, the model showed 87.6 and 98.5% accuracy, respectively. Our model achieved the highest classification accuracy among the compared CNN models. Keywords Deep learning · Deep belief network · Adaptive structural learning method · MRI/PET · Alzheimer's disease
S. Kamada (B) · T. Ichimura Advanced Artificial Intelligence Project Research Center Research Organization of Regional Oriented Studies, Prefectural University of Hiroshima, Hiroshima 734-8558, Japan e-mail: [email protected] T. Ichimura e-mail: [email protected] T. Harada Faculty of Health and Welfare, Prefectural University of Hiroshima, Hiroshima 734-8558, Japan e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 I. Czarnowski et al. (eds.), Intelligent Decision Technologies, Smart Innovation, Systems and Technologies 238, https://doi.org/10.1007/978-981-16-2765-1_49
1 Introduction

Recently, artificial intelligence (AI) with sophisticated technologies has become an essential technique in our lives [1]. In particular, recent advances in deep learning methods enable higher performance on several kinds of big data compared to traditional methods [2]. Convolutional neural networks (CNNs) such as AlexNet [3], GoogLeNet [5], VGG16 [4], and ResNet [6] have greatly improved classification and detection accuracy in image recognition [7]. In our research, we have developed the adaptive structural learning method of deep belief network (adaptive DBN), which finds a suitable network structure for the given input space during training [8]. The neuron generation and annihilation algorithms [9, 10] were implemented on the restricted Boltzmann machine (RBM) [11], and a layer generation algorithm [12] was implemented on the deep belief network (DBN) [13]. The proposed model showed the highest classification capability among existing CNNs on benchmark image recognition datasets such as MNIST [14], CIFAR-10, and CIFAR-100 [15] [8]. In this paper, the proposed model was applied to MRI and 18F-FDG-PET (fluorodeoxyglucose positron emission tomography) image datasets in the ADNI digital archive [16] for the early detection of mild cognitive impairment (MCI) and Alzheimer's disease (AD). In Japan, it is considered that one in four Japanese over 65 years old has MCI or AD. The early detection or diagnosis of MCI/AD is an important task for patients and their families in an aging society. Image-based diagnoses using MRI and PET are common tests to detect MCI/AD. Previous research [17, 18] developed a binary classification model using a CNN which classifies an MRI/PET image into a normal (CN) or abnormal (MCI or AD) state with ADNI data. Other research [19] developed a U-Net-like CNN to extract several features from PET images. Comparing MRI and PET images, the diagnosis of MCI and AD using MRI images is considered more difficult than that using PET. In fact, the papers [17, 18] showed that the classification accuracy on PET images (91.2%) was better than that on MRI images (84.9%). In this paper, we aim to improve the classification accuracy on MRI and PET images with our proposed adaptive DBN. To this end, two kinds of deep learning models were constructed to classify the MRI and PET images. For the training sets, our model showed 99.6 and 99.4% classification accuracy for MRI and PET images; for the test sets, the model showed 87.6 and 98.5% accuracy, respectively. Our model achieved the highest classification accuracy among the compared CNN models. The remainder of this paper is organized as follows. In Sect. 2, the basic idea of the adaptive structural learning of DBN is briefly explained. Section 3 gives a description of the ADNI dataset. In Sect. 4, the effectiveness of our proposed method is verified on MRI and PET images of ADNI. In Sect. 5, some discussions are given to conclude this paper.
2 Adaptive Learning Method of Deep Belief Network This section explains the traditional RBM [11] and DBN [13] to describe the basic behavior of our proposed adaptive learning method of DBN [8].
2.1 Restricted Boltzmann Machine and Deep Belief Network

An RBM [11] is a kind of stochastic model for unsupervised learning. In the network structure of an RBM, there are two kinds of binary layers as shown in Fig. 1: a visible layer v ∈ {0, 1}^I for input patterns and a hidden layer h ∈ {0, 1}^J. The parameters b ∈ R^I and c ∈ R^J adjust the behaviors of the visible neurons and the hidden neurons, respectively. The weight W_{ij} is the connection between a visible neuron v_i and a hidden neuron h_j. An RBM minimizes the energy function E(v, h) during training according to the following equations:

E(v, h) = - \sum_{i} b_i v_i - \sum_{j} c_j h_j - \sum_{i} \sum_{j} v_i W_{ij} h_j,   (1)

p(v, h) = \frac{1}{Z} \exp(-E(v, h)),   (2)

Z = \sum_{v} \sum_{h} \exp(-E(v, h)).   (3)
Equation (2) gives the probability of a configuration (v, h) in terms of exp(-E(v, h)). The normalization constant Z is calculated in Eq. (3) by summing exp(-E(v, h)) over all possible pairs of visible and hidden vectors. The parameters \theta = \{b, c, W\} are optimized through the partial derivatives of p(v) for the given input data v. A maximum likelihood method can be used to estimate these parameters as a statistical model. Typically, contrastive divergence (CD) [20] is the most popular RBM learning method, as a fast approximation of Gibbs sampling. A DBN [13] is a stacked model for stochastic unsupervised learning built by hierarchically organizing several pre-trained RBMs. The conditional probability of the jth hidden neuron at the lth RBM is defined by Eq. (4):

p(h_j^l = 1 | h^{l-1}) = \mathrm{sigmoid}\left( c_j^l + \sum_{i} W_{ij}^l h_i^{l-1} \right),   (4)
where c_j^l and W_{ij}^l denote the bias of the jth hidden neuron and the weights at the lth RBM, respectively, and h^0 = v is the given input data. In order to make the DBN perform supervised learning for a classification task, a final output layer is appended after the last hidden layer to calculate the output probability y_k for a category k.
Fig. 1 Network Structure of RBM
The calculation is implemented by the softmax function in Eq. (5):

y_k = \frac{\exp(z_k)}{\sum_{j=1}^{M} \exp(z_j)},   (5)
where z_j is the output pattern of the jth hidden neuron at the output layer and M is the number of output neurons. The difference between the output y_k and the teacher signal for the category k is minimized.
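As a minimal illustration of Eqs. (1)-(3) and the contrastive divergence learning mentioned above, the following Python sketch implements the RBM energy and one CD-1 update for a binary RBM. The variable names, learning rate, and sampling details are illustrative assumptions, not the authors' implementation.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def energy(v, h, W, b, c):
    # E(v, h) = -b.v - c.h - v W h   (Eq. (1))
    return -np.dot(b, v) - np.dot(c, h) - np.dot(v, W @ h)

def cd1_step(v0, W, b, c, rng, lr=0.01):
    # one contrastive divergence (CD-1) update for a binary RBM
    ph0 = sigmoid(c + v0 @ W)                   # p(h = 1 | v0)
    h0 = (rng.random(ph0.shape) < ph0) * 1.0    # sampled hidden state
    pv1 = sigmoid(b + W @ h0)                   # reconstruction p(v = 1 | h0)
    v1 = (rng.random(pv1.shape) < pv1) * 1.0
    ph1 = sigmoid(c + v1 @ W)
    W += lr * (np.outer(v0, ph0) - np.outer(v1, ph1))
    b += lr * (v0 - v1)
    c += lr * (ph0 - ph1)
    return W, b, c

rng = np.random.default_rng(0)
W, b, c = rng.normal(0, 0.01, (784, 64)), np.zeros(784), np.zeros(64)
v0 = (rng.random(784) < 0.5) * 1.0              # toy binary input pattern
W, b, c = cd1_step(v0, W, b, c, rng)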
2.2 Neuron Generation and Annihilation Algorithm of RBM

While recent deep learning models have drastically improved classification capability for image recognition, their network architectures have become deeper and more complicated. Problems related to determining the network structure or the number of parameters remain a difficult task in AI research. To tackle this problem, we have developed the adaptive structural learning method for the RBM model (adaptive RBM) [8]. The neuron generation algorithm of the adaptive RBM can generate an optimal number of hidden neurons, so that the trained RBM has a suitable structure for the given input space. The neuron generation is based on the idea of the walking distance (WD) in the paper [21]. WD is calculated from the difference between the prior variance and the current variance of the learning parameters during training. The paper [21] described that if the network does not have enough neurons to classify the input patterns sufficiently, then the WD tends to fluctuate largely even after a long training process. Using this idea, we developed the adaptive RBM in previous research. Although an RBM has three kinds of parameters, namely the parameters of the visible neurons, the parameters of the hidden neurons, and the weights of their connections, our adaptive RBM monitors these parameters excluding the visible ones (the paper [8] describes the reason for this).
Fig. 2 Adaptive RBM: (a) neuron generation, (b) neuron annihilation
A large WD value means that some existing hidden neurons cannot represent an ambiguous input pattern because of a lack of hidden neurons. In order to express such ambiguous patterns, a new neuron is inserted that inherits the attributes of its parent hidden neuron, as shown in Fig. 2a. After the neuron generation, the adaptive RBM can remove redundant neurons by the neuron annihilation algorithm. That is, some unnecessary or redundant neurons may have been generated during the neuron generation process. Therefore, neurons whose hidden activation output is smaller than a pre-determined threshold are removed, as shown in Fig. 2b [10].
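The following Python sketch shows, under stated assumptions, one way the WD-based neuron generation and the activation-threshold annihilation described above could be organised. The simplified WD, the thresholds, and the choice of the parent neuron are placeholders, not the exact rules of [8-10].

import numpy as np

def walking_distance(prev_params, curr_params):
    # simplified WD: variance of the parameter change between two checkpoints
    return float(np.var(curr_params - prev_params))

def adapt_hidden_layer(W, c, h_act, wd_hidden, wd_weight,
                       gen_threshold=0.05, ann_threshold=0.01):
    # W: (I, J) weights, c: (J,) hidden biases, h_act: (N, J) hidden activations
    if wd_hidden > gen_threshold and wd_weight > gen_threshold:
        # generation: large WD suggests a lack of representation power,
        # so insert a neuron inheriting a parent neuron's attributes
        j = int(np.argmax(np.var(h_act, axis=0)))
        W = np.hstack([W, W[:, [j]]])
        c = np.append(c, c[j])
        h_act = np.hstack([h_act, h_act[:, [j]]])
    # annihilation: remove neurons whose mean activation is below a threshold
    keep = h_act.mean(axis=0) >= ann_threshold
    return W[:, keep], c[keep]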
2.3 Layer Generation Algorithm of DBN

A DBN is a hierarchical energy-based model constructed by stacking several pre-trained RBMs. During the building process, the output patterns given by the activation of the hidden neurons of the l-th RBM can be seen as the next level of input for the (l + 1)-th RBM.
Fig. 3 Overview of adaptive DBN: pre-training proceeds layer by layer from the input (with neuron generation and annihilation in each RBM), a suitable number of hidden neurons and layers is automatically generated, and the last pre-training step is followed by fine-tuning for supervised learning
Generally, a DBN with multiple RBMs has higher data representation power than a single RBM. Such a hierarchical model can represent features ranging from abstract concepts to concrete objects in the direction from the input layer to the output layer. However, the optimal number of RBMs depends on the target data space. In the same way as the neuron generation algorithm, an automatic layer generation method for the DBN, called adaptive DBN, was developed [12]. Since the DBN is a hierarchical model, the total WD of each RBM is measured for the layer generation algorithm. If the total WD is smaller than a pre-determined threshold, then a new RBM is generated to keep a suitable classification power for the dataset, since the current RBM lacks the data representation power needed to capture the input patterns. Figure 3 shows the overview of layer generation in the adaptive DBN. The adaptive DBN showed higher classification accuracy on several benchmark datasets such as CIFAR-10 compared with existing CNN methods [8].
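A skeleton of this layer generation loop might look as follows. Here train_adaptive_rbm, total_wd_of, and hidden_activations are hypothetical placeholders for the adaptive RBM training, its total WD, and its hidden outputs; the stopping rule simply mirrors the condition stated above.

def train_adaptive_dbn(data, train_adaptive_rbm, total_wd_of, wd_threshold, max_layers=10):
    layers, x = [], data
    while len(layers) < max_layers:
        rbm = train_adaptive_rbm(x)        # pre-train one adaptive RBM on the current input
        layers.append(rbm)
        x = rbm.hidden_activations(x)      # hidden outputs become the next RBM's input
        if not total_wd_of(rbm) < wd_threshold:
            break                          # layer generation condition no longer holds
    return layers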
3 ADNI Dataset Alzheimer’s disease neuroimaging initiative (ADNI) [16] is a global longitudinal study for the early detection of AD since 2004. Several kinds of clinical data such as MRI images, biochemical bio-markers, and so on are collected and provided as a digital archive for research purpose. ADNI mainly provides three types of image data which are MRI, PET, and fMRI. In this paper, two kinds of deep learning models were constructed using ADNI image data for the early detection of AD. The first model is to classify a MRI image to three categories, cognitive normal (CN), MCI, and AD, and the second model is to classify
a PET image into four categories: CN, early MCI (EMCI), late MCI (LMCI), and AD. These categories are annotated by ADNI's researchers, and each datum is provided as 3D data in formats such as DICOM and NIfTI. We used the data of 360 and 105 patients for the MRI and PET models, respectively, and sliced images were extracted from them along the axial axis. Each image is a grayscale image of 256 × 256 pixels for MRI and 160 × 160 pixels for PET. Figures 4 and 5 show some example images for MRI and PET.
Fig. 4 Examples of MRI images: (a) CN, (b) MCI, (c) AD
Fig. 5 Examples of PET images: (a) CN, (b) EMCI, (c) LMCI, (d) AD
Table 1 Dataset for MRI images

Category   Train   Test   Total
CN         910     260    1170
MCI        770     430    1200
AD         790     440    1230
Total      2470    1130   3600

Table 2 Dataset for PET images

Category   Train   Test   Total
CN         253     66     319
EMCI       264     44     308
LMCI       198     66     264
AD         187     77     264
Total      902     253    1155
For evaluation, all the image data was split into training and test sets as shown in Tables 1 and 2.
4 Experiment Results

In this section, the classification performance on MRI and PET images is reported. Our proposed adaptive DBN was trained on the training sets described in Sect. 3, and the classification accuracy was then evaluated on the test sets. Tables 3 and 4 show the classification accuracy for MRI and PET images, respectively, and Tables 5 and 6 show the confusion matrices for their test sets. The trained DBN for MRI was automatically formed with 613, 519, 501, 398, 221, and 156 neurons from the input to the output layer. For the PET model, 522, 449, 321, 227, and 98 neurons were acquired. For the training sets, our model showed 99.6 and 99.4% classification accuracy for MRI and PET images. For the test sets, the model showed 87.6 and 98.5% accuracy, respectively. The model was able to classify not only the normal state (CN) but also the abnormal states (MCI and AD) with high accuracy. The accuracy for MRI was worse than for PET, although the MRI model had a bigger network, through neuron and layer generation, than the PET model. We note that it is typically easier to distinguish CN, MCI, and AD from PET images than from MRI images, because PET is a test that can visualize glucose metabolism in the brain. In contrast, MRI images are used to detect hippocampal atrophy. However, the progress of MCI/AD is not always accompanied by hippocampal atrophy; that is, other medical tests in addition to MRI images may be required for further improvement (Table 7).
Table 3 Classification accuracy of adaptive DBN for MRI images

Category   Train (%)   Test (%)
CN         99.8        92.3
MCI        99.5        85.5
AD         99.4        86.6
Total      99.6        87.6

Table 4 Classification accuracy of adaptive DBN for PET images

Category   Train (%)   Test (%)
CN         99.2        98.5
EMCI       99.2        97.7
LMCI       99.5        98.5
AD         99.3        98.7
Total      99.3        94.5
Table 5 Confusion matrix of adaptive DBN for MRI images (rows: true class, columns: predicted class)

True \ Predicted   CN    MCI   AD
CN                 240   15    5
MCI                32    369   29
AD                 8     51    381

Table 6 Confusion matrix of adaptive DBN for PET images (rows: true class, columns: predicted class)

True \ Predicted   CN   EMCI   LMCI   AD
CN                 65   1      0      0
EMCI               0    43     1      0
LMCI               0    1      65     0
AD                 0    0      1      76

Table 7 Comparison with other models for MRI test images

Model          AD versus CN (%)   MCI versus CN (%)
CNN [18]       84.9               77.8
Adaptive DBN   92.3               85.5
Table 8 Comparison with other models for PET test images

Model          AD versus CN (%)   MCI versus CN (%)
CNN [17]       91.2               78.9
CNN [19]       97.9               96.7
Adaptive DBN   98.7               98.5
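As a small cross-check of how the per-class and total accuracies relate to the confusion matrices, the following Python sketch recomputes the MRI test accuracies from the matrix transcribed from Table 5 (the printed tables may differ slightly because of rounding).

import numpy as np

conf = np.array([[240, 15, 5],      # true CN
                 [32, 369, 29],     # true MCI
                 [8, 51, 381]])     # true AD; columns are predicted CN/MCI/AD

per_class = conf.diagonal() / conf.sum(axis=1)   # approx. 0.923, 0.858, 0.866
total = conf.diagonal().sum() / conf.sum()       # approx. 0.876
print(per_class, total)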
To verify the effectiveness of our model, we investigated the performance of related work using deep learning. Tables 7 and 8 show the binary classification results for AD versus CN and for MCI versus CN reported for the other CNN models [17–19]. From these results, our model achieved the highest classification accuracy among these methods.
5 Conclusion

Deep learning is widely used in various research fields, especially image recognition. In our research, the adaptive RBM and adaptive DBN were developed to find an optimal network structure from given data. In this paper, the proposed model was applied to MRI and PET image datasets in the ADNI digital archive for the early detection of MCI and AD. For the training sets, our model showed 99.6% and 99.4% classification accuracy for MRI and PET images. For the test sets, the model showed 87.6% and 98.5% accuracy, respectively. Our model achieved the highest classification accuracy among the compared CNN models. For further improvement on MRI images, a multi-modal learning mechanism that includes other medical tests in addition to the MRI images will be required. In the future, we will evaluate our model with other MRI images collected from Japanese clinics.

Acknowledgements This work was supported by JSPS KAKENHI Grant Numbers 19K12142 and 19K24365, and by commissioned research from the National Institute of Information and Communications Technology (NICT, 21405), Japan.
References
1. Markets and Markets: http://www.marketsandmarkets.com/Market-Reports/deep-learningmarket-107369271.html. Accessed 28 Nov 2018 (2016)
2. Bengio, Y.: Learning deep architectures for AI. Found. Trends Mach. Learn. Arch. 2(1), 1–127 (2009)
3. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Proceedings of Advances in Neural Information Processing Systems 25 (NIPS 2012) (2012)
4. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Proceedings of International Conference on Learning Representations (ICLR 2015) (2015)
5. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: Proceedings of CVPR2015 (2015)
6. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
7. Russakovsky, O., Deng, J., Su, H., et al.: ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 115(3), 211–252 (2015)
8. Kamada, S., Ichimura, T., Hara, A., Mackin, K.J.: Adaptive structure learning method of deep belief network using neuron generation-annihilation and layer generation. Neural Comput. Appl. 1–15 (2018). https://doi.org/10.1007/s00521-018-3622-y
9. Kamada, S., Ichimura, T.: An adaptive learning method of restricted Boltzmann machine by neuron generation and annihilation algorithm. In: Proceedings of 2016 IEEE International Conference on Systems, Man, and Cybernetics (IEEE SMC 2016), pp. 1273–1278 (2016)
10. Kamada, S., Ichimura, T.: A structural learning method of restricted Boltzmann machine by neuron generation and annihilation algorithm. In: Neural Information Processing, Lecture Notes in Computer Science (LNCS), vol. 9950, pp. 372–380 (2016)
11. Hinton, G.E.: A practical guide to training restricted Boltzmann machines. In: Neural Networks, Tricks of the Trade, Lecture Notes in Computer Science (LNCS), vol. 7700, pp. 599–619 (2012)
12. Kamada, S., Ichimura, T.: An adaptive learning method of deep belief network by layer generation algorithm. In: Proceedings of 2016 IEEE Region 10 Conference (TENCON), pp. 2971–2974 (2016)
13. Hinton, G.E., Osindero, S., Teh, Y.: A fast learning algorithm for deep belief nets. Neural Comput. 18(7), 1527–1554 (2006)
14. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998)
15. Krizhevsky, A.: Learning multiple layers of features from tiny images. Master's thesis, University of Toronto (2009)
16. Alzheimer's Disease Neuroimaging Initiative: http://adni.loni.usc.edu/ (2021)
17. Liu, M., Cheng, D., Yan, W., et al.: Classification of Alzheimer's disease by combination of convolutional and recurrent neural networks using FDG-PET images. Front. Neuroinform. 12(35) (2018)
18. Liu, M., Cheng, D., Wang, K., et al.: Multi-modality cascaded convolutional neural networks for Alzheimer's disease diagnosis. Neuroinformatics 16, 295–308 (2018)
19. Kavitha, M., Yudistira, N., Kurita, T.: Multi instance learning via deep CNN for multi-class recognition of Alzheimer's disease. In: Proceedings of 2019 IEEE 11th International Workshop on Computational Intelligence and Applications (IWCIA), pp. 89–94 (2019)
20. Hinton, G.E.: Training products of experts by minimizing contrastive divergence. Neural Comput. 14(8), 1771–1800 (2002)
21. Ichimura, T., Tazaki, E., Yoshida, K.: Extraction of fuzzy rules using neural networks with structure level adaptation-verification to the diagnosis of hepatobiliary disorders. Int. J. Biomed. Comput. 40(2), 139–146 (1995)
Decision Making Theory for Economics
Calculations of SPCM by Several Methods for MDAHP Including Hierarchical Criteria Takao Ohya
Abstract We have proposed a super pairwise comparison matrix (SPCM) to express all pairwise comparisons in the evaluation process of the dominant analytic hierarchy process (D-AHP) or the multiple dominant AHP (MDAHP) as a single pairwise comparison matrix. This paper shows the calculations of SPCM with the logarithmic least squares method, the Harker method, and the improved two-stage method for the multiple dominant AHP including hierarchical criteria. Keywords Super pairwise comparison matrix · Multiple dominant analytic hierarchy process · Logarithmic least squares method · Harker method · Improved two-stage method

1 Introduction

In actual decision-making, a decision-maker often has a specific alternative (regulating alternative) in mind and makes an evaluation on the basis of that alternative. This was modeled in D-AHP (the dominant AHP), proposed by Kinoshita and Nakanishi [2]. If there is more than one regulating alternative and the importance of each criterion is inconsistent, the overall evaluation value may differ for each regulating alternative. As a method of integrating the importance in such cases, the concurrent convergence method (CCM) was proposed. Kinoshita and Sekitani [3] showed the convergence of CCM. Ohya and Kinoshita [4] proposed a super pairwise comparison matrix (SPCM) to express all pairwise comparisons in the evaluation process of the D-AHP or the multiple dominant AHP (MDAHP) as a single pairwise comparison matrix. Ohya and Kinoshita [5] showed, by means of a numerical counterexample, that in MDAHP an evaluation value resulting from the application of the logarithmic least squares method (LLSM) to an SPCM does not necessarily coincide with
1 Introduction In actual decision-making, a decision-maker often has a specific alternative (regulating alternative) in mind and makes an evaluation on the basis of the alternative. This was modeled in D-AHP (the dominant AHP), proposed by Kinoshita and Nakanishi [2]. If there are more than one regulating alternatives and the importance of each criterion is inconsistent, the overall evaluation value may differ for each regulating alternative. As a method of integrating the importance in such cases, the concurrent convergence method (CCM) was proposed. Kinoshita and Sekitani [3] showed the convergence of CCM. Ohya and Kinoshita [4] proposed an super pairwise comparison matrix (SPCM) to express all pairwise comparisons in the evaluation process of the D-AHP or the multiple dominant AHP (MDAHP) as a single pairwise comparison matrix. Ohya and Kinoshita [5] showed, by means of a numerical counterexample, that in MDAHP an evaluation value resulting from the application of the logarithmic least squares method (LLSM) to an SPCM does not necessarily coincide with that of T. Ohya (B) School of Science and Engineering, Kokushikan University, Tokyo, Japan e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 I. Czarnowski et al. (eds.), Intelligent Decision Technologies, Smart Innovation, Systems and Technologies 238, https://doi.org/10.1007/978-981-16-2765-1_50
the evaluation value resulting from the application of the geometric mean multiple dominant AHP (GMMDAHP) to the evaluation values obtained from each pairwise comparison matrix by using the geometric mean method. Ohya and Kinoshita [6] showed, using error models, that in D-AHP an evaluation value resulting from the application of the logarithmic least squares method (LLSM) to an SPCM necessarily coincides with the evaluation value obtained by applying the geometric mean method to each pairwise comparison matrix. Ohya and Kinoshita [7] showed the treatment of hierarchical criteria in D-AHP with a super pairwise comparison matrix. The SPCM of D-AHP or MDAHP is an incomplete pairwise comparison matrix. Therefore, the LLSM based on an error model, or an eigenvalue method such as the Harker method [8] or the two-stage method [9], is applicable to the calculation of evaluation values from an SPCM. Nishizawa proposed the improved two-stage method (ITSM). Ohya and Kinoshita [10] and Ohya [11, 12] showed calculations of SPCM by each method applicable to an incomplete pairwise comparison matrix for the multiple dominant AHP including hierarchical criteria. This paper shows the calculations of SPCM by the LLSM, the Harker method, and ITSM for the multiple dominant AHP including hierarchical criteria.
2 SPCM

The true absolute importance of alternative a (a = 1, ..., A) at criterion c (c = 1, ..., C) is v_{ca}. The final purpose of the AHP is to obtain the relative values between alternatives of the overall evaluation value v_a = \sum_{c=1}^{C} v_{ca} of alternative a. The relative comparison value r^{ca}_{c'a'} of the importance v_{ca} of alternative a at criterion c, compared with the importance v_{c'a'} of alternative a' at criterion c', is arranged in a (CA × CA) or (AC × AC) matrix. This is proposed as the SPCM R = (r^{ca}_{c'a'}) or (r^{ac}_{a'c'}). In a (CA × CA) matrix, the index of the alternative changes first, and the (A(c − 1) + a, A(c' − 1) + a')-th element is r^{ca}_{c'a'}. In an (AC × AC) matrix, the index of the criteria changes first, and the (C(a − 1) + c, C(a' − 1) + c')-th element is r^{ac}_{a'c'}. In an SPCM, symmetric components have a reciprocal relationship, as in pairwise comparison matrices. Diagonal elements are 1, and the following relationships hold: if r^{ca}_{c'a'} exists, then r^{c'a'}_{ca} exists and

r^{c'a'}_{ca} = 1 / r^{ca}_{c'a'},   (1)

r^{ca}_{ca} = 1.   (2)
The SPCM of D-AHP or MDAHP is an incomplete pairwise comparison matrix. Therefore, the LLSM based on an error model, or an eigenvalue method such as the Harker method [8] or the two-stage method [9], is applicable to the calculation of evaluation values from an SPCM.
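To make the (CA × CA) indexing and the reciprocal relations (1)-(2) concrete, a small Python sketch is given below; the function names and the dictionary-based input format are assumptions made for illustration.

import numpy as np

def spcm_index(c, a, A):
    # 1-based (criterion, alternative) -> 0-based index in the (CA x CA) SPCM
    return A * (c - 1) + (a - 1)

def build_spcm(C, A, comparisons):
    # comparisons: {((c, a), (c2, a2)): value} of known values r^{ca}_{c'a'}
    n = C * A
    R = np.full((n, n), np.nan)      # np.nan marks a missing comparison
    np.fill_diagonal(R, 1.0)         # Eq. (2)
    for (ca, ca2), val in comparisons.items():
        i, j = spcm_index(*ca, A), spcm_index(*ca2, A)
        R[i, j] = val
        R[j, i] = 1.0 / val          # Eq. (1)
    return R

# e.g. r^{I1}_{I2} = 1/3 with criteria numbered I = 1, ..., VI = 6:
R = build_spcm(6, 3, {((1, 1), (1, 2)): 1/3})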
3 Numerical Example of Using SPCM for Calculation of MDAHP

Let us take as an example the hierarchy shown in Fig. 1. Three alternatives, 1 to 3, and seven criteria, I to VI and S, are assumed, where Alternative 1 and Alternative 2 are the regulating alternatives. Criteria IV to VI are grouped under Criterion S, where Criterion IV and Criterion V are the regulating criteria.

Fig. 1 Hierarchical structure (final goal at the top, criteria, and Alternatives 1-3 at the bottom)

As the result of pairwise comparisons between alternatives at criterion c (c = I, ..., VI), the following pairwise comparison matrices R_c^A, c = I, ..., VI, are obtained:

R_I^A = \begin{pmatrix} 1 & 1/3 & 5 \\ 3 & 1 & 3 \\ 1/5 & 1/3 & 1 \end{pmatrix},  R_{II}^A = \begin{pmatrix} 1 & 7 & 3 \\ 1/7 & 1 & 1/3 \\ 1/3 & 3 & 1 \end{pmatrix},  R_{III}^A = \begin{pmatrix} 1 & 1/3 & 1/3 \\ 3 & 1 & 1/3 \\ 3 & 3 & 1 \end{pmatrix},

R_{IV}^A = \begin{pmatrix} 1 & 3 & 5 \\ 1/3 & 1 & 1 \\ 1/5 & 1 & 1 \end{pmatrix},  R_V^A = \begin{pmatrix} 1 & 1/3 & 3 \\ 3 & 1 & 5 \\ 1/3 & 1/5 & 1 \end{pmatrix},  R_{VI}^A = \begin{pmatrix} 1 & 1/5 & 3 \\ 5 & 1 & 7 \\ 1/3 & 1/7 & 1 \end{pmatrix}.

With regulating Alternative 1 and Alternative 2 as the representative alternatives, and Criterion IV and Criterion V as the representative criteria, the importance between criteria was evaluated by pairwise comparison. As a result, the following pairwise comparison matrices R_1^C, R_1^S, R_2^C, R_2^S are obtained:
R_1^C = \begin{pmatrix} 1 & 1/3 & 3 & 1/3 & 1/5 \\ 3 & 1 & 5 & 1 & 1/2 \\ 1/3 & 1/5 & 1 & 1/5 & 1/9 \\ 3 & 1 & 5 & 1 & 1/2 \\ 5 & 2 & 9 & 2 & 1 \end{pmatrix},  R_1^S = \begin{pmatrix} 1 & 1/2 & 2 \\ 2 & 1 & 5 \\ 1/2 & 1/5 & 1 \end{pmatrix},

R_2^C = \begin{pmatrix} 1 & 5 & 1 & 3 & 1/9 \\ 1/5 & 1 & 1/3 & 1 & 1/9 \\ 1 & 3 & 1 & 1 & 1/9 \\ 1/3 & 1 & 1 & 1 & 1/9 \\ 9 & 9 & 9 & 9 & 1 \end{pmatrix},  R_2^S = \begin{pmatrix} 1 & 1/9 & 1/4 \\ 9 & 1 & 6 \\ 4 & 1/6 & 1 \end{pmatrix}.
The (CA × CA) order SPCM for this example is listed row by row below (columns in the order I1, I2, I3, II1, II2, II3, III1, III2, III3, IV1, IV2, IV3, V1, V2, V3, VI1, VI2, VI3; "–" denotes a missing comparison):

I1:   1, 1/3, 5, 1/3, –, –, 3, –, –, 1/3, –, –, 1/5, –, –, –, –, –
I2:   3, 1, 3, –, 5, –, –, 1, –, –, 3, –, –, 1/9, –, –, –, –
I3:   1/5, 1/3, 1, –, –, –, –, –, –, –, –, –, –, –, –, –, –, –
II1:  3, –, –, 1, 7, 3, 5, –, –, 1, –, –, 1/2, –, –, –, –, –
II2:  –, 1/5, –, 1/7, 1, 1/3, –, 1/3, –, –, 1, –, –, 1/9, –, –, –, –
II3:  –, –, –, 1/3, 3, 1, –, –, –, –, –, –, –, –, –, –, –, –
III1: 1/3, –, –, 1/5, –, –, 1, 1/3, 1/3, 1/5, –, –, 1/9, –, –, –, –, –
III2: –, 1, –, –, 3, –, 3, 1, 1/3, –, 1, –, –, 1/9, –, –, –, –
III3: –, –, –, –, –, –, 3, 3, 1, –, –, –, –, –, –, –, –, –
IV1:  3, –, –, 1, –, –, 5, –, –, 1, 3, 5, 1/2, –, –, 2, –, –
IV2:  –, 1/3, –, –, 1, –, –, 1, –, 1/3, 1, 1, –, 1/9, –, –, 1/4, –
IV3:  –, –, –, –, –, –, –, –, –, 1/5, 1, 1, –, –, –, –, –, –
V1:   5, –, –, 2, –, –, 9, –, –, 2, –, –, 1, 1/3, 3, 5, –, –
V2:   –, 9, –, –, 9, –, –, 9, –, –, 9, –, 3, 1, 5, –, 6, –
V3:   –, –, –, –, –, –, –, –, –, –, –, –, 1/3, 1/5, 1, –, –, –
VI1:  –, –, –, –, –, –, –, –, –, 1/2, –, –, 1/5, –, –, 1, 1/5, 3
VI2:  –, –, –, –, –, –, –, –, –, –, 4, –, –, 1/6, –, 5, 1, 7
VI3:  –, –, –, –, –, –, –, –, –, –, –, –, –, –, –, 1/3, 1/7, 1
4 Results of Calculation by LLSM

For the pairwise comparison values in an SPCM, an error model is assumed as follows:

r^{ca}_{c'a'} = \varepsilon^{ca}_{c'a'} \frac{v_{ca}}{v_{c'a'}}.   (3)

Taking the logarithms of both sides gives
\log r^{ca}_{c'a'} = \log v_{ca} - \log v_{c'a'} + \log \varepsilon^{ca}_{c'a'}.   (4)

To simplify the equations, logarithms are represented by overdots: \dot{r}^{ca}_{c'a'} = \log r^{ca}_{c'a'}, \dot{v}_{ca} = \log v_{ca}, \dot{\varepsilon}^{ca}_{c'a'} = \log \varepsilon^{ca}_{c'a'}. Using this notation, Eq. (4) becomes

\dot{r}^{ca}_{c'a'} = \dot{v}_{ca} - \dot{v}_{c'a'} + \dot{\varepsilon}^{ca}_{c'a'},   c, c' = 1, ..., C,  a, a' = 1, ..., A.   (5)

From Eqs. (1) and (2), we have

\dot{r}^{ca}_{c'a'} = -\dot{r}^{c'a'}_{ca},   (6)

\dot{r}^{ca}_{ca} = 0.   (7)

If \varepsilon^{ca}_{c'a'} is assumed to follow an independent probability distribution with mean 0 and variance \sigma^2, irrespective of c, a, c', a', the least squares estimate gives the best estimate for the error model of Eq. (5) according to the Gauss–Markov theorem. Equation (5) can be written in vector notation as Eq. (8):

\dot{Y} = S \dot{x} + \dot{\varepsilon},   (8)

where

\dot{x} = (\dot{x}_{I2}, \dot{x}_{I3}, \dot{x}_{II1}, \dot{x}_{II2}, \dot{x}_{II3}, \dot{x}_{III1}, \dot{x}_{III2}, \dot{x}_{III3}, \dot{x}_{IV1}, ..., \dot{x}_{VI2}, \dot{x}_{VI3})^T,

\dot{Y} = (\dot{r}^{I1}_{I2}, \dot{r}^{I1}_{I3}, \dot{r}^{I1}_{II1}, \dot{r}^{I1}_{III1}, \dot{r}^{I1}_{IV1}, \dot{r}^{I1}_{V1}, \dot{r}^{I2}_{I3}, \dot{r}^{I2}_{II2}, \dot{r}^{I2}_{III2}, \dot{r}^{I2}_{IV2}, \dot{r}^{I2}_{V2}, \dot{r}^{II1}_{II2}, ..., \dot{r}^{VI2}_{VI3})^T
   = (\log(1/3), \log 5, \log(1/3), \log 3, \log(1/3), \log(1/5), \log 3, \log 5, \log 1, \log 3, \log(1/9), \log 7, ..., \log 7)^T,

and S is the corresponding design matrix: each row has +1 in the column of the superscript index and -1 in the column of the subscript index of the corresponding comparison (with the column for I1 omitted), so that each row of S\dot{x} equals \dot{x}_{ca} - \dot{x}_{c'a'}.
To simplify the calculations, v_{I1} = 1, that is, \dot{v}_{I1} = 0. The least squares estimates for formula (8) are calculated by \hat{\dot{x}} = (S^T S)^{-1} S^T \dot{Y}.
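A hedged Python sketch of this least squares computation is shown below, using only three of the comparison values from the SPCM above for brevity; with so few rows the system is under-determined and np.linalg.lstsq returns a minimum-norm solution, whereas the paper uses every comparison in the SPCM. The label scheme and helper names are assumptions.

import numpy as np

labels = [f"{c}{a}" for c in ["I", "II", "III", "IV", "V", "VI"] for a in (1, 2, 3)][1:]
idx = {lab: k for k, lab in enumerate(labels)}    # unknowns; v_I1 is fixed to 1

def design_row(upper, lower):
    # +1 on the superscript index, -1 on the subscript index (column I1 omitted)
    row = np.zeros(len(labels))
    if upper != "I1":
        row[idx[upper]] = 1.0
    if lower != "I1":
        row[idx[lower]] = -1.0
    return row

pairs = [("I1", "I2", 1/3), ("I1", "I3", 5.0), ("I2", "I3", 3.0)]
S = np.array([design_row(u, l) for u, l, _ in pairs])
Y = np.log([v for _, _, v in pairs])
x_hat, *_ = np.linalg.lstsq(S, Y, rcond=None)     # least squares estimate of Eq. (8)
weights = np.exp(x_hat)                           # evaluation values on the original scale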
Table 1 Evaluation values obtained by SPCM + LLSM

Criterion       I       II      III     IV      V       VI      Overall evaluation value
Alternative 1   1       2.859   0.491   2.536   4.765   1.071   12.723
Alternative 2   1.616   0.492   1.113   0.702   9.403   2.152   15.479
Alternative 3   0.328   1.186   2.219   0.597   1.728   0.506   6.564
Table 1 shows the evaluation values obtained from the SPCM for this example.
5 Results of Calculation by the Harker Method In the Harker method, the value of a diagonal element is set to the number of missing entries in the row plus 1 and then evaluation values are obtained by the usual eigenvalue method. The SPCM by the Harker method for this example is I1
I2
I3 II1 II2 II3 III1 III2 III3 IV1 IV2 IV3 V1 V2 V3 VI1 VI2 VI3
I1 12 1/3 5 1/3 3 1/3 1/5 I2 3 12 3 5 1 3 1/9 I3 1/5 1/3 16 II1 3 12 7 3 5 1 1/2 II2 1/5 1/7 12 1/3 1/3 1 1/9 II3 1/3 3 16 III1 1/3 1/5 12 1/3 1/3 1/5 1/9 III2 1 3 3 12 1/3 1 1/9 III3 3 3 16 IV1 3 1 5 11 3 5 1/2 2 IV2 1/3 1 1 1/3 11 1 1/9 1/4 IV3 1/5 1 16 V1 5 2 9 2 11 1/3 3 5 V2 9 9 9 9 3 11 5 6 V3 1/3 1/5 16 VI1 1/2 1/5 14 1/5 3 VI2 4 1/6 5 14 7 VI3 1/3 1/7 16
Table 2 shows the evaluation values obtained from the SPCM by the Harker method for this example.
Table 2 Evaluation values obtained by SPCM + the Harker method

Criterion       I       II      III     IV      V       VI      Overall evaluation value
Alternative 1   1       2.673   0.466   2.311   4.350   0.941   11.740
Alternative 2   1.756   0.520   1.152   0.705   9.620   2.011   15.764
Alternative 3   0.348   1.087   2.152   0.518   1.496   0.436   6.038
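A minimal Python sketch of the Harker computation described at the start of this section is given below, assuming missing comparisons are stored as np.nan and contribute zero to the adjusted matrix; the normalisation of the eigenvector is an arbitrary choice for illustration.

import numpy as np

def harker_eigenvector(R):
    # R: SPCM with np.nan for missing comparisons and 1 on the diagonal
    B = np.nan_to_num(R, nan=0.0)          # missing entries contribute 0
    missing = np.isnan(R).sum(axis=1)
    np.fill_diagonal(B, missing + 1)       # diagonal = number of missing entries + 1
    vals, vecs = np.linalg.eig(B)
    w = np.real(vecs[:, np.argmax(np.real(vals))])
    return w / w[0]                        # principal eigenvector, first entry set to 1

# weights = harker_eigenvector(R)          # with R the SPCM built as in Sect. 2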
6 Results of Calculation by ITSM

The ij element of a comparison matrix A is denoted by a_{ij} for i, j = 1, ..., n. Nishizawa [12] proposed the following estimation method, ITSM. For unknown a_{ij}:

a_{ij} = \left( \prod_{k=1}^{n} a_{ik} a_{kj} \right)^{1/m},   (3)
where m is the number of known products a_{ik} a_{kj}, k = 1, ..., n. If unknown comparisons are included in the factors of a_{ik} a_{kj}, then a_{ik} a_{kj} = 1 is assumed. If m ≠ 0, the a_{ij} estimated by (3) is treated as a known comparison, and an a_{ij} with m = 0 in (3) is treated as unknown at the next level. The above procedure is repeated until the unknown elements are completely estimated. The complete SPCM T obtained by ITSM for this example is
given below, column by column (each line lists one column of T; the entries in each line correspond to the rows I1, I2, I3, II1, II2, II3, III1, III2, III3, IV1, IV2, IV3, V1, V2, V3, VI1, VI2, VI3, in that order):

Column I1:   1, 3, 0.20, 3, 0.51, 1, 0.33, 1.73, 1, 3, 1, 0.6, 5, 20.1, 1.67, 1.22, 2.90, 0.63
Column I2:   0.33, 1, 0.33, 1.18, 0.20, 0.60, 0.19, 1, 3, 1, 0.33, 0.33, 2.24, 9, 1.80, 0.54, 1.41, 0.30
Column I3:   5, 3, 1, 15, 0.60, 3.61, 1.67, 3, 6.76, 15, 1, 1.90, 25, 27, 5.04, 3.86, 5.87, 1.70
Column II1:  0.33, 0.85, 0.07, 1, 0.14, 0.33, 0.20, 0.51, 0.60, 1, 0.22, 0.20, 2, 2.78, 0.67, 0.45, 0.85, 0.17
Column II2:  1.97, 5, 1.67, 7, 1, 3, 1.18, 3, 9, 4.58, 1, 1, 6.48, 9, 1.8, 1.58, 2.45, 0.71
Column II3:  1, 1.67, 0.28, 3, 0.33, 1, 0.6, 1, 1.87, 3, 0.33, 0.53, 6, 3, 1.40, 0.85, 1.72, 0.35
Column III1: 3, 5.20, 0.60, 5, 0.85, 1.67, 1, 3, 3, 5, 2.24, 1, 9, 27, 3, 2.12, 5.01, 1.10
Column III2: 0.58, 1, 0.33, 1.97, 0.33, 1, 0.33, 1, 3, 2.24, 1, 1, 3, 9, 1.8, 0.93, 2.45, 0.53
Column III3: 1, 0.33, 0.15, 1.67, 0.11, 0.53, 0.33, 0.33, 1, 1.67, 0.33, 0.28, 3, 3, 0.75, 0.60, 0.83, 0.25
Column IV1:  0.33, 1, 0.07, 1, 0.22, 0.33, 0.20, 0.45, 0.6, 1, 0.33, 0.20, 2, 4.24, 0.67, 0.50, 1.41, 0.17
Column IV2:  1, 3, 1, 4.58, 1, 3, 0.45, 1, 3, 3, 1, 1, 4.24, 9, 1.80, 1.41, 4, 1.33
Column IV3:  1.67, 3, 0.53, 5, 1, 1.90, 1, 1, 3.56, 5, 1, 1, 10, 9, 2.90, 2.50, 4, 0.86
Column V1:   0.20, 0.45, 0.04, 0.50, 0.15, 0.17, 0.11, 0.33, 0.33, 0.50, 0.24, 0.10, 1, 3, 0.33, 0.20, 0.55, 0.07
Column V2:   0.05, 0.11, 0.04, 0.36, 0.11, 0.33, 0.04, 0.11, 0.33, 0.24, 0.11, 0.11, 0.33, 1, 0.20, 0.06, 0.17, 0.06
Column V3:   0.60, 0.56, 0.20, 1.50, 0.56, 0.72, 0.33, 0.56, 1.34, 1.50, 0.56, 0.35, 3, 5, 1, 0.60, 0.83, 0.29
Column VI1:  0.82, 1.85, 0.26, 2.24, 0.63, 1.17, 0.47, 1.07, 1.67, 2.00, 0.71, 0.40, 5, 16.4, 1.67, 1, 3, 0.33
Column VI2:  0.34, 0.71, 0.17, 1.18, 0.41, 0.58, 0.20, 0.41, 1.21, 0.71, 0.25, 0.25, 1.83, 6, 1.20, 0.33, 1, 0.33
Column VI3:  1.58, 3.29, 0.59, 5.83, 1.41, 2.87, 0.91, 1.90, 3.95, 6, 0.75, 1.16, 15, 18, 3.47, 3, 3, 1
Table 3 shows the evaluation values obtained from the SPCM by ITSM for this example.
Table 3 Evaluation values obtained by SPCM + ITSM

Criterion       I       II      III     IV      V        VI      Overall evaluation value
Alternative 1   1       2.946   0.495   2.681   5.048    1.075   13.245
Alternative 2   1.800   0.564   1.152   0.731   10.110   2.266   16.624
Alternative 3   0.359   1.283   2.388   0.618   1.762    0.520   6.930
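A rough Python sketch of the completion step of Eq. (3), as read from the description above, is given below. Treating a product with an unknown factor as 1 is equivalent to skipping it in the geometric mean, which is what the code does; names are illustrative.

import numpy as np

def itsm_complete(A, max_rounds=10):
    # A: comparison matrix with np.nan for unknown entries and 1 on the diagonal
    A = A.copy()
    n = A.shape[0]
    for _ in range(max_rounds):
        unknown = np.argwhere(np.isnan(A))
        if len(unknown) == 0:
            break
        estimates = {}
        for i, j in unknown:
            prods = [A[i, k] * A[k, j] for k in range(n)
                     if not (np.isnan(A[i, k]) or np.isnan(A[k, j]))]
            if prods:                                   # m != 0: geometric mean of Eq. (3)
                estimates[(i, j)] = float(np.prod(prods) ** (1.0 / len(prods)))
        for (i, j), v in estimates.items():             # entries with m = 0 wait for the next level
            A[i, j] = v
    return A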
7 Conclusion

The SPCM of MDAHP is an incomplete pairwise comparison matrix. Therefore, the LLSM based on an error model, or an eigenvalue method such as the Harker method or the two-stage method, is applicable to the calculation of evaluation values from an SPCM. This paper showed the calculations of SPCM by the LLSM, the Harker method, and ITSM for the multiple dominant AHP including hierarchical criteria.
References
1. Saaty, T.L.: The Analytic Hierarchy Process. McGraw-Hill, New York (1980)
2. Kinoshita, E., Nakanishi, M.: Proposal of new AHP model in light of dominative relationship among alternatives. J. Oper. Res. Soc. Jpn. 42, 180–198 (1999)
3. Kinoshita, E., Sekitani, K., Shi, J.: Mathematical properties of dominant AHP and concurrent convergence method. J. Oper. Res. Soc. Jpn. 45, 198–213 (2002)
4. Ohya, T., Kinoshita, E.: Proposal of super pairwise comparison matrix. In: Watada, J., et al. (eds.) Intelligent Decision Technologies, pp. 247–254. Springer, Berlin Heidelberg (2011)
5. Ohya, T., Kinoshita, E.: Super pairwise comparison matrix in the multiple dominant AHP. In: Watada, J., et al. (eds.) Intelligent Decision Technologies, Smart Innovation, Systems and Technologies 15, vol. 1, pp. 319–327. Springer, Berlin Heidelberg (2012)
6. Ohya, T., Kinoshita, E.: Super pairwise comparison matrix with the logarithmic least squares method. In: Neves-Silva, R., et al. (eds.) Intelligent Decision Technologies, Frontiers in Artificial Intelligence and Applications, vol. 255, pp. 390–398. IOS Press (2013)
7. Ohya, T., Kinoshita, E.: The treatment of hierarchical criteria in dominant AHP with super pairwise comparison matrix. In: Neves-Silva, R., et al. (eds.) Smart Digital Futures 2014, pp. 142–148. IOS Press (2014)
8. Harker, P.T.: Incomplete pairwise comparisons in the analytic hierarchy process. Math. Model. 9, 837–848 (1987)
9. Nishizawa, K.: Estimation of unknown comparisons in incomplete AHP and it's compensation. Report of the Research Institute of Industrial Technology, Nihon University, Number 77, 10 pp. (2004)
10. Ohya, T., Kinoshita, E.: Super pairwise comparison matrix in the multiple dominant AHP with hierarchical criteria. In: Czarnowski, I., et al. (eds.) KES-IDT 2018, SIST 97, pp. 166–172. Springer International Publishing AG (2019)
11. Ohya, T.: SPCM with Harker method for MDAHP including hierarchical criteria. In: Czarnowski, I., et al. (eds.) Intelligent Decision Technologies 2019, SIST 143, pp. 277–283. Springer International Publishing AG (2019)
12. Ohya, T.: SPCM with improved two stage method for MDAHP including hierarchical criteria. In: Czarnowski, I., et al. (eds.) Intelligent Decision Technologies 2020, SIST 193, pp. 517–523. Springer International Publishing AG (2020)
Equilibria Between Two Sets of Pairwise Comparisons as Solutions of Decision-Making with Orthogonal Criteria Takafumi Mizuno
Abstract This paper deals with a type of decision-making in which pairwise comparisons specify which alternative is better concerning two criteria and in which the chosen alternative selects one of the criteria. Solutions to the decision-making are regarded as equilibria of a normal form game of two players. The equilibria are sought by drawing directed graphs that consist of nodes, arrows, and undirected edges. Each node of the graphs is a pair of a criterion and an alternative. An arrow represents which of two nodes is preferred to the other. Weights, indicating how much better nodes are, can be put on arrows. An undirected edge connects two indifferent nodes. Through two examples of decision-making in this paper, how to seek the equilibria, pure and mixed, from the graphs is explained. In one of them, votes create the weights on the arrows of the graphs. Keywords Pairwise comparison · Nash equilibrium · Voting
1 Decision-Making that Alternatives Choose Criteria

Decision-making is a process in which decision-makers choose alternatives concerning some criteria. We often face a type of decision-making in which alternatives select criteria. Let us consider an example of an election campaign. A political party chooses one of three candidates X, Y, and Z belonging to the party and helps the candidate win the election. In the election, candidates appeal with policies to enlarge the economy ("economy") or policies to improve general welfare ("welfare"). By making pairwise comparisons, the party judges which candidate is the best at the two criteria "economy" and "welfare." At "economy," X is better than both Y and Z, and Y is better than Z, that is, X ≻ Y ≻ Z. At "welfare," Y is better than X, Z is better than Y, and X is better than Z, that is, X ≻ Z ≻ Y ≻ X; a cycle of preference occurs. Figure 1a depicts the preferences. An arrow X ← Y means that X is preferred to Y.

T. Mizuno (B) Meijo University, 4-102-9 Yada-Minami, Higashi-ku, Nagoya-shi, Aichi, Japan e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 I. Czarnowski et al. (eds.), Intelligent Decision Technologies, Smart Innovation, Systems and Technologies 238, https://doi.org/10.1007/978-981-16-2765-1_51
Fig. 1 Arrows in (a) represent which candidate is desirable with regard to "economy" and "welfare." Arrows in (b) represent which theme, "economy" or "welfare," each candidate prefers
Fig. 2 A hierarchical structure
By the way, candidate X likes to debate economic issues, and X's speech about "economy" can obtain more support from people than a speech about "welfare." Therefore, if the party chooses X to pour its energy into, the campaign must focus mainly on "economy." Similarly, Y's favorite is "welfare," and Z's favorite is "economy." Figure 1b depicts these preferences. The decision-making can be represented by the three-layered structure in Fig. 2. The top layer represents the goal of the decision-making, that is, choosing the best candidate. The second layer represents the two criteria "economy" and "welfare." The third layer represents the three alternatives X, Y, and Z. The word "OR" emphasizes that the decision-makers select either "economy" or "welfare" as the criterion to make the chosen candidate win. So, how does the party choose the candidate? Focusing on "welfare" cannot narrow down the candidates because of the cycle of preference. Choosing X or Z makes the party focus on "economy," and choosing Y makes the party focus on "welfare." Meanwhile, if the party focuses on "economy," then X is the best, and if X is chosen, the party focuses on "economy." In this example, choosing X and focusing on "economy" seems to be valid. In this paper, I propose a way to solve such decision-making by drawing directed graphs. Merging Fig. 1a, b makes the directed graph in Fig. 3. Nodes named Xe, Ye, Ze, Xw, Yw, and Zw represent decision outcomes. For example, the node Xe means that the party chooses X and focuses on "economy," and the node Zw means that
Fig. 3 A directed graph
the party chooses Z and focuses on "welfare." Each arrow represents which node is preferable. The party should choose a node that has no outgoing arrow, if such a node exists; it is a solution to the decision-making. In the example, that node is Xe.
2 As a Game The solution’s concept can be considered an extension of the Nash equilibrium of normal form games of two players. The above example is regarded as a game of player 1 and player 2. The player 1 chooses one from strategies X , Y , and Z . While the player 2 chooses one from strategies “economy” and “welfare.” Each player has to choose the best strategy against the opponent’s strategy. A pair of strategies of the players is called an outcome. Nash equilibrium of the game is an outcome that consists of their best strategies against their opponent’s best. Let S1 and S2 be sets of strategies of both the players, that is, S1 = {X, Y, Z } and S2 = {“economy”, “welfare”}. An outcome (s1 , s2 ), s1 ∈ S1 and s2 ∈ S2 , is a Nash equilibrium if there are not strategies s1 ∈ S1 and s2 ∈ S2 such that (s1 , s2 ) (s1 , s2 ) for player 1, (s1 , s2 ) (s1 , s2 ) for player 2. The equilibrium corresponds to a node without outgoing arrows in the directed graph. A difference between our game and general normal form games is a way to specify which outcome is better. General normal form games put payoffs to all outcomes, and comparing the values specifies which outcome is better. But our game specifies it by pairwise comparisons represented in arrows. If an outcome is the best for all players, the corresponding node has no outgoing arrow.
3 Seeking Mixed Equilibria

The above equilibrium, Xe, is called pure. Generally, normal form games may not have pure Nash equilibria. Nash, however, pointed out that if players are allowed to make random choices, the games must have at least one equilibrium [1, 2]. The equilibria, which consist of pairs of probability distributions over the players' strategies, are called mixed. We can also consider the mixed equilibria of our game. Before that, I note that our game may not have mixed equilibria because each player's set of strategies may not have a total order. Let us modify the directed graph to a weighted directed graph such as Fig. 4. Each arrow has a weight representing how large an impression the candidates will give to people during the campaign. For example, when the campaign focuses on "economy," people get a better impression of X than of Z, and the estimated difference in impressions is 30. Candidate Y can give people a better impression when the party focuses on "welfare" than on "economy," and the estimated difference in impressions is 20. The party selects "economy" or "welfare" randomly with probability p. Figure 5a corresponds to p = 1, and (d) corresponds to p = 0; they mean that the party focuses only on "economy" and only on "welfare," respectively. Changing the probability gradually from p = 0 to p = 1 changes the graph for choosing candidates from Fig. 5d via (c) and (b) to (a). In this paper, the weights on the graph are changed linearly with the probability; this corresponds to calculating expected payoffs in general normal form games. If an arrow from a node j to a node i has a weight w1 at p = 1, an arrow between the two nodes has a weight w0 at p = 0, and the two arrows have the same direction, then the composite arrow at any p also has the same direction and a weight w = w1 p + w0 (1 − p) (Fig. 6a). If the arrow at p = 1 and the arrow at p = 0 have different directions, then the composite arrow at any p has a weight w = w1 p − w0 (1 − p) (Fig. 6b). When the weight is less than zero, the direction of the composite arrow turns to the opposite.
Fig. 4 A weighted directed graph
Fig. 5 Changing the probability p changes arrows’ directions
Fig. 6 Composite arrow and its weight at any p when the directions of the arrows at p = 1 and at p = 0 are the same (a), and when the directions are different (b)
Fig. 7 Probability in which an undirected edge appears
Now, I introduce an undirected edge that represents zero weight. The undirected edge appears when the directions of the arrows at p = 1 and at p = 0 are different; it appears at p = w0/(w1 + w0) (Fig. 7), which is derived from w1 p − w0 (1 − p) = 0. When two nodes i and j are connected by an undirected edge, node i is not inferior to node j, and vice versa. If undirected edges connect nodes with outgoing arrows, then random choices of those nodes are not equilibria. To emphasize this, undirected edges that connect nodes with outgoing arrows are drawn as dashed in the graph (Fig. 8b, c); they are ignored when finding equilibria.
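For instance, taking w_1 = 10 and w_0 = 20 (the two weights quoted in the example below), the balance condition w_1 p - w_0 (1 - p) = 0 gives

p = \frac{w_0}{w_1 + w_0} = \frac{20}{10 + 20} = \frac{2}{3},

which is exactly the computation used to obtain q_X = 2/3 in the example below.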
Fig. 8 Equilibria may be on an undirected edge connecting nodes without outgoing arrow (a). Ignore dashed undirected edges connecting nodes with outgoing arrows when seeking mixed equilibria (b, c)
Fig. 9 A mixed equilibrium
In our example, equilibria may exist in Fig. 5b, that is, at p = 2/3. Candidate X prefers "economy" to "welfare," Y prefers "welfare" to "economy," and the weights are 10 and 20, respectively. The equilibrium is a random choice between the nodes X and Y of Fig. 9a. Let q_X, q_Y, and q_Z, where q_X + q_Y + q_Z = 1 and q_X, q_Y, q_Z ≥ 0, be the probabilities of choosing X, Y, and Z, respectively. By a similar consideration to the above, an undirected edge appears when q_X = 20/(10 + 20) = 2/3, q_Y = 1/3, and q_Z = 0 (Fig. 9b). The composite node representing the random choice with probabilities p = 2/3 and q = (q_X, q_Y, q_Z) = (2/3, 1/3, 0) does not have outgoing arrows. That is the mixed equilibrium.
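The same balance condition can be wrapped in a small helper; the Python sketch below reproduces the probabilities quoted above for the election example (names are illustrative).

def balance_probability(w1, w0):
    # probability at which w1*p - w0*(1 - p) = 0, i.e. the composite arrow vanishes
    return w0 / (w1 + w0)

q_X = balance_probability(10, 20)     # weights of X's and Y's opposing arrows -> 2/3
q = (q_X, 1 - q_X, 0.0)               # (q_X, q_Y, q_Z) = (2/3, 1/3, 0)
p = 2 / 3                             # theme probability read off the graph (Fig. 5b)
print(p, q)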
4 An Example: Voting and Pairwise Comparisons

In the previous example, I gave the weights of the arrows without deep justification. In this section, I provide a way to derive the weights from votes. Let us consider a new example. A government has two committees, A and B, and each committee has twelve members. Both committees want secretaries who deal with some of their tasks, but the government can employ only one secretary because of a budget constraint; that is, one committee will miss out on a secretary. Three secretaries X, Y, and Z apply for the tasks. To choose one of them, both committees vote. In the votes, each member indicates who is the best, the second, and the worst.
Table 1 Result of votes of committee A

(a) Profile
        5 voters   3 voters   4 voters
1st     Z          X          Y
2nd     Y          Z          X
3rd     X          Y          Z

(b) Standings
        X    Y    Z
X       -    3    7
Y       9    -    4
Z       5    8    -

Table 2 Result of votes of committee B

(a) Profile
        3 voters   2 voters   2 voters   5 voters
1st     Z          Y          Z          X
2nd     Y          Z          X          Y
3rd     X          X          Y          Z

(b) Standings
        X    Y    Z
X       -    7    5
Y       5    -    7
Z       7    5    -
Fig. 10 A graph representing the example of decision-making for choosing a secretary
The results of the votes of committees A and B are in Tables 1a and 2a, respectively. The rightmost column in Table 1a means that four members of committee A think that Y is the best, X is the second, and Z is the worst. To consider who is better in this case, we can make pairwise comparisons. Comparing X and Y, in the middle column of Table 1a three members put X in a higher position than Y; in the other columns, the remaining nine members put Y higher than X. The standings made by such pairwise comparisons for committee A are in Table 1b. The value in row i and column j represents the number of members who think that i is preferable to j. The standings made by the votes of committee B are in Table 2b. The government's decision-making goal is to choose a secretary while respecting the votes and to assign the chosen secretary to committee A or B. The graph in Fig. 10 represents the decision-making. In committee A, since the numbers of members who think X ≻ Y and Y ≻ X are three and nine, respectively, an arrow is drawn from X to Y and its weight is 6 = 9 − 3.
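A short Python sketch that recomputes Table 1b from the committee A profile of Table 1a is given below; the data structure is an assumption made for illustration.

from itertools import permutations

profile_A = [(5, ["Z", "Y", "X"]), (3, ["X", "Z", "Y"]), (4, ["Y", "X", "Z"])]

def standings(profile, candidates=("X", "Y", "Z")):
    # counts[i][j] = number of members who rank i above j
    counts = {i: {j: 0 for j in candidates if j != i} for i in candidates}
    for n_voters, ranking in profile:
        for hi, lo in permutations(ranking, 2):
            if ranking.index(hi) < ranking.index(lo):
                counts[hi][lo] += n_voters
    return counts

print(standings(profile_A))
# {'X': {'Y': 3, 'Z': 7}, 'Y': {'X': 9, 'Z': 4}, 'Z': {'X': 5, 'Y': 8}}, matching Table 1b;
# the arrow weight between X and Y is then 9 - 3 = 6, as in the text.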
Fig. 11 Changing the probability p changes arrows' directions
Fig. 12 Mixed equilibrium
Three undirected edges connect X A to X B, Y A to Y B, and Z A to Z B, by assuming that the secretaries do not know the difference between the two committees: the chosen secretary exerts the same performance whichever committee they work in. So, let us seek equilibria of the decision-making. First, we can easily see that there is no pure equilibrium because every node has an outgoing arrow. Next, mixed equilibria are found by drawing the directed graph. As in the previous example of the election campaign, the probability p of the choice of committee is changed gradually as in Fig. 11. The secretary will work in committee A when p = 1 and in committee B when p = 0. An undirected edge that connects nodes without outgoing arrows appears when p = 1/3. To find equilibria on this undirected edge, the probability vector q of the choice of the secretaries is introduced as in Fig. 12, where q = (q_X, q_Y, q_Z), q_X + q_Y + q_Z = 1, and q_X, q_Y, q_Z ≥ 0. The probability q_X is zero because X is inferior to Y and Z when p = 1/3. By the assumption that the secretaries do not choose committees, any composite node on the edge has only undirected edges; such nodes have no outgoing arrow, and all of them are equilibria (Fig. 12). The mixed equilibria are (p, q) = (1/3, (0, q_Y, q_Z)), where q_Y + q_Z = 1 and q_Y, q_Z ≥ 0.
5 Discussions

This paper provided a way to find equilibria as solutions to a type of decision-making. In this paper, I interpreted mixed equilibria as pairs of probabilities of random choices. In the example of the election campaign, the mixed equilibrium was (p, q) = (2/3, (2/3, 1/3, 0)). When the equilibrium is adopted as a solution, the political party pushes candidates X and Y randomly at the rate of q_X : q_Y = 2/3 : 1/3, and the theme of the campaign is switched between "economy" and "welfare" randomly at the rate of p : (1 − p) = 2/3 : 1/3. This can be realized when the party travels to many cities with candidates X and Y throughout the election campaign. In the example of choosing a secretary, if it is possible, the government reassigns the chosen secretary daily to committees A and B alternately in the ratio of 1–3, and it will choose secretary Y or Z, or switch between them, with any probability. An important note is that solutions to our decision-making may not exist. Game theory tells us that every general normal form game must have at least one equilibrium, while the decision-making dealt with in this paper gives preferences among alternatives by pairwise comparisons; the guarantee of a total order among alternatives goes away, and equilibria may not exist. To find mixed equilibria, we calculate the probability at which an arrow becomes an undirected edge. Since this is done for all arrows, the computational cost is proportional to the number of arrows, that is, (n^2 − n)/2, where n is the number of alternatives. If equilibria exist, my approach can find all of them. In the decision-making in this paper, alternatives selected one of two criteria, and the remaining criteria are ignored. Generally, decision-making processes do not ignore any criteria. The analytic hierarchy process (AHP) proposed by Saaty [3] is such a process: all criteria are weighted by pairwise comparisons with respect to the goal. Kinoshita and Nakanishi [4, 5] introduced into AHP a mechanism in which alternatives evaluate the weights of criteria; the bottom-up evaluations of AHP are called dominant AHP. Compared with dominant AHP, we can say that the decision-making dealt with in this paper extends its three-layered hierarchical structure from "AND" evaluations to "OR" evaluations. In our games, one side of the players has only two strategies. You may think that this is too few. However, we often face decisions on whether or not to take a particular action, whether to choose plan A or plan B, and so on. The situations to which the approach of this paper applies may not be so few.
6 Conclusions

In this paper, decision-making in which alternatives select criteria was introduced. Such problems were regarded as normal form games, and finding solutions to the decision-making amounted to seeking equilibria of the games. The equilibria were found by drawing directed graphs. Using directed graphs to seek pure equilibria of normal form games is trivial.
Fig. 13 A decision-making that does not have equilibria
But seeking mixed equilibria, however, is not. Mizuno [6] mentioned that directed graphs can used to seek equilibria of normal form games of two players. Seeking equilibria of games tends to complicate. This paper provides a visualization that simplifies it. In comparison with general normal form games, the unique point of games in this paper is that pairwise comparisons specify which alternative is better. Because of this point, I mentioned that the decision-making might not have solutions. Decisionmaking represented in Fig. 13 does not have equilibria pure nor mixed. My future work is providing new concepts of solutions when equilibria do not exist.
References
1. Nisan, N., Roughgarden, T., Tardos, E., Vazirani, V.V.: Algorithmic Game Theory. Cambridge University Press (2007)
2. Nash, J.: Equilibrium points in n-person games. Proc. Natl. Acad. Sci. 36(1), 48–49 (1950)
3. Saaty, T.L.: The Analytic Hierarchy Process. McGraw Hill, New York (1980)
4. Kinoshita, E., Nakanishi, M.: Proposal of new AHP model in light of dominant relationship among alternatives. J. Oper. Res. Soc. Jpn. 42(2), 180–197 (1999)
5. Kinoshita, E., Nakanishi, M.: A proposal of CCM as a processing technique for additional data in the dominant AHP. J. Infrastructure Planning Manage. IV-42-611, 13–19 (1999)
6. Mizuno, T.: A diagram for finding Nash equilibria in two-players strategic form games. In: Czarnowski, I., Howlett, R., Jain, L. (eds.) Intelligent Decision Technologies, IDT 2020, Smart Innovation, Systems and Technologies, vol. 193. Springer, Singapore (2020)
Fluctuations in Evaluations with Multi-branch Tree Method for Efficient Resource Allocation Natsumi Oyamaguchi
Abstract In this paper, we investigate the influence of fluctuations in the evaluation phase of allocating resources with the multi-branch tree method, which is a novel resource allocation method proposed in our previous paper. By considering fluctuation functions and their upper bound functions, we will discuss the influence on the final allocation ratio by the difference between levels in which evaluators with fluctuating values are set. Keywords Fluctuations in evaluations · Resource allocations · Budget conflicts · Multi-branch tree method
1 Introduction Decision-making for optimal resource allocation is quite important in economics [1– 3]. We proposed a novel resource allocation method called the multi-branch tree method, two models of which appeared in previous papers as the tournament model in [4] and generalized model in [5]. This method can balance between minimizing assessment costs and improving allocation efficiency. However, we should consider the influence of fluctuations in the evaluation phase when we adopt this method for practical use because there is no guarantee that all evaluators will give projects the appropriate evaluation values. Note that there are two types of fluctuation that occur during evaluations. One is fluctuation caused by the various types of human relationships or by preconceived ideas, which may be inherent to evaluation itself. This type of fluctuation will always accompany evaluations even if we make efforts to account for it carefully. The other is fluctuation caused by evaluators because they cannot give accurate evaluation values among representative projects that reflect the real values. In general, evaluators have to compare and evaluate projects in different fields, and being familiar with every field is quite difficult. N. Oyamaguchi (B) Shumei University, Chiba, Japan e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 I. Czarnowski et al. (eds.), Intelligent Decision Technologies, Smart Innovation, Systems and Technologies 238, https://doi.org/10.1007/978-981-16-2765-1_52
In this paper, we investigate the influence of fluctuations during evaluations on the final allocation ratio. In particular, we will examine the difference between levels in which evaluators with fluctuating values are set. The rest of this paper is organized as follows. We prepare notations and review the multi-branch tree method in Sect. 2. Next, we define a fluctuation function that reflects the influence of an evaluator’s fluctuating value in Sect. 3. A numeric example is provided in Sect. 4. We then discuss the difference between levels in which evaluators with fluctuating values are set in Sect. 5.
2 Notation The multi-branch tree method is a novel resource allocation method that uses a multibranch tree to indicate a hierarchical structure. In this study, we will concentrate on the tournament model for simplicity. See details in [4, 5]. Let us consider a multi-branch tree T (n, k) that has n (∈ N) hierarchical levels, and each node has k (∈ N) child nodes. Each leaf in the tree represents a project, and a node that is not a leaf represents an evaluator. We call the top level of this tree level 1 and the lowest level n as in Fig. 1. For each evaluator, we number k edges that are connected to its child nodes 1 to k from left to right. Let us call the evaluator in level 1 E (0) and call the evaluators in level 2 that are connected with E (0) through edges a1 (a1 ∈ {1, . . . , k}) E (0,a1 ) . Similarly, we call the evaluators in level 3 that are connected with E (0,a1 ) through edges a2 (a2 ∈ {1, . . . , k}) E (0,a1 ,a2 ) . By repeating this procedure, we call the evaluators in level i + 2 that are connected with evaluator E (0,a1 ,...,ai ) (a1 , . . . , ai ∈ {1, . . . , k}) through edges ai+1 (ai+1 ∈ {1, . . . , k}) E (0,a1 ,...,ai ,ai+1 ) . We call the evaluator in level 1, from level 2 to n − 2, and level n − 1 the top level evaluator, middle level evaluators, and bottom level evaluators, respectively. Note that the number of evaluators in level j is k j−1 . Then, the leaves that are the child nodes of the bottom level evaluator E (0,a1 ,...,an−2 ) (a1 , . . . , an−2 ∈ {1, . . . , k}) are all projects. We call the project that is connected to E (0,a1 ,...,an−2 ) through the edge an−1 (an−1 ∈ {1, . . . , k}) P(0,a1 ,...,an−2 ,an−1 ) . Note that there are k n−1 projects in level n. Let us denote the set of all these projects by P and the set of these projects’ subscripts by Psub . Here, we give an order to Psub as follows. (0, 1, 1, · · · , 1, 1, 1) < (0, 1, 1, · · · , 1, 1, 2) < · · · < (0, 1, 1, · · · , 1, 1, k) < (0, 1, 1, · · · , 1, 2, 1) < · · · < (0, 1, 1, · · · , 1, 2, k) < (0, 1, 1, · · · , 1, 3, 1) < · · · < (0, 1, 1, · · · , 1, k, k) < (0, 1, 1, · · · , 2, 1, 1) < · · · < (0, k, k, · · · , k, k, k) · · · (∗) Let us also denote the set of all evaluators by E and the set of these evaluators’ subscripts by Esub .
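Because the subscript notation is compact, a small computational sketch may help fix ideas. The following Python snippet is ours and not part of the original method description (the function names are illustrative): it enumerates the k^{n−1} project subscripts of T(n, k) in the order (∗) and lists the evaluators lying on the path of a given project.

```python
from itertools import product

def project_subscripts(n, k):
    """The k**(n-1) project subscripts of T(n, k), listed in the order (*):
    tuples (0, a1, ..., a_{n-1}) in lexicographic order."""
    return [(0,) + digits for digits in product(range(1, k + 1), repeat=n - 1)]

def evaluators_on_path(project):
    """Subscripts of the evaluators E_(0), E_(0,a1), ..., E_(0,a1,...,a_{n-2})
    lying on the path from the top evaluator down to the given project."""
    return [project[:i] for i in range(1, len(project))]

projects = project_subscripts(4, 3)
print(len(projects))                    # 27 projects for T(4, 3)
print(projects[0], projects[-1])        # (0, 1, 1, 1) and (0, 3, 3, 3)
print(evaluators_on_path(projects[0]))  # [(0,), (0, 1), (0, 1, 1)]
```

For T(4, 3) this yields 27 projects, from (0, 1, 1, 1) to (0, 3, 3, 3), matching the ordering (∗) above.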
Fig. 1 Multi-branch tree T(n, k)
Fig. 2 Path x
Each evaluator should select a project from its child projects. Note that every bottom level evaluator has just one project. Therefore, a bottom level evaluator selects the project that is its only child node. A middle level evaluator should select one project from among k projects that were selected by the evaluator’s child evaluators. Each evaluator is prohibited from assessing them before all evaluations are done by its children evaluators. We can set this representative project as that with the smallest subscript among k projects without loss of generality. Repeating this procedure, we have the structure shown in Fig. 1. Note that the label of the edge that connects E (0,a1 ,...,ai ) and E (0,a1 ,...,ai ,ai+1 ) is the representative project P(0,a1 ,...,ai ,ai+1 ,1,...,1) , which was selected by evaluator E (0,a1 ,...,ai ) from among the projects that are selected by its child nodes E (0,a1 ,...,ai ,ai+1 ) for a1 , . . . , ai+1 ∈ {1, . . . , k}. Next, we define maps to have the final allocation ratio W for all projects. For any x ∈ Psub , we obtain only one shortest path from E (0) to Px and call it path x. Here, we define a map, p : Psub × Esub −→ Psub , as p(x, y) = z, where E y is in path x, and Pz is chosen by E y as a representative project. See Fig. 2.
Fig. 3 Path (0, a1 , . . . , an−1 )
It is easy to see that path (0, a_1, . . . , a_{n−1}) for an arbitrary project $P_{(0,a_1,\ldots,a_{n-1})}$, with $a_1, \ldots, a_{n-1} \in \{1, \ldots, k\}$, is as in Fig. 3. We define a map, $l : P_{sub} \times E_{sub} \to \mathbb{Z}$, as $l(x, y) = h$, where $h$ is the length of the path from $E_y$ to $P_x$. In contrast, we define a map, $e : P_{sub} \times \mathbb{Z} \to E_{sub}$, as $e(x, h) = y$, which outputs the subscript of the evaluator $E_y$ whose path to $P_x$ has length $h$. We also define a map, $v : E_{sub} \times P_{sub} \to \mathbb{Z}$, where $v(y, z)$ is the evaluation value that $E_y$ gives $P_z$. According to our previous paper, the final allocation ratio $W$ for the $k^{n-1}$ projects is obtained as the following $k^{n-1}$-dimensional vector:

$$W = \bigl( w(P_{(*_1)}), \ldots, w(P_{(*_{k^{n-1}})}) \bigr),$$

where

$$w(P_{(*_i)}) = \frac{w'(P_{(*_i)})}{\sum_{i=1}^{k^{n-1}} w'(P_{(*_i)})},$$

$$w'(P_{(*_i)}) = v\bigl((0),\, p\bigl((*_i),\, l((*_i),(0))\bigr)\bigr)\;\prod_{h=1}^{l((*_i),(0))-1}\frac{v\bigl(e((*_i),h),\, p((*_i),h)\bigr)}{v\bigl(e((*_i),h),\, p((*_i),h+1)\bigr)},$$

in which $p((*_i), h)$ abbreviates $p\bigl((*_i), e((*_i), h)\bigr)$, and $(*_i)$ denotes the subscript of the $i$-th project in the row (∗). Note that

$$\sum_{(*_i)\in P_{sub}} w(P_{(*_i)}) = 1.$$
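To make the construction concrete, here is a short Python sketch. It is ours, not the authors' code, and it encodes one consistent reading of the formula above: the top evaluator's value of the path-side representative is multiplied, for each lower evaluator on the path, by the ratio of its value of the representative coming up along the path to its value of the representative it passes upward. This reading reproduces the scaling behaviour used in the proof of Theorem 1 below (only the branch below a fluctuating evaluator is affected); the evaluation scores used here are arbitrary placeholders.

```python
import itertools
import random

def tournament_weights(n, k, v):
    """Final allocation ratio W for the tournament model of T(n, k).

    v[(evaluator, project)] is the value an evaluator assigns to a project
    it assesses; evaluators and projects are identified by subscript tuples.
    """
    projects = [(0,) + a for a in itertools.product(range(1, k + 1), repeat=n - 1)]

    def representative(evaluator):
        # Representative passed upward: smallest subscript in the subtree.
        return evaluator + (1,) * (n - len(evaluator))

    w_prime = {}
    for x in projects:
        evaluators = [x[:j] for j in range(1, n)]           # levels 1 .. n-1 on path x
        # Project on path x assessed by each evaluator (P_x itself at the bottom).
        assessed = [representative(x[:j + 1]) for j in range(1, n - 1)] + [x]
        weight = v[(evaluators[0], assessed[0])]             # top-level factor
        for E, r in zip(evaluators[1:], assessed[1:]):
            weight *= v[(E, r)] / v[(E, representative(E))]  # path-side value / own representative's value
        w_prime[x] = weight

    total = sum(w_prime.values())
    return {x: w / total for x, w in w_prime.items()}        # normalised: sums to 1

if __name__ == "__main__":
    n, k = 4, 3
    random.seed(0)
    v = {}
    # Every evaluator scores the k representatives coming from its children.
    for level in range(1, n):
        for tail in itertools.product(range(1, k + 1), repeat=level - 1):
            evaluator = (0,) + tail
            for a in range(1, k + 1):
                child_rep = evaluator + (a,) + (1,) * (n - level - 1)
                v[(evaluator, child_rep)] = random.randint(1, 9)
    W = tournament_weights(n, k, v)
    print(len(W), round(sum(W.values()), 6))                 # 27 1.0
```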
3 Fluctuation Functions
In this section, we define the fluctuation function, which reflects the influence of fluctuations that occur when an evaluator at a given level gives a fluctuated value to the same representative project. We can set this representative project to $P_{(*_1)}$ without loss of generality. Note that $P_{(*_1)}$ is evaluated by evaluators at all levels. We should compare the influence of a fluctuation on $P_{(*_1)}$ by an evaluator at each level, such as $E_{(0)}, E_{(0,1)}, \ldots, E_{(0,1,\ldots,1)}$, where each of these $n-1$ evaluators is the leftmost evaluator at its level. Let $\alpha$ be the range of fluctuation for $P_{(*_1)}$, which is given by each evaluator. When an evaluator $E_\ell$ ($\ell \in \{(0), (0,1), \ldots, (0,1,\ldots,1)\}$) gives $P_{(*_1)}$ a fluctuation, the value $v(\ell, (*_1))$ is changed to $\alpha\, v(\ell, (*_1))$. Note that this means upward fluctuation to $P_{(*_1)}$ if $\alpha > 1$ and downward fluctuation to $P_{(*_1)}$ if $0 < \alpha < 1$. Given a fluctuation, we denote the fluctuated value of $w'(P_{(*_i)})$ by $w'^{(l)}(P_{(*_i)})$, where $l$ is the level of the evaluator $E_\ell$. Let

$$S = \sum_{i=1}^{k^{n-1}} w'(P_{(*_i)}) \quad\text{and}\quad S^{(l)} = \sum_{i=1}^{k^{n-1}} w'^{(l)}(P_{(*_i)}).$$

It is easy to see that the relative weight of $P_{(*_i)}$ among the $k^{n-1}$ projects, as influenced by $E_\ell$ in level $l$, is

$$w^{(l)}(P_{(*_i)}) = \frac{w'^{(l)}(P_{(*_i)})}{S^{(l)}}.$$

For each $l$ ($1 \le l \le n-1$), the fluctuation function of level $l$ is defined by

$$y_{(l)}(\alpha) = \frac{\displaystyle\sum_{i=1}^{k^{n-1}}\left(\frac{w(P_{(*_i)}) - w^{(l)}(P_{(*_i)})}{w(P_{(*_i)})}\right)^{2}}{k^{n-1}}.$$
Here, we note that, if the variation between a fluctuated value and the original value is greater, the value of the fluctuation function becomes greater.
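As a small illustration (ours, not from the paper), the fluctuation function can be evaluated numerically once the allocation ratios before and after a fluctuation are known, for instance by scaling a single entry of the value table used in the earlier sketch by α and recomputing W. The figures below are hand-made and purely illustrative.

```python
def fluctuation(original, fluctuated):
    """y(alpha): mean squared relative deviation between the final allocation
    ratios before (`original`) and after (`fluctuated`) one evaluator's value
    for the representative project is multiplied by alpha."""
    return sum(((original[x] - fluctuated[x]) / original[x]) ** 2
               for x in original) / len(original)

# Toy check with made-up ratios for three projects (illustrative numbers only).
w_before = {"P1": 0.50, "P2": 0.30, "P3": 0.20}
w_after = {"P1": 0.60, "P2": 0.25, "P3": 0.15}
print(round(fluctuation(w_before, w_after), 4))
```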
Theorem 1 For a range of fluctuation $\alpha$ and a level $l$ ($1 \le l \le n-1$) in which an evaluator with a fluctuating value is set, the fluctuation function $y_{(l)}(\alpha)$ is obtained as follows:

$$y_{(l)}(\alpha) = \begin{cases} \dfrac{k^{n-2}(1-\alpha\beta)^2 + (k^{n-1}-k^{n-2})(1-\beta)^2}{k^{n-1}} & \text{if } l = 1,\\[2ex] \dfrac{(k^{n-l}-k^{n-1-l})\left(1-\frac{\beta}{\alpha}\right)^2 + (k^{n-1}-k^{n-l}+k^{n-1-l})(1-\beta)^2}{k^{n-1}} & \text{if } 2 \le l \le n-1, \end{cases}$$

where

$$\beta = \frac{S}{S^{(l)}} = \frac{\sum_{i=1}^{k^{n-1}} w'(P_{(*_i)})}{\sum_{i=1}^{k^{n-1}} w'^{(l)}(P_{(*_i)})}.$$
Proof For $l = 1$, we have

$$w'^{(1)}(P_{(0,a_1,\ldots,a_{n-1})}) = \begin{cases} \alpha\, w'(P_{(0,a_1,\ldots,a_{n-1})}) & \text{if } a_1 = 1,\\ w'(P_{(0,a_1,\ldots,a_{n-1})}) & \text{otherwise}, \end{cases}$$

where $a_1, \ldots, a_{n-1} \in \{1, \ldots, k\}$. It follows that

$$y_{(1)}(\alpha) = \frac{\sum_{i=1}^{k^{n-1}}\left(\frac{w(P_{(*_i)}) - w^{(1)}(P_{(*_i)})}{w(P_{(*_i)})}\right)^2}{k^{n-1}} = \frac{\sum_{i=1}^{k^{n-1}}\left(\frac{w'(P_{(*_i)}) - \beta\, w'^{(1)}(P_{(*_i)})}{w'(P_{(*_i)})}\right)^2}{k^{n-1}} = \frac{k^{n-2}(1-\alpha\beta)^2 + (k^{n-1}-k^{n-2})(1-\beta)^2}{k^{n-1}}.$$

For $2 \le l \le n-1$, we have

$$w'^{(l)}(P_{(0,a_1,\ldots,a_{n-1})}) = \begin{cases} \dfrac{1}{\alpha}\, w'(P_{(0,a_1,\ldots,a_{n-1})}) & \text{if } a_1 = \cdots = a_{l-1} = 1 \text{ and } a_l \neq 1,\\[1ex] w'(P_{(0,a_1,\ldots,a_{n-1})}) & \text{otherwise}. \end{cases}$$

It follows that

$$y_{(l)}(\alpha) = \frac{\sum_{i=1}^{k^{n-1}}\left(\frac{w(P_{(*_i)}) - w^{(l)}(P_{(*_i)})}{w(P_{(*_i)})}\right)^2}{k^{n-1}} = \frac{\sum_{i=1}^{k^{n-1}}\left(\frac{w'(P_{(*_i)}) - \beta\, w'^{(l)}(P_{(*_i)})}{w'(P_{(*_i)})}\right)^2}{k^{n-1}} = \frac{(k^{n-l}-k^{n-1-l})\left(1-\frac{\beta}{\alpha}\right)^2 + (k^{n-1}-k^{n-l}+k^{n-1-l})(1-\beta)^2}{k^{n-1}}.$$
4 Fluctuation Function of T(4, 3)
A fluctuation function for an arbitrary multi-branch tree T(n, k) was given in the previous section. Here, we give a numeric example of fluctuation functions with n = 4 and k = 3, where there is exactly one middle level; this is a reasonable assumption in an actual resource allocation setting. We rename the three functions $y_{(1)}(\alpha)$, $y_{(2)}(\alpha)$, and $y_{(3)}(\alpha)$ to $y_t(\alpha)$, $y_m(\alpha)$, and $y_b(\alpha)$, respectively. From Theorem 1, we have

$$y_t(\alpha) = \frac{9(1-\alpha\beta_t)^2 + 18(1-\beta_t)^2}{27}, \quad\text{where } \beta_t = \frac{S}{\alpha\sum_{i=1}^{9} w'(P_{(*_i)}) + \sum_{i=10}^{27} w'(P_{(*_i)})},$$

$$y_m(\alpha) = \frac{6\left(1-\frac{\beta_m}{\alpha}\right)^2 + 21(1-\beta_m)^2}{27}, \quad\text{where } \beta_m = \frac{S}{\sum_{i=1}^{3} w'(P_{(*_i)}) + \frac{1}{\alpha}\sum_{i=4}^{9} w'(P_{(*_i)}) + \sum_{i=10}^{27} w'(P_{(*_i)})},$$

$$y_b(\alpha) = \frac{2\left(1-\frac{\beta_b}{\alpha}\right)^2 + 25(1-\beta_b)^2}{27}, \quad\text{where } \beta_b = \frac{S}{w'(P_{(*_1)}) + \frac{1}{\alpha}\sum_{i=2}^{3} w'(P_{(*_i)}) + \sum_{i=4}^{27} w'(P_{(*_i)})}.$$
Proposition 1 The relationship between $\alpha$, $\beta_t$, $\beta_m$, and $\beta_b$ is

$$\begin{cases} 0 < \frac{1}{\alpha} < \beta_t < 1 < \{\beta_m, \beta_b\} < \alpha & \text{if } \alpha > 1,\\[1ex] 0 < \alpha < \{\beta_m, \beta_b\} < 1 < \beta_t < \frac{1}{\alpha} & \text{if } 0 < \alpha < 1. \end{cases}$$

Proof Let $S_9 = \sum_{i=1}^{9} w'(P_{(*_i)})$ and $S_{10} = \sum_{i=10}^{27} w'(P_{(*_i)})$. Note that $S = S_9 + S_{10}$.

(1) Suppose $\alpha > 1$. We have $\frac{1}{\alpha} < \beta_t < 1 < \{\beta_m, \beta_b\} < \alpha$ since

$$1 = \frac{S}{S_9 + S_{10}} > \frac{S}{\alpha S_9 + S_{10}} = \beta_t > \frac{S}{\alpha(S_9 + S_{10})} = \frac{1}{\alpha}$$

and

$$1 = \frac{S}{S_9 + S_{10}} < \begin{cases} \dfrac{S}{\sum_{i=1}^{3} w'(P_{(*_i)}) + \frac{1}{\alpha}\sum_{i=4}^{9} w'(P_{(*_i)}) + S_{10}} = \beta_m\\[2ex] \dfrac{S}{w'(P_{(*_1)}) + \frac{1}{\alpha}\sum_{i=2}^{3} w'(P_{(*_i)}) + \sum_{i=4}^{9} w'(P_{(*_i)}) + S_{10}} = \beta_b \end{cases} < \frac{S}{\frac{1}{\alpha}(S_9 + S_{10})} = \alpha.$$

(2) Suppose $0 < \alpha < 1$. We have $\alpha < \{\beta_m, \beta_b\} < 1 < \beta_t < \frac{1}{\alpha}$ since

$$1 = \frac{S}{S_9 + S_{10}} < \frac{S}{\alpha S_9 + S_{10}} = \beta_t < \frac{S}{\alpha(S_9 + S_{10})} = \frac{1}{\alpha}$$

and

$$1 = \frac{S}{S_9 + S_{10}} > \begin{cases} \dfrac{S}{\sum_{i=1}^{3} w'(P_{(*_i)}) + \frac{1}{\alpha}\sum_{i=4}^{9} w'(P_{(*_i)}) + S_{10}} = \beta_m\\[2ex] \dfrac{S}{w'(P_{(*_1)}) + \frac{1}{\alpha}\sum_{i=2}^{3} w'(P_{(*_i)}) + \sum_{i=4}^{9} w'(P_{(*_i)}) + S_{10}} = \beta_b \end{cases} > \frac{S}{\frac{1}{\alpha}(S_9 + S_{10})} = \alpha.$$
Here, let us denote the upper bound fluctuation functions of $y_t(\alpha)$, $y_m(\alpha)$, and $y_b(\alpha)$ by $Y_t(\alpha)$, $Y_m(\alpha)$, and $Y_b(\alpha)$, respectively.

Theorem 2 The upper bound fluctuation functions $Y_t(\alpha)$, $Y_m(\alpha)$, and $Y_b(\alpha)$ are obtained as follows:

$$Y_t(\alpha) = \frac{(1-\alpha)^2 + 2\left(1-\frac{1}{\alpha}\right)^2}{3},\qquad Y_m(\alpha) = \frac{2\left(1-\frac{1}{\alpha}\right)^2 + 7(1-\alpha)^2}{9},\qquad Y_b(\alpha) = \frac{2\left(1-\frac{1}{\alpha}\right)^2 + 25(1-\alpha)^2}{27}.$$

Proof For $Y_t(\alpha)$, we have $0 < |1-\beta_t| < \left|1-\frac{1}{\alpha}\right|$ and $0 < |1-\alpha\beta_t| < |1-\alpha|$ since

$$\begin{cases} 0 < \frac{1}{\alpha} < \beta_t < 1 < \alpha\beta_t < \alpha & \text{if } \alpha > 1,\\ 0 < \alpha < \alpha\beta_t < 1 < \beta_t < \frac{1}{\alpha} & \text{if } 0 < \alpha < 1. \end{cases}$$

It follows that

$$y_t(\alpha) = \frac{9(1-\alpha\beta_t)^2 + 18(1-\beta_t)^2}{27} < \frac{(1-\alpha)^2 + 2\left(1-\frac{1}{\alpha}\right)^2}{3}.$$

For $Y_m(\alpha)$, we have $0 < |1-\beta_m| < |1-\alpha|$ and $0 < \left|1-\frac{\beta_m}{\alpha}\right| < \left|1-\frac{1}{\alpha}\right|$ since

$$\begin{cases} 1-\alpha < 1-\beta_m < 0 \ \text{ and } \ 0 < 1-\frac{\beta_m}{\alpha} < 1-\frac{1}{\alpha} < 1 & \text{if } \alpha > 1,\\[1ex] 0 < 1-\beta_m < 1-\alpha < 1 \ \text{ and } \ 1-\frac{1}{\alpha} < 1-\frac{\beta_m}{\alpha} < 0 & \text{if } 0 < \alpha < 1. \end{cases}$$

It follows that

$$y_m(\alpha) = \frac{6\left(1-\frac{\beta_m}{\alpha}\right)^2 + 21(1-\beta_m)^2}{27} < \frac{2\left(1-\frac{1}{\alpha}\right)^2 + 7(1-\alpha)^2}{9}.$$

For $Y_b(\alpha)$, the same argument works for $\beta_b$, and it follows that

$$y_b(\alpha) = \frac{2\left(1-\frac{\beta_b}{\alpha}\right)^2 + 25(1-\beta_b)^2}{27} < \frac{2\left(1-\frac{1}{\alpha}\right)^2 + 25(1-\alpha)^2}{27}.$$

Graphs of the upper bound fluctuation functions $Y_t(\alpha)$, $Y_m(\alpha)$, and $Y_b(\alpha)$ are shown in Fig. 4.
Fig. 4 Graphs of Yt (α), Ym (α), and Yb (α)
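Since the plots themselves are not reproduced here, the following short computation (ours) evaluates the three upper bound functions from Theorem 2 at a few sample values of α; it illustrates the orderings discussed in the next section.

```python
def Y_t(a): return ((1 - a) ** 2 + 2 * (1 - 1 / a) ** 2) / 3
def Y_m(a): return (2 * (1 - 1 / a) ** 2 + 7 * (1 - a) ** 2) / 9
def Y_b(a): return (2 * (1 - 1 / a) ** 2 + 25 * (1 - a) ** 2) / 27

for alpha in (0.5, 0.8, 1.25, 2.0):
    print(alpha, round(Y_t(alpha), 3), round(Y_m(alpha), 3), round(Y_b(alpha), 3))
# For alpha > 1 the top-level bound Y_t is the smallest of the three,
# while for 0 < alpha < 1 the bottom-level bound Y_b is the smallest.
```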
5 Discussion
Let us consider the difference between the levels in which evaluators with fluctuating values are set. Recall that a fluctuation is upward if α > 1 and downward if 0 < α < 1; upward fluctuation means overestimation, and downward fluctuation means underestimation. By observing the upper bound fluctuation functions, the following results were obtained.
• For an upward fluctuation, the higher the level at which an evaluator with a fluctuating value is set, the smaller the influence on the final allocation ratio.
• For a downward fluctuation, the lower the level at which an evaluator with a fluctuating value is set, the smaller the influence on the final allocation ratio.
It must be noted that an evaluator cannot always evaluate projects so accurately that their real values are reflected; evaluations therefore always require careful consideration. For further studies, to reduce the influence on the final allocation ratio, it may be useful for each evaluator to choose more than one representative project. We should also consider which mean, arithmetic or geometric, is suitable for integrating all evaluation values. In addition, we should adopt the analytic hierarchy process [6], a traditional method for quantifying qualitative data by using individual preferences, for each evaluation.
References
1. Dietl, H.: Capital Markets and Corporate Governance in Japan, Germany and the United States: Organizational Response to Market Inefficiencies. Routledge, London (1997)
2. Hurwicz, L.: The design of mechanisms for resource allocation. Am. Econ. Rev. 63(2), 1–30 (1973)
3. Kogan, L., Papanikolaou, D., Seru, A., Stoffman, N.: Technological innovation, resource allocation, and growth. Q. J. Econ. 132(2), 665 (2017)
4. Oyamaguchi, N., Tajima, H., Okada, I.: Using a tournament method with a tree structure to resolve budget conflicts. In: Intelligent Decision Technologies 2019, Smart Innovation, Systems and Technologies, vol. 193, pp. 525–532. Springer, Singapore (2020)
5. Oyamaguchi, N., Tajima, H., Okada, I.: Model of multi-branch trees for efficient resource allocation. Algorithms 13, 55 (2020)
6. Saaty, T.L.: The Analytic Hierarchy Process. McGraw-Hill, New York (1980)
Foundations for Model Building of Intelligent Pricing Methodology Marina Kholod, Yury Lyandau, Valery Maslennikov, Irina Kalinina, and Ekaterina Borovik
Abstract The article presents a study of pricing at the point of sale of a major mobile provider's mobile salon. The approach used in the study sets the necessary foundations for the development of an intelligent pricing model. This methodological paper presents an approach to developing the definitions of price, value, pricing and pricing methodology. The approach to developing the backgrounds of pricing also includes the development of the pricing mechanism structure and of approaches and policies to pricing. The economic background of pricing lies in the classification of prices proposed in this paper. This paper represents an important theoretical and methodological work prior to the work on intelligent pricing model development. Keywords Price · Value · Pricing · Pricing methodology · Pricing structure · Pricing classification · Intelligent pricing
M. Kholod (B) · V. Maslennikov · I. Kalinina Chair of Management Theory and Business Technologies, Plekhanov Russian University of Economics, Moscow, Russia e-mail: [email protected] V. Maslennikov e-mail: [email protected] I. Kalinina e-mail: [email protected] Y. Lyandau Chair of Innovation Management and Social Entrepreneurship, Plekhanov Russian University of Economics, Moscow, Russia e-mail: [email protected] E. Borovik Chair of State Legal and Criminal Law Disciplines, Plekhanov Russian University of Economics, Moscow, Russia e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 I. Czarnowski et al. (eds.), Intelligent Decision Technologies, Smart Innovation, Systems and Technologies 238, https://doi.org/10.1007/978-981-16-2765-1_53
1 Introduction
The production of goods, the provision of services and the performance of work are carried out in order to satisfy existing demand or to form new demand and thereby expand sales markets. Different market niches around the world are divided between key market players, and entering the market with similar products is quite problematic. In addition, at some point, saturation of the existing demand sets in, and a new technology, a new product or a new service that will "revive the market" is required. Such innovations create demand by explaining to consumers the benefits of consuming new products. A vivid example is the emergence in 2007 of Apple's iPhone smartphone. Initially, many smartphone users and competitors of the company thought that touch screens would not replace a convenient button keyboard. This vision proved disastrous for Nokia, which was unable to maintain its leading position in the smartphone market. As a result, we can now observe that the whole world has switched to touch smartphones. Products, works and services on the market are provided to consumers for a fee. The obvious question is how to determine the amount of payment so that, on the one hand, it covers the costs of the one who offers the goods, works and services and, on the other hand, suits the one who is ready to give money in exchange for them. The amount of such payment is the price of the goods, works and services, and the process of price formation is called pricing.
2 Price and Value in Management
Accordingly, the price is the amount of money in exchange for which the seller of goods, works or services is ready to provide them to the buyer. Pricing—the process of determining and forecasting prices for goods, works or services. Pricing is present in almost any sphere of human activity: we buy food, clothing and medicine, consume medical services, pay for education, etc. Any product, work or service has a price, and it is formed using certain principles, methods and tools [1]. The set of pricing principles for goods, works and services, the methods of calculating and forecasting prices, and the pricing policy of the organization together form a pricing methodology. The pricing methodology is the same at each level of pricing of goods, works and services, and its basic principles do not change depending on the term for pricing or the participant in pricing. The most important principles of the pricing methodology are as follows:
(1) the scientific validity of prices (prices are determined and forecasted using certain pricing methods that are part of the pricing methodology);
(2) target price orientation (prices are determined and forecasted based on the goals set by the pricing participant; for example, to attract more customers, the price of the product can be reduced; when calculating the initial maximum price of a contract associated with the purchase of certain goods, it is important to avoid inefficient spending; when a new unique product is brought to market, it is important for a commercial organization to maximize profits initially until similar offers appear on the market, in which case the price of the new product can be significantly overstated);
(3) the continuity of the pricing process (various goods are constantly produced and sold around the world, and their prices are determined and subsequently adjusted);
(4) the unity of the pricing process and monitoring compliance with prices (an audit of prices and the pricing process is required to correctly calculate and determine prices in both government contracts and market contracts).
Price is a reflection of the values formed by goods, works and services that can satisfy customer needs. Value—the ability of a product, work or service to satisfy a need or provide a customer with a benefit. Value may include the following elements:
• the result of goods, works and services;
• the quality of goods, works and services;
• additional properties and functions of goods, works and services;
• the price of goods, works and services;
• the reputation of a brand or company—the producer and seller of goods, works and services.
Price and value are actually two sides of the same coin. It is important for the client to understand what he will receive, giving money in exchange. Based on perceived value, the price of goods, works and services is formed.
3 The Methodology of Structuring of the Pricing Mechanism 3.1 Managing Pricing Mechanism The system of interconnected elements that ensure the effective formation of prices for goods, works and services is a pricing mechanism. The pricing mechanism includes the object, subject and subject of pricing. The object of the pricing mechanism is the price set for a product, work or service. The subjects of the pricing mechanism are individuals or legal entities involved in the pricing and sale of goods, works or services at a set price. The subject of the pricing mechanism is the relationship that arises between the seller and the buyer as a result of the formation of prices for goods, work or services. The relationship of the object, subject and subject of the pricing mechanism is presented in Fig. 1.
Fig. 1 Relationship of the object, subject and subject of the pricing mechanism
The seller offers the buyer goods at a certain price. The buyer may be individuals or legal entities (commercial, budgetary, autonomous and other organizations). The buyer is ready to purchase the goods if the necessary values are formed for him. Certain factors influence the price [2], for example, competitors' offers, substitutes, etc. Government agencies are also involved in the pricing mechanism. On the one hand, government organizations make purchases for state and municipal needs. On the other hand, the state can directly or indirectly regulate prices, fix prices or provide certain preferences to sellers aimed at lowering prices (e.g., tax breaks or subsidies). State regulation of prices takes the following forms. The state can set fixed prices; introduce state list prices, for example, for electricity, railway tariffs, housing and communal services, and travel in public transport; freeze prices; and fix the prices of monopolistic enterprises. Government agencies can determine the rules by which enterprises themselves set government-regulated prices, for example, setting a price limit for certain types of goods, regulating the main price parameters, such as profit, discounts and indirect taxes, or determining the maximum level of a one-time price increase for specific goods. The state may also introduce a number of prohibitions on unfair competition and monopolization of the market (a ban on horizontal and vertical price fixing, a ban on dumping, a ban on price collusion). An example of such regulation is the state's requirement for mobile operators to cancel roaming within the Russian Federation. The structure of the pricing mechanism consists of the following elements (Fig. 2):
Fig. 2 Structure of pricing mechanism
• pricing approaches;
• pricing policy.
Approach to pricing—a set of pricing methods, combined on the basis of general principles for determining and forecasting prices for goods, works and services. The pricing method is a way of determining and forecasting prices based on the techniques, technologies and tools laid down in the pricing methodology. In the pricing methodology, the following groups of methods are distinguished:
• market (pricing taking into account the impact of market factors);
• costly (calculation of price elements by cost items);
• parametric (pricing based on a parametric series);
• administrative (setting standards and tariffs at the state level) [3].
Pricing policy—the basic principles that guide the subject of pricing when setting prices for goods, works and services. Pricing policy includes pricing strategy and tactics. Strategy is understood as a combination of goals, indicators and an action plan aimed at achieving set goals. Pricing strategy—planning and forecasting prices for goods, work, services in the long term, taking into account the impact of macroeconomic and microeconomic factors. The development of a pricing strategy is necessary to predict cash flows from the sale of goods, works, services, as well as the calculation of planned financial indicators of the company. Pricing tactics—specific actions to determine and forecast prices for goods, works and services as part of the implementation of a pricing strategy. Such actions may be associated with a change in the price of the goods through the use of discounts, allowances, concessions, as well as due to the psycho-emotional
impact on the buyer: using the method of unrounded numbers (999 rubles is perceived as 900-something rubles rather than 1000 rubles, which is significantly more pleasant for the buyer), creating a sense of shortage (the goods are running out, and only a few units are left in the warehouse), or convincing the buyer to purchase goods that have unique characteristics the buyer needs. Pricing is a process that is built into the process of creating a product. Any production of goods involves the implementation of certain actions that form the value chain, as they create value for the consumer. Such actions include identifying or shaping the needs of customers, designing a product, purchasing the resources necessary for the production of the product, storing resources, direct manufacturing of products, quality control of manufactured products, storage of finished products and their delivery to customers. At each stage of production, certain costs are incurred (marketing costs, costs for the purchase of resources, etc.). The totality of such costs forms the cost of production. To the cost is then added the profit necessary for the normal development of the company, as well as the indirect taxes that the company must pay in accordance with the law. Thus, value is created through the implementation of the value chain, and the price is determined based on the financial parameters of the implementation of this chain (costs, profits, taxes). This confirms the above statement that price is a reflection of value, and this is the meaning of the economic foundations of pricing (Fig. 3). Cost—the monetary value of the cost of manufacturing or acquiring property. Market value—the value established in the market, which is approved by all parties to the transaction. The value creation process (value chain) is closely connected with the concept of value—a sequence of logically interconnected stages of creating a product, work or service, as a result of which certain costs are generated and the rate of profit, taxes and duties are laid down. Thus, the cost is a fraction of the price of goods, works and services. The question arises: How is the value chain different from the cost chain? These two concepts describe the same sequence of actions for the creation and sale of
Fig. 3 Economic backgrounds of pricing
goods, works or services. However, from the position of the buyer of the product, the result of this process is the value of the product; therefore, from his point of view, such a process will be called a value chain. From the perspective of the producer of goods, works or services, this process is considered as adding cost at each stage of creating the goods, works or services; therefore, from its point of view, this process will be called the cost chain. There are several concepts for determining the value of goods. In the theory of labor value, value is considered as socially necessary labor costs without taking into account supply and demand, which inherently does not reflect the nature of prices. The theory of supply and demand considers the value of the goods in monetary terms. The theory of marginal utility suggests that the value of a commodity depends on its marginal utility: if the quantity of a product is less than the need for it, then its value increases. Before purchasing a product, the buyer is faced with the need to make an alternative decision about which product to choose from the many offered on the market; buying one of them, he refuses the others. The value of the product purchased represents the opportunity cost.
3.2 Prices Classification Prices for goods, works and services depend on various factors (economic, temporary, informational), which allows us to present different options for their classification. At the stages of adding value, prices are classified into wholesale, vacation and retail. Wholesale price—the price at which manufacturers’ products are sold without indirect taxes. Selling price—wholesale price with indirect taxes. Retail price—the price at which the product is sold to the final consumer. The wholesale price includes the cost of goods, works, services and profits laid down by the manufacturer. If excise taxes are added to cost and profit, then such a price can be considered the selling price without value-added tax (VAT). The price, including cost, profit, excise taxes, VAT, is the selling price of the enterprise and at the same time the purchase price of the wholesale intermediary, since it purchases the goods at that price. If the goods are not subject to indirect taxes, then the wholesale price of the enterprise will coincide with the selling price and the price structure will be simplified [4]. The wholesale intermediary forms a supply and sales allowance and sells the goods to the trading organization. This price is considered the selling price of the wholesale intermediary and at the same time the purchase price of the trading organization. If there are several wholesale intermediaries, there will be an appropriate number of elements of the same type: • purchase price of a wholesale intermediary; • selling price of a wholesale intermediary.
Fig. 4 Main elements of price
As a result, the share of the supply and sales allowance in the composition of the price will increase, and the structure of the price of goods will become more complicated. A trade organization makes a trade allowance, which will cover its costs and provide profit. The purchase price of a trade organization with a trade margin will be considered the retail price at which the product is purchased by the end user (Fig. 4). Depending on the regulation, prices are divided into free and regulated. Free price—the price set for goods, works and services, depending on supply and demand in the market. The free price can be represented as the price of demand or the price of supply. Demand price—the price that is formed on the buyer’s market. Bid price is the price set by the seller. Regulated price—the price of goods, works and services, established by government bodies. Regulated prices can be marginal and fixed. Marginal price—the value above which the price cannot be set by the subject of pricing. A fixed price is a price that cannot be changed without a decision of a state body. Depending on the time factor, prices are classified as constant, step and seasonal. Constant price—the price of goods, works, services, the validity of which is not initially set. A stepped price is a sequentially changing price of goods, works, services, taking into account a given scale. Seasonal price—the price of goods, work, services, changing as a result of the influence of the seasonal factor. Depending on the method of informing, prices are divided into settlement, reference and price lists. Estimated price—the price calculated individually for each specific product, work and service. Reference price—the price published in certain price directories, catalogs, etc. List price—the price of goods, works and services, published in the price lists of manufacturing companies or sellers. From the point of view of calculating and forecasting prices for goods, works and services based on various financial and economic parameters, the classification of prices according to the stages of adding value is considered.
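As a numerical illustration of the price build-up described in this section, the following sketch (ours; the rates and the assumption that VAT is charged on the excise-inclusive price are purely illustrative) computes a retail price from the manufacturer's cost through intermediary allowances and the trade margin.

```python
def retail_price(cost, profit_rate, excise, vat_rate,
                 supply_sales_markups, trade_markup):
    """Price build-up: wholesale -> selling price (with indirect taxes) -> retail."""
    wholesale = cost * (1 + profit_rate)               # manufacturer's wholesale price
    selling = (wholesale + excise) * (1 + vat_rate)    # selling price incl. excise and VAT
    price = selling
    for markup in supply_sales_markups:                # each wholesale intermediary's allowance
        price *= 1 + markup
    return price * (1 + trade_markup)                  # trade margin gives the retail price

# Hypothetical figures: cost 100, 25% profit, excise 10, 20% VAT,
# one intermediary with a 5% allowance, 15% trade margin.
print(round(retail_price(100, 0.25, 10, 0.20, [0.05], 0.15), 2))
```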
4 Conclusion This methodological paper presented the approach for the development of the definition of the price, value, pricing and pricing methodology. An approach for the development of the backgrounds of pricing also included the development of the pricing mechanism structures, approaches and policies to pricing. Economic background of pricing lies in the classification of prices methodology proposed in this paper. This paper represents an important theoretical and methodological work prior to the work at the intelligent pricing model development.
References
1. Lyandau, Y., et al.: Business Architect: Business Management Systems Modeling. Guidebook on Process Cost Engineering Technologies. RuScience (2016)
2. Lyandau, Y.: The development of the process-based approach to management. Econ. Stat. Informatics 65–68 (2013)
3. Kholod, M., et al.: Spatial customer orientation in the sales process within the retail environment. In: SGEM2016 Conference Proceedings, vol. 1, pp. 389–396 (2016)
4. Sutherland, J., Altman, I.: Organizational transformation with Scrum: how a venture capital group gets twice as much done with half the work. In: 43rd Hawaii International Conference on System Sciences (2010)
Informatization of Life Insurance Companies and Organizational Decision Making (Case of Nippon Life Insurance Company) Shunei Norikumo
Abstract This research focuses on Nippon Life, which has actively pursued computerization since the early 1900s and has achieved dramatic growth, and explores the construction of information systems in organizations. Specifically, we trace the history of informatization at Nippon Life, including the introduction of equipment, business management, organizational expansion and methods of organizational change, and clarify how organizational informatization and organizational change proceeded, so that the findings can serve as hints for organizational change in modern companies. In current management, the utilization of information technology is indispensable for keeping an organization running smoothly, but it is necessary not only to keep systems in continuous use but also to accurately grasp the external environment and management problems when introducing technology; otherwise, it will not lead to good organizational maintenance. Against this background, I would like to explore the rules by which corporate organizations can adapt to the future artificial intelligence society. Keywords Management information system · Organizational change · Artificial intelligence · Life insurance · Business management · Decision making
1 Introduction In recent years, expectations for artificial intelligence technology have increased, and especially in this third boom, the accuracy of inference systems, sensor technology, image analysis, etc. has attracted attention and is playing an active role in various situations. Many companies are exploring the use of artificial intelligence technology, and trial introduction of the technology is progressing, but it is still far from the arrival of the artificial intelligence society that is regarded as utopia [1]. The utilization of information technology in recent years is also expected to be a catalyst for organizational reform. In that case, work efficiency will be improved in combination with business process reengineering (BPR). It is difficult to design S. Norikumo (B) Osaka University of Commerce, Osaka 577-8505, Japan e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021 I. Czarnowski et al. (eds.), Intelligent Decision Technologies, Smart Innovation, Systems and Technologies 238, https://doi.org/10.1007/978-981-16-2765-1_54
an organization because the organization itself is not a material thing but a socially organic entity. In addition, it is difficult to comprehensively coordinate an organization while introducing technology, substituting for human work, and changing the organization [2]. Therefore, in this research, we explore the organizational changes of a company that has made sustained efforts to promote computerization.
2 Origin of Nippon Life and Informatization Nippon Life was founded by Hirose Sukesaburo, a banker in Hikone City, Shiga Prefecture. Having been involved in the local mutual aid organization for a long time, on July 4, 1889 (Meiji 22), the governor of Osaka Prefecture accepted the request for the establishment of “Limited Liability Nippon Life Insurance Company”, and Nippon Life began. Nippon Life introduced the English-made Tate’s calculator for actuarial science and subscriber calculations as early as 1897 and has achieved results in improving administrative efficiency. It is said that the introduction was brought back from the UK when Vice President Naoharu Kataoka returned from an overseas inspection. After that, aggressive replacement of computers progressed, and in 1916, the Millionaire multiplication and division computer made by Rui Saigoku was introduced and used to calculate policy reserves. In addition, Marsedes, Rheinmetall, Monroe, Merchant, Alice Moss, Sand Strand, etc. have been introduced. In 1925, Mr. Tsunenao Morita, who was the director of the accounting bureau, ordered a Powers statistical machine from Mitsui & Co., Ltd. after visiting Europe and the USA and introduced it. From the following year, it will be used in the main budget, foreign affairs, and medical department [3, 4].
2.1 Development of Nippon Life and Computerization Later, computers began to be widely used for paperwork from the Powers-type statistical machine in 1925, which was initially used for mathematical statistical work, but later, management statistics, recruitment costs, continuation fee calculation, and so on. It is also used in the practical aspects of the main budget, foreign affairs, and medical departments to dramatically improve efficiency. In June 1926, “Shayu” describes a policy of devoting human effort to management research and analysis from clerical work. Then, in 1934, the Hollerith statistic machine IBM405 was introduced (converted from the Powers type to the Hollerith type). In the same year, the Burroughs statistical machine was introduced, and the creation of insurance premium payment guides was started [5, 6]. The main uses of insurance companies are the calculation of policy reserves and policyholder dividend reserves, the preparation of statistical tables stipulated in the
Insurance Business Law, and the calculation of profit dividends to policyholders. Preparation of reference statistics table based on our experience, preparation of statistics on foreign affairs efficiency and other performance, preparation of statistics for medical affairs, insurance premium payment guidance, simultaneous creation of insurance receipt and insurance premium income statement, employee salary and receipt. It was the creation of special actuarial research and a large amount of other numerical research. The organization reformed its position in 1919, when Naoharu Kataoka, who had been in business for about 30 years since its founding, announced that he would enter the political world and established an accounting section, a finance section, an accounting section, and an accounting section. At the same time as Mr. Suketaro Hirose became the third president in 1929, the 14 divisions were expanded to 26 divisions, and the accounting bureau, finance bureau, accounting bureau, etc. were assigned. In 1939, the accounting section was expanded to seven, and in 1944, the statistics section was established. In 1954, he launched the “Three-Year Plan for Return to Prewar” and promoted the improvement of management efficiency. At the board meeting on October 11, 1955, “(2) rationalization of office work, reduction of maintenance costs, and management materials. It is said that it will study the introduction of large-scale computers for maintenance. In 1959, the IBM 7070 computer was introduced, and in 1957, the establishment of a foreign affairs education system, manager education from 1951, and internal staff education from 1953 is being expanded. The five-year management rationalization plan formulated in 1956 stated “(2) Expansion of education for both clerical staff and sales staff” and “(3) Efficiency of internal affairs on the premise of introducing a large computer”. In addition, (3)-➁ immediately proceed with preparations for a system for accepting large-scale computers and make efforts to ensure normal operation within the planned period. It can be seen that the entire organization is strongly promoting computerization. According to the company’s history, “Since 1929, clerical work has become accurate and quick, and as a result, we have succeeded in gradually replacing human power with machines.” In addition, a magazine on the history of IBM Japan 50 years states that “Nippon Life not only developed as Japan’s largest life insurance company from the Taisho era to the first year of the Showa era, but was also a pioneer in the mechanization of office work.” Has been done. Prompt and accurate work of arranging income for insurance premiums and storing work is the basis of the work of life insurance companies, and when reconstruction started after the war, the mechanization of work was started immediately. After concentrating the original slips that had been evacuated to the local areas again at the head office and collating them one by one with the punch cards of the accounting department, in 1948, the insurance original slips were converted into punch cards for the first time in the industry, and the insurance premium receipt issuance work was done by IBM accounting machines. Was mechanized. Since IBM accounting machines can only print numbers, we devised a “Nissey-style address transfer machine” (address machine) that applied the principle of electronic photography, and made a prototype at Toshiba, but there are many practical difficulties,
and printing of addresses is difficult. I didn’t see it happen. Machine printing of addresses was realized after the introduction of the IBM 407 accounting machine in 1958, and at that time, using katakana characters, we succeeded in creating a machine for insurance premium receipts in a single process up to address recording. Machine making of insurance policies, and mechanization of calculation and statistical work in each department will be carried out one after another after 1950, but in 1952, the machine making of insurance policies was successful for the first time in the world. This was an epoch-making method of printing the number part of securities with an IBM accounting machine at a speed of 25–30 sheets per minute [7, 8]. In 1962, the company succeeded in automatically typing the address and name of an insurance policy in katakana using a flexo writer. A flexo writer is a machine that prints characters on paper and at the same time punches the printed contents into paper tape. Using the flexo writer, the securities number, address, policyholder name, and insured are printed on the address sheet for sending insurance policies. When the person’s name, recipient’s name, etc. were type-printed, the contents were punched in the paper table at the same time. Next, this paper tape is used to machine print the necessary items on the insurance policy and the original slip. By using the flexo writer, the conventional handwriting work or the writing work by Japanese type became unnecessary, and the work was greatly reduced. However, due to misreading and mistyping of kana characters, there was a problem that the address was often mis-sent by mail. In 1965, in order to solve these problems, Xerox decided to create insurance policies. In addition, we decided to use Xerox to create original slips, copies of notifications, and survival survey materials, which greatly streamlined office work. In 1967, with the development of computers, machine operation technology became more sophisticated, and specialized personnel for machine operation became necessary, and Nissay Computer Service (NCS) was established. It takes over PCS business and starts contracting computer business. Although it was limited to computer machine operation work, after that, the contract work was expanded to drilling work, post-processing work (receipt creation, carbon removal cutting, encapsulation, packing), etc., and from 1976, the entire data entry work We are entrusted. With the expansion of business, in 1972, a system of four groups and three shifts was implemented, and it operates 24 h a day, 7 days a week. The company also undertakes contract work for general companies, medical institutions, and public organizations, and the number of employees has increased tenfold from the initial 40 employees.
2.2 Rationalization of Long-Term Office Work With the introduction of large computers starting with the IBM 7070, the basic policy of the office system was established. By switching from the conventional hand processing or hatch ground processing to computer processing, office work efficiency has improved dramatically. As a result, as a new management problem, the number of in-force contracts has increased, improving contractor services and promoting labor
saving in office work. In 1963, the office management and office management section were set up to start researching and organizing office work issues, and in 1964, the "long-term office work system establishment special committee" was established to start studying future office work systems. In 1967, the Computer Department was renamed the Office Management Department, and the Office Management Division was moved to the Office Management Department to develop an office processing system linked to the computer system. The administrative department and related departments jointly started to formulate a concrete plan, and the direction of a stepwise rationalization plan based on the following three points was established:
(1) adoption of a large-capacity arbitrary extraction storage device (random access memory);
(2) adoption of a turn-around document method centered on optical character readers (OCR);
(3) use of terminals via communication lines (online).
First rationalization: In 1968, the first rationalization expert committee was established, and in 1970 the first rationalization was put into practice. Through the first rationalization, (1) the integration of hand files into machine files (annual and semi-annual payment contracts), (2) the adoption of OCR forms (annual and semi-annual payment contracts), and (3) online processing at the head office were realized. In the first rationalization, the hand files distributed in this way were stored in a large-capacity arbitrary extraction device to improve the efficiency of paperwork, and the transition to machine files was completed in 1970. Second rationalization: The first rationalization integrated the hand files into the machine data file, but it covered only annual and semi-annual payment contracts, and online processing remained limited to the head office. Therefore, it was decided to continue with the second rationalization, focusing on the following three points:
(1) migration of monthly payment original slip items to machine files;
(2) OCR processing of monthly payment forms;
(3) nationwide online implementation.
Third rationalization: The third rationalization is the final stage of the long-term office work rationalization plan; it manages customer data such as the contract payer's name, the contractor's address, the insured's name and date of birth in a machine file. Following the first and second rationalizations, it aimed to further mechanize manual office work. At the same time, with the aim of establishing a system that responds to market conditions and customer trends based on such customer data, a concrete study began in 1973, covering:
(1) machine management of unit codes;
(2) mechanization of the insured data indexing business;
(3) name identification and a customer information system.
Opening of the complete book microsystem. The company stores all insurance contract applications, ancillary documents, requirements change request documents,
etc. since its establishment in 1947, which is called “full book”, but in 1977, the full book micro online, The system is completed. The “System 100” project was launched in 1984 to build a comprehensive information online system and was introduced in 1988. This system improves (1) commercialization of comprehensive welfare and comprehensive financial services, (2) efficiency improvement by strengthening cost competitiveness, and (3) customer responsiveness. It has become possible to manage personal information (family structure, medical history, age, etc.), which is difficult elsewhere, and to manage name identification by integrating customer-based transactions. Utilization of this information was the beginning of providing high value-added services to customers. Regarding the introduction of computerized equipment, the equipment of the headquarters, branch offices, and branches was 1700 small computers, 7000 terminal equipment, and 200 facsimile terminals. It was connected to farm banking, a joint center of life insurance and banks through a PDX line exchange network. Mainly engaged in replacement business for other companies such as corporate pensions, group term life insurance premiums, insurance claims, dividends, etc., and connected to other industry networks such as client companies and banks [9]. Nissay Information Technology Co., Ltd. was founded in 1999. The Year 2000 Problem Countermeasures Committee was established jointly with Nissey Computer to respond. We carry out thorough checks of all systems, mock training, production environment tests, and construction of production monitoring systems. In 2000, the corporate insurance introduced WEB system (inquiry about subscription status, calculation of expected future pension amount) “Nissay Life Navi” and “Securities Management System (Nissay IT-X NET)"; system integration through the merger of Nissay Insurance and Dowa Fire & Marine Insurance in 2001; in 2002, Nissay Dowa’s “Complete Renewal of Core Systems” and Corporate Pension Business System (Joint Commercialization/Defined Benefit Corporate Pension Law); and introduced “Forum 21 Agency System” in 2003 and “Life Insurance Contract Management System (i-Win/LIFE)” in 2004. Since 2017, the medium-term management 4-year plan has been promoting digitalization with the strategy of “utilizing advanced IT” Furthermore, in 2019, it is formulating a digital five-year plan. In the in-house sales business, we use the application of the tablet terminal that applies AI technology to analyze the customer information, contract information, and big data of sales activities accumulated in the past, and set specific conditions so that we can visit at the optimal timing. We have introduced a mechanism to present applicable customers. The deployment of Robotic Process Automation (RPA) utilizing AWS is also progressing, and the company-wide deployment will start in 2018, and in 2019, a total of 360 robots are in operation for about 350 operations. These efforts are highly evaluated in the industry, and in the 2020, IT Award-winning corporate category (area corresponding to “new lifestyle”) of the Corporate Information Technology Association (commonly known as the IT Association), Nippon Life Nissay Information Technology Co., Ltd. has been commended for “Maximizing CX by integrating Face to Face and digital (introduction of smartphones for sales staff)”.
3 Reconstruction of the Office System
The rationalization of clerical work from the first to the third stage achieved great results in terms of both the reduction of clerical staff and the improvement of contractor service. However, during that time, changes in the socio-economic environment and heightened customer awareness were remarkable, and the following new issues arose regarding the company's office work: (1) establishing a customer-oriented system to provide more accurate and prompt office services; (2) further improving office work productivity to improve management efficiency; (3) redeveloping the system, which had become complicated with the expansion of mechanical systems. Therefore, the company reviewed the office system that had been built up through the third rationalization from a new perspective and began to develop a new office system. In the new office system, the computer system is radically reconstructed so that it can respond flexibly to future changes by incorporating the latest techniques, such as new file structures and machine welfare methods for program writers, together with a locally completed processing system enabled by expanding online business. One of the basic ideas for the new office work and systems is to make full use of online technology, which developed dramatically in the latter half of the 1960s. The new office system further speeds up the current system, implements a so-called online real-time system, and aims to complete office processing locally.
4 Characteristics of Nippon Life's Information System and Organizational Changes
Nippon Life's information systemization has the following characteristics.
(1) Top managers took the initiative in promoting the introduction of information technology. Especially in the period when imports and introductions of such machines were rare in Japan, top management themselves grasped the characteristics of the products and introduced them. In addition, the calculator was exhibited at the Japan Life Insurance Expo held at a department store, and the computerization model was generously disseminated, so there were signs that the organization would be changed by the new technology; it is also thought that this had a great impact on competitors and beyond.
(2) There were plans to identify management problems and to introduce information systems accordingly. The company accurately grasped management problems in business and customer service, planned the introduction of information systems at an early stage, and planned improvements to those problems. The core of the company's business content and business processes is to provide life insurance products suited to the times, based on statistics from life tables, and improvement of business content is directly linked to customer service in many respects. Therefore, when the company invests in information technology, both the company and its customers benefit directly; the benefits of investing in information technology are plainly visible, unlike in other industries that receive benefits indirectly or invisibly.
(3) Thorough dissemination within the company. Since the introduction of PCS, an in-house newsletter, "Company Friend", has been created to report on the practicality of computers and the need for computerization, and the information has been thoroughly disseminated to employees.
(4) Research was conducted enthusiastically in-house for computerization. At the time of PCS, the company was determined to apply existing technologies, such as printing katakana characters, which was considered difficult, to its own business. When introducing a large-scale system or the latest equipment, a preparatory committee and a promotion committee were set up in advance, and the introduction was made carefully.
5 Conclusion
Based on the success of informatization in the early stage of the life insurance business and on its organizational sustainability, this research focused on Nippon Life and explored the history and organizational transitions of its informatization. The use of computers was a major factor in the organizational success of Nippon Life, and it proceeded in three steps. The first step was to use statistical machines and calculators for early life tables; this not only supported the development of profitable life products at the time but also contributed to the smooth progress of later informatization. The second step dealt with business processing at the time of contract and with monthly premium collection in response to the rapid increase in contractors; by quickly processing data with computers, the company was able to respond to the increase in subscribers. The third step was to network the system nationwide; by providing the same business processing not only at the head office but also in regional offices, the company was able to maximize the characteristics and performance of computers and grow the organization. If we take the lesson of Nippon Life's success, we need to consider the huge amount of human work and innovative information technology as a set in the future artificial intelligence society. In addition, computerization that increases customer satisfaction is likely to lead an organization to survival. These are trends in the life insurance industry, and it should be noted that the specific introductions and organizational efforts do not necessarily apply to other industries. Acknowledgements This research is part of the research results that received the "Management Science Research" award from the Japan Business Association in 2019. I would like to express my gratitude for the support of the foundation.
References 1. Norikumo, S.: Consideration on substitution by artificial intelligence and adaptation of organization structure: From the history of management information systems theory. Jpn. Soc. Artif. Intell. 32(2), J1–05 (2018) 2. Management Information Society: Total Development History of IT Systems of Tomorrow. Senshu University Publishing Bureau (2010) 3. Kishimoto, E.: Management and Technology Innovation. Nihon Keizai Shimbun (1959) 4. Beika, M.: The History of Japanese Management Mechanization, pp. 3–57. Japan Management Publications Association (1975) 5. Nippon Life Insurance Company: Nippon Life 90 Years History (1980) 6. Nippon Life Insurance Company: Nihon Life 100 Years History Material Edition (1992) 7. Japan Business History Research Institute: IBM 50 Years History. Japan IBM Corporation (1988) 8. Japan Business History Research Institute: History of Computer Development Focusing on IBM. Japan IBM Corporation (1988) 9. Watanabe, S.: Comparative analysis of computer adoption in the banking and life insurance industries. In: Human Sciences, Osaka Prefecture University Bulletin 4, pp. 107–133. Osaka Prefecture University (2009)
Value Measurement and Taxation Metrics in the Model-Building Foundations for Intelligent Pricing Methodology

Marina Kholod, Yury Lyandau, Elena Popova, Aleksei Semenov, and Ksenia Sadykova

Abstract The article presents a study of the basic classification of costs and of pricing at the point of sale in a retail salon of a major mobile provider. The approach used in the study lays the necessary foundations for the development of an intelligent pricing model. This methodological paper presents an approach to developing metrics for value measurement in management, including taxes, excises and customs duties as part of the definition of the pricing model. Four ways of including taxes in the pricing model are presented: the basic model of VAT calculation, the model without VAT calculation in the middle of the value chain, the model without VAT calculation at the end of the value chain, and the model without VAT and excise calculation in the middle and at the end of the value chain. The paper represents important theoretical and methodological work preceding the development of the intelligent pricing model.

Keywords Price · Value · Pricing · VAT calculation · Middle of the value chain · End of the value chain · Excise calculation · Intelligent pricing
1 Introduction

Initially, the price consists of cost and profit. Cost is the expenditure on the production and sale of products, expressed in monetary terms.

M. Kholod (B) · E. Popova
Management Theory and Business Technologies, Plekhanov Russian University of Economics, Moscow, Russia
Y. Lyandau
Innovation Management and Social Entrepreneurship, Plekhanov Russian University of Economics, Moscow, Russia
A. Semenov
Planning and Economic Department, Russian Metrological Institute for Technical Physics and Radio Engineering, Plekhanov Russian University of Economics, Moscow, Russia
K. Sadykova
Charity Fund for Support of the Educational Program "Captains of Russia", Plekhanov Russian University of Economics, Moscow, Russia

© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2021
I. Czarnowski et al. (eds.), Intelligent Decision Technologies, Smart Innovation, Systems and Technologies 238, https://doi.org/10.1007/978-981-16-2765-1_55
Costs are an assessment of the material, labor, financial and other resources used in the production and sale of goods, works and services over a certain period of time. Expenses are a decrease in economic benefits caused by the incurrence of liabilities, that is, the total outflows of the organization arising from the conduct of its activities. Thus, the fundamental difference between costs and expenses is that when determining costs we evaluate the value of the resources with which those costs are associated, whereas when determining expenses we assess how much our economic benefits have decreased.
2 Basic Costs Classification

Costs can be classified in different ways, but many classifications build on two basic ones (Fig. 1):
• direct/indirect,
• conditionally fixed/conditionally variable.

Direct costs are costs that are directly related to the manufacture of particular products, works and services. Direct costs may include:
• costs of raw materials and basic materials,
• wages of production workers,
• the cost of fuel and energy for technological purposes, etc.,
• depreciation of equipment.
Indirect costs are costs that are not directly attributable to a particular type of goods, works or services and are distributed over the entire product range in proportion to the quantity of products manufactured and sold or to direct costs. Indirect costs may include:
• costs of raw materials and basic materials,
• rent for office space,
• salaries of administrative staff,
• interest on loans [1].
Conditionally variable costs are costs whose value depends on the volume of production and sale of goods, works and services. Conditionally variable costs may include:
• costs of raw materials and basic materials,
• piecework wages,
• fuel and energy costs for technological purposes, etc.

Fig. 1 Basic classification of costs

Conditionally fixed costs are costs whose value does not depend on the volume of production and sale of goods, works and services. Conditionally fixed costs may include:
• depreciation,
• rent,
• interest on loans.

There are generally accepted costing items:
• raw materials,
• returnable waste (deductible),
• purchased products, semi-finished products and production services of third-party enterprises and organizations,
• fuel and energy for technological purposes,
• wages of production workers,
• deductions for social needs,
• overhead costs,
• general running costs,
• losses from defects,
• other manufacturing costs,
• selling expenses.

Commercial organizations can use these items to calculate the cost of production or develop their own costing schemes. The next element of price is profit. Profit is the difference between the income received after the goods are sold at a set price and the cost of creating and selling the goods. The relative measure of profit is profitability. Profitability is an indicator of the economic efficiency of the use of resources.
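As a minimal Python sketch of the two basic classifications, each cost item can be tagged along both axes. The tagging below is only our own reading of the examples given in the text, not an authoritative taxonomy.

# Minimal sketch: tagging cost items along the two basic classification axes.
# The items and their tags are illustrative, following the examples in the text.
from enum import Enum

class Attribution(Enum):
    DIRECT = "direct"
    INDIRECT = "indirect"

class Behaviour(Enum):
    CONDITIONALLY_VARIABLE = "conditionally variable"
    CONDITIONALLY_FIXED = "conditionally fixed"

COST_ITEMS = {
    "raw and basic materials":   (Attribution.DIRECT,   Behaviour.CONDITIONALLY_VARIABLE),
    "piecework wages":           (Attribution.DIRECT,   Behaviour.CONDITIONALLY_VARIABLE),
    "depreciation of equipment": (Attribution.DIRECT,   Behaviour.CONDITIONALLY_FIXED),
    "office rent":               (Attribution.INDIRECT, Behaviour.CONDITIONALLY_FIXED),
    "administrative salaries":   (Attribution.INDIRECT, Behaviour.CONDITIONALLY_FIXED),
    "interest on loans":         (Attribution.INDIRECT, Behaviour.CONDITIONALLY_FIXED),
}

for item, (attribution, behaviour) in COST_ITEMS.items():
    print(f"{item}: {attribution.value}, {behaviour.value}")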
3 Metrics for Value Measurement in Management

For pricing, cost-effectiveness, return on sales, return on assets and return on investment are of primary importance. Cost-effectiveness is the ratio of the profit received from the sale of goods, works and services to the costs of their production and sale [2]:

Cost-effectiveness = Profit / Costs    (1)
Profitability of sales is the ratio of the profit received from the sale of goods, works and services to the revenue received from their sale. Calculated per unit of production, the return on sales is the ratio of the profit received from the sale of a unit of goods, work or service to the price of that unit:

Profitability of Sales = Profit / Revenue    (2)
Return on assets is the ratio of the profit received from the sale of goods, works and services to the value of the assets involved in their production and sale:

Return on Assets = Profit / Value of the Assets Involved    (3)
Return on investment is the ratio of the profit received from the sale of goods, works and services to the investments necessary for their creation and sale:

Return on Investment = Profit / Investment    (4)
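To make the ratios in (1)-(4) concrete, the following short Python sketch computes the four metrics. The figures and function names are our own illustrative assumptions, not data from the study.

# Illustrative sketch of the profitability metrics in Eqs. (1)-(4).
# All figures below are hypothetical examples, not data from the study.

def cost_effectiveness(profit: float, costs: float) -> float:
    """Eq. (1): profit per unit of production-and-sales costs."""
    return profit / costs

def profitability_of_sales(profit: float, revenue: float) -> float:
    """Eq. (2): profit per unit of sales revenue."""
    return profit / revenue

def return_on_assets(profit: float, assets_value: float) -> float:
    """Eq. (3): profit per unit of assets involved."""
    return profit / assets_value

def return_on_investment(profit: float, investment: float) -> float:
    """Eq. (4): profit per unit of investment."""
    return profit / investment

profit, costs, revenue = 300.0, 1200.0, 1500.0   # hypothetical rubles
assets_value, investment = 5000.0, 2000.0        # hypothetical rubles
print(cost_effectiveness(profit, costs))         # 0.25
print(profitability_of_sales(profit, revenue))   # 0.20
print(return_on_assets(profit, assets_value))    # 0.06
print(return_on_investment(profit, investment))  # 0.15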
Cost and profit form the wholesale price of goods, work, services. Further, indirect taxes are added to this price.
4 Taxes, Excises and Customs Duties as Part of the Pricing Model

4.1 Basic Model of VAT Calculation

According to the legislation of the Russian Federation, taxes are divided into direct and indirect. Direct taxes are levied directly on income or property. Indirect taxes are established in the form of a premium to the price of a product or service. The most common indirect tax in the Russian Federation is value-added tax. Value-added tax (VAT) is an indirect tax that represents the part of the value of a product created at all stages of the production of goods, works and services and is paid to the budget as the product is sold [3]. When calculating the total amount of his obligations to the budget, the seller has the right to deduct from the tax received from the buyer the tax that he paid to his supplier for taxable goods, works or services [4].

Consider examples of calculating value-added tax in the price of goods (Fig. 2).

Fig. 2 Model of VAT calculation

The manufacturer acquires from the supplier the resources necessary for the production of goods. The cost of the resources excluding VAT is 500 rubles, and the VAT rate is 18%. The purchase price of the resources with VAT = 500 * 118/100 = 590 rubles. The manufacturer incurs certain costs and earns a profit, adding another 1000 rubles without VAT to the price of the resources. Accordingly, the selling price of the manufacturer with 18% VAT can be calculated: manufacturer price = (500 + 1000) * 118/100 = 1770 rubles. In this calculation the manufacturer takes into account the cost of the resources without VAT (500 rubles); otherwise double taxation would occur, with VAT charged on the VAT already included in the price of the resources.
4.2 Model Without VAT Calculation in the Middle of Value Chain

A trade organization purchases the goods from the manufacturer and applies a trade margin to the purchase price without VAT. The trade margin is 30%, so the retail price of the trade organization without VAT = 1500 * 130/100 = 1950 rubles. The trade organization pays VAT on its trade margin, and the retail price of the trade organization with 18% VAT = 1950 * 118/100 = 2301 rubles. At this price the end customer purchases the goods with 18% VAT included; the full chain is reproduced in the sketch below. Now consider an example in which a manufacturer in the middle of the chain is exempt from VAT (Fig. 3). This option is possible if the simplified taxation system (STS) is used.
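Before turning to that case, the basic chain can be summarized in the following Python sketch. It is our own illustration of the worked example above; the variable names and structure are assumptions, while the figures follow the text.

# Basic VAT model: supplier -> manufacturer -> trade organization -> end customer.
VAT_RATE, MARGIN = 0.18, 0.30

def add_vat(net_price: float, rate: float = VAT_RATE) -> float:
    """Gross up a net (VAT-exclusive) price by the VAT rate."""
    return net_price * (1 + rate)

resources_net = 500.0                           # supplier's price without VAT
resources_gross = add_vat(resources_net)        # 590 rubles
manufacturer_net = resources_net + 1000.0       # value added by the manufacturer: 1500 rubles net
manufacturer_gross = add_vat(manufacturer_net)  # (500 + 1000) * 1.18 = 1770 rubles
retail_net = manufacturer_net * (1 + MARGIN)    # 30% trade margin on the net price: 1950 rubles
retail_gross = add_vat(retail_net)              # 1950 * 1.18 = 2301 rubles for the end customer
print(resources_gross, manufacturer_gross, retail_net, retail_gross)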
Fig. 3 VAT calculation: tax regime without VAT in the middle of value chain
4.3 Model Without VAT Calculation at the End of Value Chain

The simplified taxation system is a special tax regime with a special tax payment procedure intended to support micro-, small- and medium-sized businesses. Organizations and individual entrepreneurs applying the simplified tax system are exempt from income tax, property tax and VAT. Instead, as of 2019, they pay 6% of income or 15% of the difference between income and expenses. The list of costs that can be attributed to expenses under the simplified tax system is determined by the Tax Code of the Russian Federation. There are certain restrictions on the use of the simplified tax system: as of 2019, the average number of employees of an organization applying it cannot exceed 100 people, the residual value of fixed assets cannot exceed 150 million rubles, and annual income cannot exceed 150 million rubles.

So, suppose the manufacturer buys resources from the supplier at a price of 500 rubles without VAT; the purchase price with 18% VAT will be 590 rubles. The manufacturer adds 1000 rubles of cost (costs plus profit) and calculates the price for the trading organization. The selling price of the manufacturer is 1590 rubles. This price is without VAT, since the manufacturer does not pay the indirect tax. A trade organization buys the goods at a price of 1590 rubles and applies a trade margin of 30%. The retail price of the trade organization without VAT will be: 1590 * 130/100 = 2067 rubles. The trade organization pays 18% VAT, so the indirect tax must be added to the price without VAT: retail price with 18% VAT = 2067 * 118/100 = 2439.06 rubles. This calculation is reproduced in the sketch below.
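The following Python sketch is our own illustration of the scenario just described, in which the manufacturer in the middle of the chain uses the simplified taxation system and therefore cannot deduct the input VAT paid to the supplier; the figures follow the worked example, the names are assumptions.

# Chain with a VAT-exempt manufacturer in the middle of the value chain (STS regime).
VAT_RATE, MARGIN = 0.18, 0.30

resources_gross = 500.0 * (1 + VAT_RATE)        # 590 rubles paid to the supplier, VAT included
# The exempt manufacturer cannot deduct the 90 rubles of input VAT,
# so the full gross price enters its cost base.
manufacturer_price = resources_gross + 1000.0   # 1590 rubles, sold without VAT
retail_net = manufacturer_price * (1 + MARGIN)  # 1590 * 1.30 = 2067 rubles
retail_gross = retail_net * (1 + VAT_RATE)      # 2067 * 1.18 = 2439.06 rubles for the end customer
print(manufacturer_price, retail_net, round(retail_gross, 2))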
Fig. 4 Model without VAT calculation at the end of value chain
The final price for the end consumer turns out to be higher than in the first option, because the trade organization has to pay VAT on the entire cost of the goods: the logic of VAT offsetting is broken in the middle of the chain. Therefore, in practice, participants in the value chain will not work with a VAT-exempt participant in the middle of the chain if all subsequent participants pay VAT. Now consider the third option, in which the trade organization applies the simplified taxation system (Fig. 4). The manufacturer buys resources from the supplier at a price of 590 rubles with 18% VAT. The manufacturer adds 1000 rubles of cost (costs plus profit) and determines the price for the trading organization; the price with 18% VAT will be 1770 rubles. The trade organization buys the goods at the VAT-inclusive price of 1770 rubles and applies a trade margin of 30% to the purchase price without VAT. The trade margin could instead be calculated on the purchase price with VAT, but this would increase the price. If the trade margin on the price without VAT is sufficient for the trade organization to cover its costs and ensure a profit, then the price can be calculated without VAT, as in the sketch below.
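The following Python sketch, our own illustration with assumed variable names, anticipates the retail price of 2220 rubles derived in the next paragraph for this third option.

# Chain with a VAT-exempt trade organization at the end of the chain (STS regime).
VAT_RATE, MARGIN = 0.18, 0.30

manufacturer_net = 500.0 + 1000.0                        # 1500 rubles without VAT
manufacturer_gross = manufacturer_net * (1 + VAT_RATE)   # 1770 rubles paid by the trade organization
input_vat = manufacturer_gross - manufacturer_net        # 270 rubles of VAT that cannot be deducted
margin = manufacturer_net * MARGIN                       # 30% margin on the price without VAT: 450 rubles
retail_price = manufacturer_net + margin + input_vat     # 1500 * 1.30 + 270 = 2220 rubles, no VAT added on top
print(retail_price)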
4.4 Model Without VAT and Excise Calculation in the Middle and at the End of Value Chain

The trade organization does not pay VAT. Therefore, the retail price for the end customer will be: 1500 * 130/100 + 270 = 2220 rubles. The retail price of the goods turns out to be lower than in the previous versions because the amount of VAT in the entire chain is reduced. Thus, the last participant in the chain can use a tax regime without VAT, which allows it either to lower the price or to earn additional profit.

The next indirect tax added, together with cost and profit, to the price of the goods is excise duty. Excise duty is an indirect tax imposed on consumer goods within the country and included in the price of the goods. According to the Tax Code of the Russian Federation, excisable goods include:
• alcohol-containing products,
• medicines,
• alcoholic beverages,
• tobacco products,
• automobile gasoline,
• diesel fuel,
• engine oils,
• straight-run gasoline,
• other goods.
The excise tax is calculated as follows (Fig. 5). The manufacturer acquires resources from the supplier at a price of 590 rubles with 18% VAT. The manufacturer makes goods that are subject to excise duty; the excise tax on the manufactured goods is 300 rubles. The manufacturer adds 1000 rubles of value, which includes its costs and profit. The wholesale price of the manufacturer will then be 1500 rubles, the selling price without VAT but with excise tax 1800 rubles, and the selling price 2124 rubles. The selling price is calculated as follows: (500 + 1000 + 300) * 118/100 = 2124 rubles. At the manufacturer, VAT is added to its costs, profit and excise tax: (1000 + 300) * 118/100 = 1534 rubles.

Fig. 5 Model without VAT and excise calculation in the middle and at the end of value chain

A trade organization buys the goods at a price of 2124 rubles, which includes 18% VAT and an excise tax of 300 rubles. The trade organization applies a trade margin of 30% to the price without VAT. The retail price without VAT will then be: 1800 * 130/100 = 2340 rubles, and the retail price with 18% VAT = 2340 * 118/100 = 2761.2 rubles. This price includes the excise tax of 300 rubles paid by the manufacturer; the trade organization does not pay excise tax again. That is, excise duty is paid by the participant in the value chain who directly produces the excisable goods.
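The following Python sketch, again our own illustration using the figures from the example, shows that excise is added once by the producer of the excisable goods, while VAT is then charged on the excise-inclusive price at each stage.

# Sketch of the excise model: excise is levied once, on the producer of the excisable goods.
VAT_RATE, MARGIN = 0.18, 0.30

resources_net, value_added, excise = 500.0, 1000.0, 300.0
wholesale_net = resources_net + value_added               # 1500 rubles, manufacturer's wholesale price
selling_net_with_excise = wholesale_net + excise          # 1800 rubles, without VAT but with excise
selling_gross = selling_net_with_excise * (1 + VAT_RATE)  # 1800 * 1.18 = 2124 rubles
retail_net = selling_net_with_excise * (1 + MARGIN)       # 1800 * 1.30 = 2340 rubles
retail_gross = retail_net * (1 + VAT_RATE)                # 2340 * 1.18 = 2761.2 rubles
print(selling_gross, retail_net, round(retail_gross, 1))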
5 Conclusion

This methodological paper has presented an approach to developing the definitions of price, value, pricing and pricing methodology. The groundwork for pricing also included the development of the structure of the pricing mechanism and of approaches and policies to pricing. Four ways of including taxes in the pricing model were presented: the basic model of VAT calculation, the model without VAT calculation in the middle of the value chain, the model without VAT calculation at the end of the value chain, and the model without VAT and excise calculation in the middle and at the end of the value chain. The paper represents important theoretical and methodological work preceding the development of the intelligent pricing model.
References

1. Lyandau, Y., et al.: Business Architect: Business Management Systems Modeling. Guidebook on Process Cost Engineering Technologies. RuScience (2016)
2. Lyandau, Y.: The development of the process-based approach to management. Econ. Stat. Inform. 65–68 (2013)
3. Kholod, M., et al.: Spatial customer orientation in the sales process within the retail environment. In: SGEM2016 Conference Proceedings, vol. 1, pp. 389–396 (2016)
4. Sutherland, J., Altman, I.: Organizational transformation with Scrum: how a venture capital group gets twice as much done with half the work. In: 43rd Hawaii International Conference on System Sciences (2010)
Author Index
A Acosta, Angel, 59 Alexandrov, Dmitry, 333 Andriyanov, Nikita, 253 Arima, Sumika, 109 Ashrafi, Negin, 141
G Golubev, Vasily, 231 Gorchakov, Yaroslav, 365 Grichnik, Anthony J., 209 Grigoriev, Evgeniy, 83 Gromoff, Alexander, 365
B Balonin, Yury, 95 Barbucha, Dariusz, 15 Borovik, Ekaterina, 639 Bouziat, Valentin, 531 Brenneis, Markus, 3 Buryachenko, Vladimir V., 241 Byerly, Adam, 209
H Hamad, Yousif, 173 Hara, Akira, 583 Harada, Toshihide, 595 Hashimoto, Shintaro, 469
C Chen, Vivien Yi-Chun, 285 Chevaldonné, Marc, 221 Coita, Ioana-Florina, 71 Czarnowski, Ireneusz, 25
D Dementiev, Vitaly, 253 Dhome, Michel, 221 Dumitru, Vlad Andrei, 519
F Fajdek-Bieda, Anna, 297 Fartushniy, Eduard, 343 Favorskaya, Alena V., 151, 231 Favorskaya, Margarita N., 185, 241 Fomina, Irina, 343
I Ichimura, Takumi, 559, 595 Ismoilov, Maqsudjon, 333 Itoh, Yoshimichi, 457
K Kalganova, Tatiana, 209 Kalinina, Irina, 639 Kamada, Shin, 559, 595 Kanai, Hideaki, 309 Kato, Takumi, 127 Kawano, Shuichi, 441, 491 Kents, Anzhelika, 173 Kholod, Marina, 639, 659 Kizielewicz, Bartlomiej, 275 Klimenko, Herman, 343 Kondratiev, Dmitry, 253 Korbaa, Ouajdi, 37 Kotsuka, Yuhei, 109
Kozhemyachenko, Anton A., 151 Kozlov, Artem, 333 Krasheninnikov, Victor R., 197 Kurako, Mikhail, 161 Kushida, Jun-ichi, 583 Kuvayskova, Yuliya E., 197
L Lamperti, Gianfranco, 505 Le Corronc, Euriell, 545 Lebedev, Georgy S., 343 Legalov, Alexander, 355 Lien, Hui-Pain, 285 Lin, Jerry Chao-Lee, 285 Losev, Alexey, 343 Lyandau, Yury, 639, 659
M Manita, Ghaith, 37 Mare, Codruța, 71 Maslennikov, Valery, 639 Maulana, Hanhan, 309 Mauve, Martin, 3 Mieczynska, Marta, 25 Mironov, Vasiliy, 161 Miyazawa, Kazuya, 379 Mizuno, Takafumi, 617 Murayama, Kazuaki, 491
N Nenashev, Vadim, 83 Ninomiya, Yoshiyuki, 441 Nitta, Kazuki, 415 Norikumo, Shunei, 649
O Oda, Ryoya, 391, 429 Ohishi, Mineaki, 457, 479 Ohya, Takao, 609 Okamura, Kensuke, 457 Ouertani, Mohamed Wajdi, 37 Ouni, Achref, 221 Oyamaguchi, Natsumi, 627
P Paya, Claire, 545 Pencolé, Yannick, 545 Pesnya, Evgeniy, 151 Popova, Elena, 659
Proskurin, Alexandr V., 185 Pucel, Xavier, 531
R Radomska-Zalas, Aleksandra, 297 Ramanna, Sheela, 141 Rojas, Norma, 59 Roussel, Stéphanie, 531 Royer, Eric, 221
S Sadykova, Ksenia, 659 Saleh, Hadi, 355 Sato-Ilic, Mika, 379, 403, 415 Savachenko, Anton, 333 Semenov, Aleksei, 659 Sentsov, Anton, 83 Sergeev, Alexander, 95 Sergeev, Mikhail, 83 Shekhovtsov, Andrii, 265, 321 Shimamura, Kaito, 491 Simonov, Konstantin, 161, 173 Subbotin, Alexey U., 197 Sugasawa, Shonosuke, 469 Szyman, Pawel, 15
T Takahama, Tetsuyuki, 583 Tamura, Keiichi, 569 Teraoka, Jun, 569 Toko, Yukako, 403 Travé-Massuyès, Louise, 531 Tzeng, Gwo-Hshiung, 285
V Vialletelle, Philippe, 545 von Lücken, Christian, 59 Vostrikov, Anton, 95
W Wątróbski, Jarosław, 265, 275, 321 Więckowski, Jakub, 265, 275, 321 Wotawa, Franz, 519 Wu, Shengyi, 491 Wu, Zheng, 285
Y Yamamura, Mariko, 479
Yanagihara, Hirokazu, 391, 429, 457, 479 Yang, Pei-Feng, 285 Yasuda, Ryohei, 583 Yoshida, Hisao, 441 Yoshikawa, Kohei, 491
Z Zanella, Marina, 505 Zhao, Xiangfu, 505 Zotin, Aleksandr, 161, 173 Zykov, Sergey, 333, 343, 355, 365